Exploratory Data Analysis (EDA), or data exploration, is the first step in the data science process. John Tukey coined the term and, in 1977, published a book emphasizing the importance of EDA. EDA helps you understand a dataset better: it lets you check its features and shape, validate initial hypotheses, and form a preliminary idea of which steps to pursue in the subsequent data science tasks.
In this section, you will work with the iris dataset, which was already used in the previous chapter. First, let's load the dataset:
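In:
import pandas as pd
# The UCI file has no header row: header=None keeps the first record as data,
# and names= supplies the column labels.
iris_filename = 'datasets-uci-iris.csv'
iris = pd.read_csv(iris_filename, header=None,
                   names=['sepal_length', 'sepal_width',
                          'petal_length', 'petal_width', 'target'])
iris.head()
Out: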
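If the file datasets-uci-iris.csv is not at hand, the role of the header=None and names arguments can still be checked with a minimal sketch that reads a few inline rows in the same headerless format. The sample rows and the names sample_csv, column_names, iris_sample, and wrong are illustrative assumptions, not part of the original example. The sketch also shows what happens when header=None is omitted.

```python
import io

import pandas as pd

# Hypothetical sample: a few rows in the same headerless, comma-separated
# format as the UCI iris file (four measurements followed by the class label).
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

column_names = ['sepal_length', 'sepal_width',
                'petal_length', 'petal_width', 'target']

# header=None keeps every line as data; names= provides the column labels.
iris_sample = pd.read_csv(sample_csv, header=None, names=column_names)
print(iris_sample.shape)  # (4, 5)

# For contrast: with the default header behavior, the first data row is
# consumed as column labels, so one observation is silently lost.
sample_csv.seek(0)
wrong = pd.read_csv(sample_csv)
print(wrong.shape)  # (3, 5)
```

Because the lost row does not raise any error, checking .shape right after loading is a cheap sanity test.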
Great! With just a few commands, you have loaded the dataset. Now the investigation phase begins. The .describe() method provides some useful first insights and can be used as follows:
In:
iris.describe()
Out:
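To see what the .describe() table contains, the following minimal sketch runs it on a small hypothetical DataFrame (the values and the names df, summary, and col are illustrative only) and checks a few of its rows against direct Series computations. It assumes pandas' default behavior (include=None), under which a DataFrame with mixed column types has only its numeric columns summarized.

```python
import math

import pandas as pd

# Hypothetical toy data: two numeric columns (one with a missing value)
# and one text column, mimicking the layout of the iris DataFrame.
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.3],
    'petal_length': [1.4, None, 4.7, 6.0],
    'target': ['Iris-setosa', 'Iris-setosa',
               'Iris-versicolor', 'Iris-virginica'],
})

summary = df.describe()

# By default, only numeric columns are summarized, so 'target' is absent.
print(list(summary.columns))  # ['sepal_length', 'petal_length']

# One row per statistic.
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# 'count' counts non-missing values only.
print(summary.loc['count', 'petal_length'])  # 3.0

# The other rows match direct Series computations; 'std' is the sample
# standard deviation (ddof=1), which is also Series.std()'s default.
col = df['sepal_length']
assert math.isclose(summary.loc['mean', 'sepal_length'], col.mean())
assert math.isclose(summary.loc['std', 'sepal_length'], col.std(ddof=1))
assert math.isclose(summary.loc['50%', 'sepal_length'], col.median())
```

On the real iris DataFrame, the same reasoning means the text column target is left out of the table, and any gap between count and the number of rows signals missing values.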
For all...