The main learning outcomes of this chapter are summarized as follows:
Various methods and variations of importing a dataset using pandas: read_csv and its variations, reading a dataset using the open method in Python, reading a file in chunks using the open method, reading directly from a URL, specifying the column names from a list, changing the delimiter of a dataset, and so on.
Basic exploratory analysis of data: observing a thumbnail of data, shape, column names, column types, and summary statistics for numerical variables.
Handling missing values: why missing values creep into a dataset, why it is important to treat them properly, how to treat them by deletion or imputation, and various methods of imputing data.
Creating dummy variables: converting categorical variables into dummy variables so that they can be used in predictive models.
Basic plotting: scatter plots, histograms, and boxplots; their meaning and relevance; and how they are plotted.
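As a compact recap of the outcomes listed above, the following is a minimal sketch rather than the chapter's own code. The inline dataset, its column names, and the variable names are hypothetical and exist only for illustration. Mean imputation is just one of several possible imputation choices, and plotting is left out to keep the example short.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for a real file or URL.
raw = io.StringIO(
    "name;age;city\n"
    "Ann;34;Paris\n"
    "Bob;;London\n"
    "Cid;29;Paris\n"
    "Dee;41;Rome\n"
)

# Importing: a non-default delimiter, with column names supplied from a list.
# header=0 tells pandas to discard the file's own header row in favor of `names`.
columns = ["name", "age", "city"]
df = pd.read_csv(raw, sep=";", header=0, names=columns)

# Basic exploratory analysis.
print(df.head())            # thumbnail of the data
print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column names
print(df.dtypes)            # column types
print(df.describe())        # summary statistics for numerical columns

# Handling missing values: deletion versus imputation (here, the column mean).
dropped = df.dropna()
imputed = df.fillna({"age": df["age"].mean()})

# Creating dummy variables for a categorical column.
dummies = pd.get_dummies(imputed["city"], prefix="city")
model_ready = pd.concat([imputed.drop(columns="city"), dummies], axis=1)
print(model_ready.head())
```

The same read_csv call accepts a file path or a URL in place of the in-memory buffer used here.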
This chapter is a head start into...