An example with a realistic dataset
In this section, we will work with a realistic dataset of moderate size: the World Development Indicators dataset, which the World Bank provides free of charge. It is not too large or complex to experiment with.
In any real application, we need to read data from some source, reformat it for our purposes, and save the reformatted data back to some storage system. pandas offers facilities for data retrieval and storage in multiple formats:
Comma-separated values (CSV) in text files
Excel
JSON
SQL
HTML
Stata
Clipboard data in text format
Python-pickled data
The list of formats supported by pandas keeps growing with each new update to the library. Please refer to http://pandas.pydata.org/pandas-docs/stable/io.html for a current list.
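The read, reformat, and save cycle described above can be sketched in a few lines. This is only a minimal illustration, not the workflow used for the World Bank data: the country names, column names, and values are invented, and in-memory text buffers stand in for real files so that the snippet runs on its own.

```python
import io

import pandas as pd

# Invented CSV content standing in for a real data file (hypothetical data).
raw_csv = io.StringIO(
    "country,year,population\n"
    "Aland,2000,100\n"
    "Aland,2001,110\n"
    "Borduria,2000,200\n"
    "Borduria,2001,220\n"
)

# 1. Read: parse the CSV text into a DataFrame.
df = pd.read_csv(raw_csv)

# 2. Reformat: pivot so that each row is a country and each column a year.
wide = df.pivot(index="country", columns="year", values="population")

# 3. Save: write the reshaped table back out as CSV text.
out = io.StringIO()
wide.to_csv(out)
csv_text = out.getvalue()

print(wide)
```

With real data, the buffers would be replaced by file paths passed to pd.read_csv and DataFrame.to_csv. Other formats follow the same reader and writer pairing, for example pd.read_json with DataFrame.to_json.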
Covering every format supported by pandas is beyond the scope of this book. We will restrict our examples to CSV, which is a simple text format that is widely...