The pandas I/O API is a set of reader functions that return pandas objects. Loading data with the tools bundled in pandas is straightforward: records from many kinds of sources, such as comma-separated values (CSV), Excel, HDF, SQL, JSON, HTML, Google BigQuery, pickle, Stata, and the clipboard, are loaded into pandas data structures. There is a reader function for each type of source, namely read_csv, read_excel, read_hdf, read_sql, read_json, read_html, read_stata, read_clipboard, and read_pickle. After loading, the data is prepared for analysis, which involves deleting erroneous entries, normalization, grouping, transformation, and sorting.
The next program demonstrates working with CSV files and performing various operations on them. It uses the Book-Crossing dataset in CSV format, downloaded from http://www2.informatik.uni-freiburg.de/~cziegler/BX/. The dataset contains three CSV files (BX-Books.csv, BX-Users.csv...