Learning pandas - Second Edition

By: Michael Heydt

Overview of this book

You will learn how to use pandas to perform data analysis in Python. You will start with an overview of data analysis and iteratively progress from modeling data, to accessing data from remote sources, performing numeric and statistical analysis, through indexing and performing aggregate analysis, and finally to visualizing statistical data and applying pandas to finance. With the knowledge you gain from this book, you will quickly learn pandas and how it can empower you in the exciting world of data manipulation, analysis and science.

Reading and writing data in Excel format

pandas can read data in Excel 2003 and newer formats, using either the pd.read_excel() function or the ExcelFile class. Internally, both techniques rely on either the xlrd or the openpyxl package, so you will need to ensure that one of them is installed in your Python environment.
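A quick way to check which of these packages is present is sketched below. It uses only the standard library; the `available` dictionary is a name chosen for illustration. Note that recent xlrd releases read only the older .xls format, so .xlsx workbooks generally need openpyxl.

```python
import importlib.util

# Record whether each Excel backend can be imported in this environment.
available = {
    name: importlib.util.find_spec(name) is not None
    for name in ("xlrd", "openpyxl")
}

for name, found in available.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

If a package is reported as missing, it can usually be installed with pip or conda.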

For demonstration, a data/stocks.xlsx file is provided with the sample data. If you open it in Excel, you will see something similar to what is shown in the following screenshot:

The workbook contains two sheets, msft and aapl, which hold the stock data for each respective stock.

The following then reads the data/stocks.xlsx file into a DataFrame:
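A minimal sketch of such a call, assuming pandas and openpyxl are installed, might look like the following. So that it runs on its own, it first writes a small stand-in workbook to a temporary directory; the rows and the `Date` and `Close` column names are placeholders, not the book's real stock data. With the sample data available, you would pass "data/stocks.xlsx" to pd.read_excel() instead.

```python
import os
import tempfile

import pandas as pd

# Placeholder rows standing in for the sample workbook (NOT real stock data).
msft = pd.DataFrame({"Date": ["2000-01-03", "2000-01-04"], "Close": [1.5, 2.5]})
aapl = pd.DataFrame({"Date": ["2000-01-03", "2000-01-04"], "Close": [3.5, 4.5]})

# Write both sheets to a temporary .xlsx file, msft first.
path = os.path.join(tempfile.mkdtemp(), "stocks.xlsx")
with pd.ExcelWriter(path) as writer:
    msft.to_excel(writer, sheet_name="msft", index=False)
    aapl.to_excel(writer, sheet_name="aapl", index=False)

# With no sheet specified, read_excel() loads the first worksheet and
# takes the column names from its first row.
df = pd.read_excel(path)
print(df.head())
```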

This reads only the content of the first worksheet in the Excel file (the msft worksheet) and uses the contents of the first row as the column names. To read the other worksheet, you can pass the name of the worksheet...
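Selecting a worksheet by name can be sketched as follows, assuming a recent pandas release where the keyword is sheet_name (older releases spelled it sheetname) and that openpyxl is installed. As before, the workbook is a stand-in with placeholder values written to a temporary directory, not the book's sample data.

```python
import os
import tempfile

import pandas as pd

# Build a two-sheet stand-in workbook with placeholder values.
path = os.path.join(tempfile.mkdtemp(), "stocks.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"Close": [1.5, 2.5]}).to_excel(writer, sheet_name="msft", index=False)
    pd.DataFrame({"Close": [3.5, 4.5]}).to_excel(writer, sheet_name="aapl", index=False)

# Read a specific worksheet by passing its name.
aapl_df = pd.read_excel(path, sheet_name="aapl")
print(aapl_df)

# The ExcelFile class exposes the worksheet names in the workbook.
with pd.ExcelFile(path) as xls:
    names = xls.sheet_names
print(names)
```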