Learning Pandas

By: Michael Heydt

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

A Tour of pandas

pandas and why it is important

pandas and IPython Notebooks

Referencing pandas in the application

Primary pandas objects

Loading data from files and the Web

Simplicity of visualization of pandas data

Installing pandas

Getting Anaconda

Installing Anaconda

Ensuring pandas is up to date

Running a small pandas sample in IPython

Starting the IPython Notebook server

Installing and running IPython Notebooks

Using Wakari for pandas

NumPy for pandas

Installing and importing NumPy

Benefits and characteristics of NumPy arrays

Creating NumPy arrays and performing basic array operations

Selecting array elements

Logical operations on arrays

Reshaping arrays

Combining arrays

Splitting arrays

Useful numerical methods of NumPy arrays

The pandas Series Object

The Series object

Importing pandas

Creating Series

Size, shape, uniqueness, and counts of values

Peeking at data with heads, tails, and take

Looking up values in Series

Arithmetic operations

The special case of Not-A-Number (NaN)

Boolean selection

Reindexing a Series

Slicing a Series

The pandas DataFrame Object

Creating DataFrame from scratch

Selecting columns of a DataFrame

Selecting rows and values of a DataFrame using the index

Selecting rows of a DataFrame by Boolean selection

Modifying the structure and content of DataFrame

Arithmetic on a DataFrame

Resetting and reindexing

Hierarchical indexing

Summarized data and descriptive statistics

Accessing Data

Setting up the IPython notebook

Reading and writing JSON files

Accessing data on the web and in the cloud

Reading and writing from/to SQL databases

Reading data from remote data services

Tidying Up Your Data

What is tidying your data?

Setting up the IPython notebook

Working with missing data

Handling duplicate data

Transforming Data

Combining and Reshaping Data

Setting up the IPython notebook

Concatenating data

Merging and joining data

Stacking and unstacking

Performance benefits of stacked data

Grouping and Aggregating Data

Setting up the IPython notebook

The split, apply, and combine (SAC) pattern

Discretization and Binning

Time-series Data

Setting up the IPython notebook

Representation of dates, time, and intervals

Introducing time-series data

Calculating new dates using offsets

Handling holidays using calendars

Normalizing timestamps using time zones

Manipulating time-series data

Visualization

Setting up the IPython notebook

Plotting basics with pandas

Common plots used in statistical analyses

Multiple plots in a single chart

Applications to Finance

Setting up the IPython notebook

Obtaining and organizing stock data from Yahoo!

Plotting time-series prices

Performing a moving-average calculation

Volatility calculation

Determining risk relative to expected returns

Index

Chapter 1. A Tour of pandas

In this chapter, we will take a look at pandas, an open source data analysis library for Python. It provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. The pandas library brings many of the good things from R, specifically the DataFrame object and the capabilities of R packages such as plyr and reshape2, into a single library that you can use in your Python applications.
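
As a rough illustration of the comparison with R, the following sketch builds a small DataFrame, then applies a grouped summary (similar in spirit to plyr) and a wide-to-long reshape (similar in spirit to reshape2's melt). The column names and values are invented for this example and do not come from the book.

```python
import pandas as pd

# Hypothetical data: the column names and values are made up for illustration.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "q1": [10, 20, 30, 40],
    "q2": [1, 2, 3, 4],
})

# Split-apply-combine, comparable in spirit to plyr:
# sum each quarter's values within each region.
totals = sales.groupby("region")[["q1", "q2"]].sum()

# Wide-to-long reshaping, comparable in spirit to reshape2's melt:
# one row per (region, quarter) observation.
long_form = pd.melt(
    sales, id_vars=["region"], var_name="quarter", value_name="amount"
)

print(totals)
print(long_form)
```

Both operations return new objects and leave the original `sales` DataFrame unchanged.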

Wes McKinney began developing pandas in 2008 while working at AQR Capital Management. It was open sourced in 2009 and is currently supported and actively developed by various organizations and contributors. It was initially designed with finance in mind, particularly for manipulating time-series data, but it emphasizes data manipulation and leaves statistical, financial, and other types of analysis to other Python libraries.

In this chapter, we will take a brief tour of pandas and some associated tools, such as IPython Notebooks. You will be introduced to a variety of pandas concepts for data organization and manipulation, to form both a base understanding and a frame of reference for the deeper coverage later in this book. By the end of this chapter, you will have a good understanding of the fundamentals of pandas and be able to perform basic data manipulations. You will also be ready to continue with the more detailed material in later portions of this book.

This chapter will introduce you to:

pandas and why it is important
IPython and IPython Notebooks
Referencing pandas in your application
The Series and DataFrame objects of pandas
How to load data from files and the Web
The simplicity of visualizing pandas data
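
To make the items above a little more concrete, here is a minimal sketch of the two primary objects and of loading tabular data. It assumes pandas is installed and imported under the commonly used alias `pd`. The labels, values, and the in-memory CSV text are hypothetical stand-ins for a real file or URL.

```python
import io

import pandas as pd

# A Series is a one-dimensional array of values with an index of labels.
temps = pd.Series([21.5, 19.0, 23.1], index=["mon", "tue", "wed"])

# A DataFrame is a two-dimensional table; its columns share one index.
df = pd.DataFrame({"temp": temps, "rain": [0.0, 4.2, 0.3]})

# read_csv accepts a path, a URL, or a file-like object. An in-memory
# buffer stands in for a real file here (hypothetical data).
csv_text = "date,price\n2015-01-02,10.5\n2015-01-05,11.0\n"
prices = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["date"], index_col="date"
)

print(temps["tue"])            # look up a value by label
print(df.loc["wed", "rain"])   # look up a cell by row and column label
print(prices["price"].mean())  # a simple summary of a loaded column
```

With matplotlib installed, a call such as `prices["price"].plot()` would typically be enough to draw a basic line chart of the loaded column.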

Note

By convention, the pandas documentation always writes pandas in lowercase, and this book follows that convention.