Mastering pandas

By: Femi Anthony

Overview of this book

<p>Python is a groundbreaking language, notable for its simplicity and succinctness, which allow the user to achieve a great deal with a few lines of code, especially compared to other programming languages. The pandas library brings these features of Python into the data analysis realm by providing expressiveness, simplicity, and powerful capabilities for the task of data analysis. By mastering pandas, users will be able to do complex data analysis in a short period of time, as well as illustrate their findings using the rich visualization capabilities of related tools such as IPython and matplotlib.</p> <p>This book is an in-depth guide to the use of pandas for data analysis, for both the seasoned data analysis practitioner and the novice user. It provides a basic introduction to the pandas framework and takes users through the installation of the library and the IPython interactive environment. Thereafter, you will learn basic as well as advanced features, such as MultiIndexing, modifying data structures, and sampling data, which provide powerful capabilities for data analysis.</p>
Table of Contents (18 chapters)
Mastering pandas
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

What is pandas?


pandas is a high-performance open source library for data analysis in Python, developed by Wes McKinney in 2008. Over the years, it has become the de facto standard library for data analysis using Python. The tool has seen wide adoption and has a large community behind it (220+ contributors and 9,000+ commits by 03/2014), with rapid iteration and a continuous stream of new features and enhancements.

Some key features of pandas include the following:

  • It can process a variety of data sets in different formats: time series, heterogeneous tabular data, and matrix data.

  • It facilitates loading/importing data from varied sources such as CSV files and SQL databases.

  • It can handle a myriad of operations on data sets: subsetting, slicing, filtering, merging, grouping (groupby), reordering, and reshaping.

  • It can deal with missing data according to rules defined by the user/developer: ignore, convert to 0, and so on.

  • It can be used for parsing and munging (conversion) of data as well as modeling and statistical analysis.

  • It integrates well with other Python libraries such as statsmodels, SciPy, and scikit-learn.

  • It delivers fast performance and can be sped up even more by making use of Cython (C extensions to Python).
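
A few of the features listed above can be sketched in a handful of lines. The following is a minimal illustration rather than a recommended workflow: the sample data, the column names (`city`, `year`, `sales`, `region`), and the variable names are all made up for this example, and the CSV is read from an in-memory string instead of a file.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = """city,year,sales
Boston,2013,100
Boston,2014,
Austin,2013,80
Austin,2014,120
"""

# Load tabular data from a CSV source (an in-memory string here).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Missing data: the empty sales field is read as NaN.
n_missing = int(df["sales"].isna().sum())

# Two possible user-defined rules for missing values:
# ignore (drop) the affected rows, or convert them to 0.
dropped = df.dropna(subset=["sales"])
filled = df.fillna({"sales": 0})

# Filtering with a boolean mask: keep only the 2014 rows.
recent = filled[filled["year"] == 2014]

# Grouping: total sales per city.
totals = filled.groupby("city")["sales"].sum()

# Merging with a second (also hypothetical) table on a shared key.
regions = pd.DataFrame(
    {"city": ["Boston", "Austin"], "region": ["East", "South"]}
)
merged = filled.merge(regions, on="city", how="left")

print(totals)
```

Whether to drop missing values or replace them with 0 depends on the analysis. The two approaches give different results for statistics such as the mean, so the choice is left to the user.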

For more information, see the official pandas documentation at http://pandas.pydata.org/pandas-docs/stable/.