Mastering pandas - Second Edition

By: Ashish Kumar

Overview of this book

pandas is a popular Python library used by data scientists and analysts worldwide to manipulate and analyze data. This book presents useful data manipulation techniques in pandas for performing complex data analysis in various domains. An update to the previous edition, with new features, examples, and updated code, it is an in-depth guide to getting the most out of pandas for data analysis. Designed for both intermediate users and seasoned practitioners, it teaches advanced data manipulation techniques, such as multi-indexing, modifying data structures, and sampling your data, which allow for powerful analysis and help you gain accurate insights. With the help of this book, you will apply pandas to different domains, such as Bayesian statistics, predictive analytics, and time series analysis, using an example-based approach. You will also learn how to prepare powerful, interactive business reports in pandas using the Jupyter notebook. By the end of this book, you will know how to perform efficient data analysis using pandas on complex data, and will be on your way to becoming an expert data analyst or data scientist.

Boolean indexing

We use Boolean indexing to filter or select parts of the data. The operators are as follows:

Operator	Symbol
OR	|
AND	&
NOT	~
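As a minimal sketch of how these three operators combine Boolean masks element by element, consider the following. The DataFrame `df` and its values are made up for illustration and are not the book's dataset; the mask names `is_close` and `is_high` are likewise hypothetical.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "PriceType": ["open", "close", "open", "close"],
    "Nasdaq": [4250.0, 4290.0, 4310.0, 4320.0],
})

# Each comparison produces a Boolean Series with one True/False per row
is_close = df["PriceType"] == "close"
is_high = df["Nasdaq"] > 4300

either = df[is_close | is_high]   # OR: rows where at least one mask is True
both = df[is_close & is_high]     # AND: rows where both masks are True
not_close = df[~is_close]         # NOT: rows where the mask is False
```

Naming each mask before combining it keeps the expressions short and makes the intent of every filter easy to read.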

When you combine several conditions, each condition must be wrapped in parentheses, because &, |, and ~ bind more tightly than comparison operators such as == and >. Using the DataFrame from the previous section, here we display the trading dates on which the NASDAQ closed above 4,300:

  In [311]: # Keep rows where PriceType is 'close' and Nasdaq is above 4300
            sharesIndexDataDF.loc[(sharesIndexDataDF['PriceType'] == 'close') &
                                  (sharesIndexDataDF['Nasdaq'] > 4300)]
  Out[311]:
               PriceType   Nasdaq  S&P 500  Russell 2000
  TradingDate
  2014/02/27       close  4318.93  1854.29       1187.94
  2014/02/28       close  4308.12  1859.45       1183.03

  2 rows × 4 columns
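To see why the parentheses matter, here is a small self-contained sketch. The DataFrame `df` is a hypothetical stand-in for `sharesIndexDataDF` with only two columns; apart from the 2014/02/27 closing value taken from the output above, its numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for sharesIndexDataDF; most values are illustrative only
df = pd.DataFrame(
    {
        "PriceType": ["open", "close", "open", "close"],
        "Nasdaq": [4291.06, 4292.97, 4323.52, 4318.93],
    },
    index=pd.Index(
        ["2014/02/26", "2014/02/26", "2014/02/27", "2014/02/27"],
        name="TradingDate",
    ),
)

# Parenthesized: each comparison becomes a Boolean Series before & combines them
closed_high = df.loc[(df["PriceType"] == "close") & (df["Nasdaq"] > 4300)]

# Unparenthesized: & binds tighter than == and >, so Python first evaluates
# "close" & df["Nasdaq"], which is not the intended condition and fails
try:
    df.loc[df["PriceType"] == "close" & df["Nasdaq"] > 4300]
    raised = False
except (TypeError, ValueError):
    raised = True
```

The exact exception type and message for the unparenthesized form may vary between pandas versions, which is why the sketch catches both `TypeError` and `ValueError`.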

You can also create Boolean conditions in which you use arrays to filter out parts of the data, as shown in the following...
