Mastering pandas - Second Edition

By: Ashish Kumar

Overview of this book

pandas is a popular Python library used by data scientists and analysts worldwide to manipulate and analyze data. This book presents useful data manipulation techniques in pandas for performing complex data analysis in various domains. An update to our highly successful previous edition, with new features, examples, updated code, and more, this book is an in-depth guide to getting the most out of pandas for data analysis. Designed for intermediate users and seasoned practitioners alike, it teaches advanced data manipulation techniques, such as multi-indexing, modifying data structures, and sampling your data, which enable powerful analysis and help you gain accurate insights. Using an example-based approach, you will apply pandas to different domains, such as Bayesian statistics, predictive analytics, and time series analysis. You will also learn how to prepare powerful, interactive business reports in pandas using the Jupyter Notebook. By the end of this book, you will know how to perform efficient data analysis on complex data using pandas, becoming an expert data analyst or data scientist in the process.

Table of Contents (21 chapters)
Section 1: Overview of Data Analysis and pandas
Section 2: Data Structures and I/O in pandas
Section 3: Mastering Different Data Operations in pandas
Section 4: Going a Step Beyond with pandas

What this book covers

Chapter 1, Introduction to pandas and Data Analysis, will introduce pandas and explain where it fits in the data analysis pipeline. We will also look into some of the popular applications of pandas and how Python and pandas can be used for data analysis.

Chapter 2, Installation of pandas and Supporting Software, will deal with the installation of Python (if necessary), the pandas library, and all necessary dependencies for the Windows, macOS, and Linux platforms. We will also look into command-line tricks, as well as the options and settings available in pandas.

Chapter 3, Using NumPy and Data Structures with pandas, will give a quick tour of the power of NumPy and provide a glimpse of how it makes life easier when working with pandas. We will also implement a neural network with NumPy and explore some practical applications of multidimensional arrays.

Chapter 4, I/O of Different Data Formats with pandas, will teach you how to read and write commonplace formats, such as comma-separated values (CSV), with all the options, as well as less common formats and sources, such as JSON, XML, and data read from URLs. We will also create files in those formats from data objects and create niche plots from within pandas.

Chapter 5, Indexing and Selecting in pandas, will show you how to access and select data from pandas data structures. We will look in detail at basic indexing, label indexing, integer indexing, mixed indexing, and operations on indexes.

Chapter 6, Grouping, Merging, and Reshaping Data in pandas, will examine the various functions that let us rearrange data, applying them to real-world datasets. Along the way, we will learn about grouping, merging, and reshaping data.

Chapter 7, Special Data Operations in pandas, will discuss and elaborate on the methods, syntax, and usage of some of the special data operations in pandas.

Chapter 8, Time Series and Plotting Using Matplotlib, will look at how to handle time series and dates. We will also tour several topics you need to know to develop your expertise in using pandas.

Chapter 9, Making Powerful Reports Using pandas in Jupyter, will look into applying the range of styling and formatting options that pandas offers. We will also learn how to create dashboards and reports in the Jupyter Notebook.

Chapter 10, A Tour of Statistics with pandas and NumPy, will delve into how pandas, together with NumPy and other packages, can be used to perform statistical calculations.

Chapter 11, A Brief Tour of Bayesian Statistics and Maximum Likelihood Estimates, will examine an alternative approach to statistics: the Bayesian approach. We will also look into the key statistical distributions and see how we can use various statistical packages to generate them and plot them in Matplotlib.

Chapter 12, Data Case Studies Using pandas, will work through real-life data case studies using pandas. We will also look into web scraping with Python and data validation.

Chapter 13, The pandas Library Architecture, will discuss the architecture and code structure of the pandas library. This chapter will also briefly demonstrate how you can improve performance using Python extensions.

Chapter 14, pandas Compared with Other Tools, will compare pandas with R and with other tools, such as SQL and SAS. We will also look into slicing and selection.

Chapter 15, Brief Tour of Machine Learning, will conclude the book with a brief introduction to the scikit-learn library for machine learning and show how pandas fits within that framework.