The Pandas Workshop

By: Blaine Bateman, Saikat Basak, Thomas V. Joseph, William So

Overview of this book

The Pandas Workshop will teach you how to be more productive with data and generate real business insights to inform your decision-making. You will be guided through real-world data science problems and shown how to apply key techniques in the context of realistic examples and exercises. Engaging activities will then challenge you to apply your new skills in a way that prepares you for real data science projects. You’ll see how experienced data scientists tackle a wide range of problems using data analysis with pandas. Unlike other Python books, which focus on theory and spend too long on dry, technical explanations, this workshop is designed to quickly get you to write clean code and build your understanding through hands-on practice. As you work through this Python pandas book, you’ll tackle various real-world scenarios, such as using an air quality dataset to understand the pattern of nitrogen dioxide emissions in a city, as well as analyzing transportation data to improve bus transportation services. By the end of this data analytics book, you’ll have the knowledge, skills, and confidence you need to solve your own challenging data science problems with pandas.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Part 1 – Introduction to pandas

Chapter 1: Introduction to pandas

Introduction to the world of pandas

Exploring the history and evolution of pandas

Components and applications of pandas

Understanding the basic concepts of pandas

Activity 1.01 – comparing sales data for two stores

Summary

Chapter 2: Working with Data Structures

Introduction to data structures

The need for data structures

Indexes and columns

Series

Activity 2.01 – Working with pandas data structures

Summary

Chapter 3: Data I/O

The world of data

Exploring data sources

Fundamental formats

Additional text formats

Manipulating SQL data

Activity 3.01 – using SQL data for pandas analytics

Summary

Chapter 4: Pandas Data Types

Introducing pandas dtypes

Missing data types

Activity 4.01 – optimizing memory usage by converting into the appropriate dtypes

Subsetting by data types

Summary

Part 2 – Working with Data

Chapter 5: Data Selection – DataFrames

Introduction to DataFrames

Data selection in pandas DataFrames

Activity 5.01 – Creating a multi-index from columns

Bracket and dot notation

Changing DataFrame values using bracket or dot notation

Summary

Chapter 6: Data Selection – Series

Introduction to pandas Series

The Series index

Preparing Series from DataFrames and vice versa

Activity 6.01 – Series data selection

Understanding the differences between base Python and pandas data selection

Activity 6.02 – DataFrame data selection

Summary

Chapter 7: Data Exploration and Transformation

Introduction to data transformation

Dealing with messy data

Dealing with missing data

Summarizing data

Activity 7.01 – data analysis using pivot tables

Summary

Chapter 8: Understanding Data Visualization

Introduction to data visualization

Understanding the basics of pandas visualization

Exploring matplotlib

Visualizing data of different types

Activity 8.01 – Using data visualization for exploratory data analysis

Summary

Part 3 – Data Modeling

Chapter 9: Data Modeling – Preprocessing

An introduction to data modeling

Exploring dependent and independent variables

Understanding data scaling and normalization

Activity 9.01 – Data splitting, scaling, and modeling

Summary

Chapter 10: Data Modeling – Modeling Basics

Introduction to data modeling

Learning the modeling basics

Predicting future values of time series

Activity 10.01 – Normalizing and smoothing data

Summary

Chapter 11: Data Modeling – Regression Modeling

An introduction to regression modeling

Exploring regression modeling

Model diagnostics

Activity 11.01 – Multiple regression with non-linear models

Summary

Part 4 – Additional Use Cases for pandas

Chapter 12: Using Time in pandas

Introduction to time series

What are datetimes?

Activity 12.01 – understanding power usage

Datetime math operations

Summary

Chapter 13: Exploring Time Series

The time series as an index

Resampling, grouping, and aggregation by time

Activity 13.01 – Creating a time series model

Summary

Chapter 14: Applying pandas Data Processing for Case Studies

Introduction to the case studies and datasets

Recap of the preprocessing steps

Activity 14.01 – analyzing air quality data

Summary

Chapter 15: Appendix

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Activity 7.01 – data analysis using pivot tables

In this activity, you will build pivot tables to analyze the Student Performance dataset, which is available in the GitHub repository.

Note

More details about the Student Performance dataset can be found at https://archive.ics.uci.edu/ml/datasets/Student+Performance.

Your tasks are as follows:

1. Open a Jupyter notebook.
2. Import the pandas package.
3. Load the CSV file as a DataFrame, using the ; delimiter to separate the columns.
4. Modify the DataFrame so that it contains only these columns: school, sex, age, address, health, absences, G1, G2, and G3.
5. Display the first 10 rows of the DataFrame.
6. Build a pivot table indexed on school.
7. Build a pivot table indexed on school and age.
8. Build a pivot table indexed on school, sex, and age that aggregates the absences column with both the mean and the sum.
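The loading, column-selection, and pivot-table steps above can be sketched in pandas as follows. This is a minimal illustration, not the book's solution. The file name student-mat.csv is an assumption (the UCI archive ships files with names like this, but the exact path in the book's GitHub repository is not given here), and the five sample rows are made up so that the example runs on its own. The helper names load_students and build_pivots are hypothetical. The numeric columns are passed explicitly as values because recent pandas versions may raise a TypeError when asked to average text columns such as sex or address.

```python
import io

import pandas as pd

# Columns the activity keeps; "health" is the column name used in the UCI data.
KEEP = ["school", "sex", "age", "address", "health", "absences", "G1", "G2", "G3"]
# Numeric columns only: averaging text columns (sex, address) may raise a TypeError.
NUMERIC = ["age", "health", "absences", "G1", "G2", "G3"]


def load_students(source):
    """Hypothetical helper: read a semicolon-delimited file (path or buffer), keep KEEP columns."""
    df = pd.read_csv(source, sep=";")
    return df[KEEP]


def build_pivots(df):
    """Hypothetical helper: return the three pivot tables the activity asks for."""
    # Indexed on school; pivot_table aggregates with the mean by default.
    by_school = df.pivot_table(index="school", values=NUMERIC)
    # Indexed on school and age; age moves into the index, so leave it out of values.
    by_school_age = df.pivot_table(
        index=["school", "age"],
        values=[col for col in NUMERIC if col != "age"],
    )
    # Indexed on school, sex, and age, with the mean and sum of absences.
    # A list of aggfuncs yields MultiIndex columns such as ("sum", "absences").
    absences = df.pivot_table(
        index=["school", "sex", "age"],
        values="absences",
        aggfunc=["mean", "sum"],
    )
    return by_school, by_school_age, absences


# Made-up rows in the quoted, semicolon-delimited layout of the UCI files,
# including one extra column (famsize) that load_students drops.
SAMPLE = io.StringIO(
    '"school";"sex";"age";"address";"famsize";"health";"absences";"G1";"G2";"G3"\n'
    '"GP";"F";18;"U";"GT3";3;6;5;6;6\n'
    '"GP";"F";17;"U";"GT3";3;4;5;5;6\n'
    '"GP";"M";17;"U";"LE3";5;2;15;14;15\n'
    '"MS";"F";18;"R";"GT3";1;0;11;12;12\n'
    '"MS";"M";18;"U";"LE3";4;8;10;9;10\n'
)

# With the real data, pass the downloaded file instead, e.g. load_students("student-mat.csv").
students = load_students(SAMPLE)
print(students.head(10))

by_school, by_school_age, absences = build_pivots(students)
print(by_school)
print(by_school_age)
print(absences)
```

Passing a list to aggfunc is what produces the side-by-side mean and sum columns in the last table; the first two tables rely on pivot_table's default mean aggregation.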

The expected output is as follows:

...
