Hands-On Data Analysis with Pandas

By: Stefanie Molin

Overview of this book

Data analysis has become a necessary skill in a variety of domains where knowing how to work with data and extract insights can generate significant value. Hands-On Data Analysis with Pandas will show you how to analyze your data, get started with machine learning, and work effectively with Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn. Using real-world datasets, you will learn how to use the powerful pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will be able to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding chapters, you will explore some applications of anomaly detection, regression, clustering, and classification using scikit-learn to make predictions based on past data. By the end of this book, you will be equipped with the skills you need to use pandas to ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets.
Table of Contents (21 chapters)

Section 1: Getting Started with Pandas
Section 2: Using Pandas for Data Analysis
Section 3: Applications - Real-World Analyses Using Pandas
Section 4: Introduction to Machine Learning with Scikit-Learn
Section 5: Additional Resources
Solutions

Adding and removing data

Often, we want to add or remove rows and columns from our data. In the previous sections, we frequently selected a subset of the columns, but if some columns or rows aren't useful to us, we should simply remove them. We also frequently selected data based on the value of the magnitude; had we stored those Boolean values in a new column for later selection, we would have needed to calculate the mask only once. Very rarely will we get data to which we want to neither add nor remove anything.
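
As a rough illustration of both ideas, the sketch below stores a Boolean mask as a column so that it is computed only once, and drops a column that is not needed. The data, the column names (`mag`, `place`, `unused`, `mag_at_least_6`), and the threshold of 6.0 are assumptions made up for this example and are not taken from the book's dataset.

```python
import pandas as pd

# Hypothetical earthquake-like data; column names and values are assumptions.
df = pd.DataFrame({
    "mag": [1.2, 6.8, 7.5, 3.3],
    "place": ["A", "B", "C", "D"],
    "unused": [0, 0, 0, 0],
})

# Compute the mask once and keep it as a column for later selection.
df["mag_at_least_6"] = df["mag"] >= 6.0

# Remove a column we do not need; drop() returns a new DataFrame by default.
trimmed = df.drop(columns=["unused"])

# Reuse the stored Boolean column instead of recomputing the comparison.
strong = trimmed[trimmed["mag_at_least_6"]]
print(strong)
```

Here `drop()` leaves `df` untouched and returns `trimmed`, while the column assignment modifies `df` itself.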

Before we get started, it's important to understand that while most methods return a new DataFrame object, some operate in-place and change our data. If we write a function that takes a dataframe and modifies it, our original dataframe will change as well. Should we find ourselves in a situation where we don't want to change the original...
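
The in-place behavior described above can be seen in a small sketch. The function names and the `flag` column are hypothetical, and working on a copy made with `DataFrame.copy()` is shown only as one common way to leave the caller's data untouched, not necessarily the approach the text goes on to describe.

```python
import pandas as pd

def add_flag_in_place(frame):
    """Hypothetical helper: column assignment changes the object passed in."""
    frame["flag"] = True
    return frame

def add_flag_to_copy(frame):
    """Hypothetical helper: work on a copy so the caller's data is unchanged."""
    result = frame.copy()
    result["flag"] = True
    return result

original = pd.DataFrame({"x": [1, 2, 3]})

# Working on a copy leaves the original dataframe as it was.
safe = add_flag_to_copy(original)
copy_left_original_untouched = "flag" not in original.columns

# Modifying the argument directly also changes the caller's dataframe.
add_flag_in_place(original)
in_place_changed_original = "flag" in original.columns

print(copy_left_original_untouched, in_place_changed_original)
```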