Hands-On Data Analysis with NumPy and Pandas

By: Curtis Miller

Overview of this book

Python, a multi-paradigm programming language, has become the language of choice for data scientists for visualization, data analysis, and machine learning. Hands-On Data Analysis with NumPy and Pandas starts by guiding you through setting up the right environment for data analysis with Python and installing the correct Python distribution. You will also work with the Jupyter notebook and set up a database. Once you have covered Jupyter, you will dig deep into Python's NumPy package, a powerful extension with advanced mathematical functions. You will then move on to creating NumPy arrays and employing different array methods and functions. Next, you will explore Python's pandas extension, which will help you get to grips with data mining and learn to subset your data. Finally, you will learn how to manage your datasets by sorting and ranking them. By the end of this book, you will have learned to index and group your data for sophisticated data analysis and manipulation.
Table of Contents (12 chapters)

Subsetting your data


Now that we can make pandas series and DataFrames, let's work with the data they contain. In this section, we will see how to get and manipulate the data we store in a pandas series or DataFrame. Naturally, this is an important topic; these objects would be useless otherwise.

You should not be surprised that there are many ways to subset DataFrames. We will not cover every idiosyncrasy here; refer to the documentation for an exhaustive discussion. Instead, we will discuss the most important functionality that every pandas user should be aware of.

Subsetting a series

Let's first look at series. Because series are similar to DataFrames, the key lessons here apply to DataFrames as well. The simplest way to subset a series is with square brackets, just as we would subset a list or NumPy array. The colon operator works here, but there is more that we can do: we can select elements based on the index of the series, as opposed to just the position of the elements in the...
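
The following is a minimal sketch of the square-bracket subsetting described above, not the book's own example. The series name `srs`, its values, and its index labels are hypothetical and chosen only for illustration. It shows a positional slice with the colon operator alongside selection by index label.

```python
import pandas as pd

# Hypothetical example data; the values and index labels are arbitrary
srs = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# Colon operator with integers: slices by position, like a list or NumPy array
first_three = srs[:3]

# A single index label returns the matching element
by_label = srs["c"]

# A list of labels returns a new series with just those elements
several = srs[["a", "d"]]

# Colon operator with labels: unlike positional slicing,
# the end label is included in the result
label_slice = srs["b":"d"]

print(first_three)
print(by_label)
print(several)
print(label_slice)
```

Note the difference between the two slices: `srs[:3]` stops before position 3, while `srs["b":"d"]` includes the element labeled `"d"`.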