Hands-On Data Analysis with NumPy and Pandas

By: Curtis Miller

Overview of this book

Python, a multi-paradigm programming language, has become the language of choice for data scientists for visualization, data analysis, and machine learning. Hands-On Data Analysis with NumPy and Pandas starts by guiding you through setting up the right environment for data analysis with Python and installing the correct Python distribution. You will also work with the Jupyter notebook and set up a database. Once you have covered Jupyter, you will dig deep into Python’s NumPy package, a powerful extension with advanced mathematical functions. You will then move on to creating NumPy arrays and employing different array methods and functions. You will explore Python’s pandas extension, which will help you get to grips with data mining and learn to subset your data. Last but not least, you will learn how to manage your datasets by sorting and ranking them. By the end of this book, you will have learned to index and group your data for sophisticated data analysis and manipulation.
Table of Contents (12 chapters)

Advanced indexing


Let's now discuss more advanced indexing techniques. We can index ndarray objects using other ndarray objects. We can slice an ndarray using either ndarray objects containing integers that correspond to the indices of the elements we wish to select, or ndarray objects of Boolean values, where the value True means a cell should be included in the slice.

Suppose we select the elements of arr2 that are not Wayne. Wayne is not included in the resulting selection. The array generated to do that indexing is a Boolean array that is True everywhere except where the contents were Wayne.
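The Boolean selection described above can be sketched as follows. This is a minimal illustration, not the book's own listing: the contents of arr2 are not shown in this section, so the names stored in it here are made up, and the comparison arr2 != "Wayne" is an assumption about how the Boolean array was produced.

```python
import numpy as np

# Hypothetical stand-in for arr2; the original array's contents are not shown.
arr2 = np.array([["Wayne", "Kent"],
                 ["Prince", "Wayne"]])

# Elementwise comparison yields a Boolean ndarray with the same shape as arr2:
# True wherever the element is not "Wayne", False where it is.
mask = arr2 != "Wayne"

# Indexing with the Boolean array keeps only the cells marked True.
# The result is a flat (1-D) array of the selected elements.
selected = arr2[mask]

print(mask)
print(selected)
```

Note that Boolean indexing returns a one-dimensional array of the kept elements rather than preserving the original shape.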

Another more advanced technique is to select using arrays of integers that identify which elements we want. Here, we create two arrays that will be used for this indexing.

The first 0 in the first array means the first coordinate is 0, and the first 0 in the second array means the second coordinate is 0, as specified by the order in which these two arrays are listed. So, in the first row and...
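The way two integer arrays pair up into coordinates can be sketched as follows. The array being indexed (grid) and the index values other than the leading zeros are assumptions chosen for illustration; they are not the arrays used in the book.

```python
import numpy as np

# Hypothetical 3x4 array to index into (not the book's array).
grid = np.arange(12).reshape(3, 4)
# grid is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Two integer index arrays, listed in coordinate order:
# the first gives row indices, the second gives column indices.
rows = np.array([0, 2, 1])
cols = np.array([0, 3, 1])

# Elements are paired position by position, so this selects
# grid[0, 0], grid[2, 3], and grid[1, 1].
picked = grid[rows, cols]

print(picked)
```

The leading 0 in rows and the leading 0 in cols together pick the element in the first row and first column; each later pair of positions picks one more element.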