Hands-On Data Analysis with NumPy and Pandas

By: Curtis Miller

Overview of this book

Python, a multi-paradigm programming language, has become the language of choice for data scientists for visualization, data analysis, and machine learning. Hands-On Data Analysis with NumPy and Pandas starts by guiding you through setting up the right environment for data analysis with Python and installing the correct Python distribution. You will also work with the Jupyter notebook and set up a database. Once you have covered Jupyter, you will dig deep into Python's NumPy package, a powerful extension with advanced mathematical functions. You will then move on to creating NumPy arrays and employing different array methods and functions. Next, you will explore Python's pandas package, which will help you get to grips with data mining and learn to subset your data. Last but not least, you will learn how to manage your datasets by sorting and ranking them. By the end of this book, you will have learned to index and group your data for sophisticated data analysis and manipulation.
Table of Contents (12 chapters)

Indexing methods


pandas provides methods that let us state explicitly how we want to index. They distinguish between indexing based on the values of the series' index and indexing based on the position of objects in the series, as would be the case if we were working with a list. The two methods we'll focus on are loc and iloc. loc selects based on the index labels of the series; if we try to select keys that don't exist in the index, we will get an error. iloc indexes as if we were working with a Python list; that is, it indexes based on integer position. So, if we try to index with a non-integer in iloc, or try to select an element outside the range of valid positions, an error will be raised. There is also a hybrid method, ix, that acts like loc but, if passed input that cannot be interpreted with respect to the index, acts like iloc. Because of the ambiguity about how ix will behave, I recommend sticking with loc or iloc; note also that ix was deprecated in pandas 0.20 and removed in pandas 1.0.
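The difference between label-based and position-based indexing can be sketched as follows. This is a minimal illustration rather than an example from the book: the series `s`, its values, and the helper `error_name` are hypothetical, and the integer labels are deliberately chosen to run opposite to the positions so that loc and iloc return different elements for the same input.

```python
import pandas as pd

# Hypothetical series: integer labels 3, 2, 1, 0 sit at positions 0, 1, 2, 3.
s = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0])

by_label = s.loc[0]      # label 0 belongs to the last element
by_position = s.iloc[0]  # position 0 is the first element
print(by_label, by_position)  # 40 10


def error_name(action):
    """Return the name of the exception raised by action(), or None."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None


# loc with a label that is not in the index
missing_label = error_name(lambda: s.loc[7])
# iloc with a position outside the valid range
out_of_range = error_name(lambda: s.iloc[10])
# iloc with a non-integer key
non_integer = error_name(lambda: s.iloc["a"])
print(missing_label, out_of_range, non_integer)  # KeyError IndexError TypeError
```

With an integer index like this one, plain `s[0]` is exactly the kind of ambiguous lookup that loc and iloc are meant to avoid, which is why the sketch always names one of the two explicitly.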

Let's return to...