Hands-On Data Analysis with NumPy and Pandas

By: Curtis Miller

Overview of this book

Python, a multi-paradigm programming language, has become the language of choice for data scientists for visualization, data analysis, and machine learning. Hands-On Data Analysis with NumPy and Pandas starts by guiding you through setting up the right environment for data analysis with Python and installing the correct Python distribution. You will also work with the Jupyter notebook and set up a database. Once you have covered Jupyter, you will dig deep into Python's NumPy package, a powerful extension with advanced mathematical functions. You will then move on to creating NumPy arrays and employing different array methods and functions. Next, you will explore Python's pandas extension, which will help you get to grips with data mining and learn to subset your data. Last but not least, you will grasp how to manage your datasets by sorting and ranking them. By the end of this book, you will have learned to index and group your data for sophisticated data analysis and manipulation.
Table of Contents (12 chapters)

What does pandas do?


pandas introduces two key objects to Python, series and DataFrames. The latter is arguably the more useful of the two, but a pandas DataFrame can be thought of as several series bound together. A series is a sequence of data, like a list in basic Python or a 1D NumPy array. Like a NumPy array, a series has a single data type, but indexing with a series is different. A NumPy array offers little control over row and column indices, since elements are addressed by integer position. In a series, each element has an index label (a name or key, however you want to think about it). Labels are normally chosen to be unique, although pandas does not strictly enforce this. The index could consist of strings, such as the cities in a nation, with the corresponding elements of the series denoting some statistical value, such as each city's population; or of dates, such as the trading days for a stock series.
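To make the idea of a labelled series concrete, here is a minimal sketch. The city names and population figures are invented purely for illustration and do not come from the book; the variable name `population` is likewise an assumption.

```python
import pandas as pd

# Hypothetical cities and population figures, made up for this example
population = pd.Series(
    [120_000, 85_000, 40_000],
    index=["Springfield", "Riverton", "Lakeside"],
    name="population",
)

# Look up an element by its index label rather than by position
print(population["Riverton"])

# Positional access is still available through .iloc
print(population.iloc[0])

# Like a NumPy array, the series has a single data type
print(population.dtype)
```

The same pattern should work with dates as labels, for instance by passing a `pd.date_range(...)` result as the `index` argument.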

A DataFrame can be thought of as multiple series of common length, with a common index, bound together in a single tabular object. This object resembles a NumPy 2D ndarray, but it is not the same thing. Not all...
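The picture of a DataFrame as several series sharing one index can be sketched as follows. This is only an illustration under stated assumptions: the city names, the figures, and the column names `population` and `area_km2` are all hypothetical.

```python
import pandas as pd

# A common index shared by both series (hypothetical city names)
index = ["Springfield", "Riverton", "Lakeside"]

# Two series of equal length; all values are invented for illustration
population = pd.Series([120_000, 85_000, 40_000], index=index)
area_km2 = pd.Series([310.5, 150.0, 95.2], index=index)

# Bind the series together into one tabular object, one column per series
cities = pd.DataFrame({"population": population, "area_km2": area_km2})

# Select a single value by row label and column name
print(cities.loc["Riverton", "population"])

# Selecting one column gives back a series with the shared index
print(cities["area_km2"])

# Each column keeps its own data type
print(cities.dtypes)
```

Building the DataFrame from a dictionary of series is one of several possible constructions; it is used here because it mirrors the "series bound together" description most directly.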