Hands-On Data Analysis with NumPy and Pandas

By: Curtis Miller

Overview of this book

Python, a multi-paradigm programming language, has become the language of choice for data scientists for visualization, data analysis, and machine learning. Hands-On Data Analysis with NumPy and Pandas starts by guiding you through setting up the right environment for data analysis with Python and installing the correct Python distribution. You will also work with the Jupyter notebook and set up a database. Once you have covered Jupyter, you will dig deep into Python’s NumPy package, a powerful extension with advanced mathematical functions. You will then move on to creating NumPy arrays and employing different array methods and functions. Next, you will explore Python’s pandas extension, which will help you get to grips with data mining and learn to subset your data. Finally, you will learn how to manage your datasets by sorting and ranking them. By the end of this book, you will have learned to index and group your data for sophisticated data analysis and manipulation.
Table of Contents (12 chapters)

NumPy arrays


Let's now talk about NumPy arrays, which are objects of type ndarray. These are not the arrays you may encounter in C or C++. A better analog is matrices in MATLAB or R; that is, they behave like mathematical objects such as vectors, matrices, or tensors. While they can store non-mathematical information such as strings, they exist mainly to manage and facilitate operations on numeric data. An ndarray is assigned a particular data type, or dtype, upon creation, and all current and future data in the array must be of that dtype. An ndarray can also have more than one dimension; its dimensions are referred to as axes.
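The fixed-dtype behavior described above can be illustrated with a minimal sketch. The variable names and sample values here are illustrative assumptions, not taken from the book:

```python
import numpy as np

# Create an array with an explicit integer dtype.
a = np.array([1, 2, 3], dtype=np.int64)
print(a.dtype)

# Values assigned later are cast to the array's dtype:
# the float 7.9 is truncated to the integer 7.
a[0] = 7.9
print(a)

# To hold a different type, create a new array with astype;
# the original array keeps its dtype.
b = a.astype(np.float64)
print(b.dtype)

# Strings are allowed too; NumPy picks a fixed-width Unicode dtype.
s = np.array(["x", "yz"])
print(s.dtype.kind)  # "U" denotes a Unicode string dtype
```

Note that the assignment does not change the array's dtype; converting to another dtype always produces a new array.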

A one-dimensional ndarray is a line of data; this would be a vector. A two-dimensional ndarray would be a rectangular grid of data, effectively a matrix. A three-dimensional ndarray would be a cube of data, like a tensor. Any number of dimensions is permitted, but most ndarray objects are one- or two-dimensional.
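As a rough sketch of these three cases, the number of axes and the size along each axis can be inspected through the ndim and shape attributes. The array names and contents below are made up for illustration:

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])          # one axis: a line of data
matrix = np.array([[1, 2, 3], [4, 5, 6]])   # two axes: rows and columns
tensor = np.zeros((2, 3, 4))                # three axes: a block of data

# ndim is the number of axes; shape gives the length along each axis.
for name, arr in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    print(name, arr.ndim, arr.shape)
```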

dtypes are similar to types in the basic Python language, but NumPy dtypes resemble...