Become a Python Data Analyst

By: Alvaro Fuentes

Overview of this book

Python is one of the most popular languages among data analysts and statisticians for working with large datasets and complex data visualizations. Become a Python Data Analyst introduces the essential Python tools and libraries for the data analysis process, from preparing data to performing simple statistical analyses and creating meaningful data visualizations. In this book, we will cover Python libraries such as NumPy, pandas, matplotlib, seaborn, SciPy, and scikit-learn, and apply them in practical data analysis and statistics examples. As you make your way through the chapters, you will learn to use the Jupyter Notebook efficiently to manipulate data with NumPy and pandas. In the concluding chapters, you will gain experience in building simple predictive models and carrying out statistical computation and analysis using rich Python tools and proven data analysis techniques. By the end of this book, you will have hands-on experience performing data analysis with Python.
Table of Contents (8 chapters)

Introduction to the pandas library

Pandas is a Python library that provides fast, flexible, and expressive data structures designed to work with relational or tabular data, such as an SQL table or an Excel spreadsheet. It is a fundamental, high-level building block for practical, real-world data analysis with Python. We use the following line of code to import the pandas library:

# The conventional import alias for pandas
import pandas as pd
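
To make the comparison with an SQL table or a spreadsheet concrete, here is a minimal sketch that builds a small table and filters it. The column names and values are hypothetical, invented purely for illustration; they do not come from the book.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
table = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo"],
        "department": ["sales", "ops", "sales"],
        "salary": [52000, 48000, 61000],
    }
)

# Roughly equivalent to:
#   SELECT name, salary FROM table WHERE department = 'sales'
sales = table.loc[table["department"] == "sales", ["name", "salary"]]
print(sales)
```

The boolean mask picks the rows and the list of labels picks the columns, so one `.loc` call covers what would be a `WHERE` clause plus a column list in SQL.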

Pandas is well-suited to the following cases:

  • When you have tabular data with heterogeneously typed columns, such as the data that you can find in an SQL table or in an Excel spreadsheet
  • When you have ordered or unordered time series data
  • When you have data that is in rows and columns, similar to a matrix
  • When you work with observational or other kinds of statistical datasets
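
The first three cases above can be sketched in a few lines. This is only an illustrative example under assumed data: every name and value below is made up, and the snippet is not taken from the book.

```python
import numpy as np
import pandas as pd

# Case 1: tabular data whose columns have different types
# (hypothetical values, for illustration only)
records = pd.DataFrame(
    {
        "item": ["a", "b"],
        "price": [1.5, 2.0],
        "in_stock": [True, False],
    }
)
print(records.dtypes)  # each column keeps its own dtype

# Case 2: an unordered time series, put in chronological order by its index
readings = pd.Series(
    [3.0, 1.0, 2.0],
    index=pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"]),
)
ordered = readings.sort_index()

# Case 3: matrix-like data in rows and columns, built from a NumPy array
matrix = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["x", "y", "z"])
print(matrix.shape)
```

In each case the labels (column names or a datetime index) travel with the data, which is the main thing pandas adds on top of a plain NumPy array.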

There are two primary data structures in pandas:

  • The series...