Hands-On Data Preprocessing in Python

By: Roy Jafari

Overview of this book

Hands-On Data Preprocessing in Python is a primer on data cleaning and preprocessing techniques, written by an expert who has developed college-level courses on data preprocessing and related subjects. The book approaches preprocessing from multiple perspectives so that you can draw the best possible insights from your data. You'll learn about the technical and analytical aspects of data preprocessing (data collection, data cleaning, data integration, data reduction, and data transformation) and get to grips with implementing them in the open source Python programming environment. The hands-on examples and easy-to-follow chapters will help you gain a comprehensive understanding of data preprocessing, its whys and hows, and identify opportunities where data analytics could lead to more effective decision making. As you progress through the chapters, you'll also understand the role of data management systems and technologies in effective analytics and learn how to use APIs to pull data. By the end of this book, you'll be able to use Python to read, manipulate, and analyze data; perform data cleaning, integration, reduction, and transformation; and handle outliers and missing values to prepare data for analytic tools.
Table of Contents (24 chapters)

Part 1: Technical Needs
Part 2: Analytic Goals
Part 3: The Preprocessing
Part 4: Case Studies

To get the most out of this book

The book assumes basic programming skills, such as working with variables, conditionals, and loops, along with beginner-level knowledge of Python. With that background, you can start learning from the beginning of the book.

The Jupyter Notebook is an excellent UI for learning and practicing programming and data analytics. It can be downloaded and installed easily using Anaconda Navigator. For installation instructions, visit https://docs.anaconda.com/anaconda/navigator/install/.

While Anaconda has most of the modules that the book uses already installed, you will need to install a few other modules such as Seaborn and Graphviz. Don't worry; when the time comes, the book will instruct you on how to go about these installations.
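If you would like to see ahead of time which of these modules are missing from your environment, a short check such as the following may help. It is only a sketch and not the book's own installation procedure: the helper name `missing_modules` is hypothetical, and the import names `seaborn` and `graphviz` are assumed to be the ones your environment uses for these two packages.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the two packages mentioned above.
required = ["seaborn", "graphviz"]

missing = missing_modules(required)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All listed modules are available.")
```

Running this in a Jupyter Notebook cell reports which of the listed modules still need installing; it does not install anything itself.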

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

While learning, keep a file of your own code from each chapter. This learning repository can be used in the future for deeper learning and real projects. The Jupyter Notebook is especially great for this purpose as it allows you to take notes along with the code.