The Python Workshop - Second Edition

By: Corey Wade, Mario Corchero Jiménez, Andrew Bird, Dr. Lau Cher Han, Graham Lee
4.7 (3)

Overview of this book

Python is among the most popular programming languages in the world. It’s ideal for beginners because it’s easy to read and write, and for developers because it’s widely available, with a strong support community, extensive documentation, and phenomenal libraries, both built-in and user-contributed. This project-based course has been designed by a team of expert authors to get you up and running with Python. You’ll work through engaging projects that’ll enable you to leverage your newfound Python skills efficiently in technical jobs, personal projects, and job interviews. The book will help you gain an edge in data science, web development, and software development, preparing you to tackle real-world challenges in Python and pursue advanced topics on your own. Throughout the chapters, each component has been explicitly designed to engage and stimulate different parts of the brain so that you can retain what you learn and apply it in a practical context with maximum impact. By completing the course from start to finish, you’ll walk away feeling capable of tackling any real-world Python development problem.
Table of Contents (16 chapters)

Null values

You need to do something about the null values. They will break machine learning algorithms (see Chapter 11, Machine Learning) that rely on numerical values as input. There are several popular choices when dealing with null values:

  • Eliminate the rows. This is a respectable approach if null values are a very small percentage – that is, around 1% of the total dataset.
  • Replace the null value with a representative value, such as the median or the mean. This is a great approach if the rows are valuable and the column itself is reasonably balanced.
  • Replace the null value with the most likely value, perhaps a 0 or 1. This is preferable to averages when the median or mean might be unrealistic based on other factors.

Note

Mode is the statistical term for the value that occurs most often in a dataset.
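
The three options above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the book's own code: the `ages` column is made-up data, and `None` is assumed to stand in for a null value.

```python
from statistics import mean, median, mode

# Hypothetical toy column; None stands in for a null value.
ages = [22, 38, None, 35, 26, None, 54, 26]

# The known (non-null) values drive every strategy below.
known = [a for a in ages if a is not None]

# Share of nulls, useful when deciding whether dropping rows is acceptable.
null_fraction = (len(ages) - len(known)) / len(ages)

# Option 1: eliminate the rows that contain a null value.
dropped = known

# Option 2: replace each null with the median (or the mean) of the column.
median_value = median(known)
mean_value = mean(known)
filled_median = [median_value if a is None else a for a in ages]
filled_mean = [mean_value if a is None else a for a in ages]

# Option 3: replace each null with the most frequent value (the mode).
mode_value = mode(known)
filled_mode = [mode_value if a is None else a for a in ages]

print(f"null fraction: {null_fraction:.0%}")
print("median fill:", filled_median)
print("mean fill:  ", filled_mean)
print("mode fill:  ", filled_mode)
```

Here a quarter of the values are null, far above the roughly 1% threshold mentioned earlier, so filling is likely a better fit than dropping rows. With pandas, the corresponding operations would typically be `dropna()` and `fillna()` on a DataFrame or Series.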

As you can see, which option you choose depends on the data. That’s a general theme that rings true for data science: no one method fits all...