Book Image

The Python Workshop

By : Olivier Pons, Andrew Bird, Dr. Lau Cher Han, Mario Corchero Jiménez, Graham Lee, Corey Wade
Book Image

The Python Workshop

By: Olivier Pons, Andrew Bird, Dr. Lau Cher Han, Mario Corchero Jiménez, Graham Lee, Corey Wade

Overview of this book

Have you always wanted to learn Python, but never quite known how to start? More applications than we realize are being developed using Python because it is easy to learn, read, and write. You can now start learning the language quickly and effectively with the help of this interactive tutorial. The Python Workshop starts by showing you how to correctly apply Python syntax to write simple programs, and how to use appropriate Python structures to store and retrieve data. You'll see how to handle files, deal with errors, and use classes and methods to write concise, reusable, and efficient code. As you advance, you'll understand how to use the standard library, debug code to troubleshoot problems, and write unit tests to validate application behavior. You'll gain insights into using the pandas and NumPy libraries for analyzing data, and the graphical libraries of Matplotlib and Seaborn to create impactful data visualizations. By focusing on entry-level data science, you'll build your practical Python skills in a way that mirrors real-world development. Finally, you'll discover the key steps in building and using simple machine learning algorithms. By the end of this Python book, you'll have the knowledge, skills and confidence to creatively tackle your own ambitious projects with Python.
Table of Contents (13 chapters)

Null Values

You need to do something about the null values. There are several popular choices when dealing with null values:

  1. Eliminate the rows: A great approach if null values are a very small percentage, such as 1% of the total dataset.
  2. Replace with a significant value, such as the median or the mean: A great approach if the rows are valuable, and the column is reasonably balanced.
  3. Replace with the most likely value, perhaps a 0 or 1: It's preferable to option 2 when the median might be useless. The median can often work here.

    Note

    mode is the official term for the value that occurs the greatest number of times.

As you can see, which option you choose depends on the data.

Exercise 140: Null Value Operations on the Dataset

In this exercise, you will perform a null value operation. You can only select the columns that have null values in our dataset:

  1. Open a new Jupyter Notebook and copy the dataset file within a separate folder where you will...