The Python Workshop

By: Olivier Pons, Andrew Bird, Dr. Lau Cher Han, Mario Corchero Jiménez, Graham Lee, Corey Wade

Overview of this book

Have you always wanted to learn Python, but never quite known how to start? More applications than we realize are being developed using Python because it is easy to learn, read, and write. You can now start learning the language quickly and effectively with the help of this interactive tutorial. The Python Workshop starts by showing you how to correctly apply Python syntax to write simple programs, and how to use appropriate Python structures to store and retrieve data. You'll see how to handle files, deal with errors, and use classes and methods to write concise, reusable, and efficient code. As you advance, you'll understand how to use the standard library, debug code to troubleshoot problems, and write unit tests to validate application behavior. You'll gain insights into using the pandas and NumPy libraries for analyzing data, and the graphical libraries of Matplotlib and Seaborn to create impactful data visualizations. By focusing on entry-level data science, you'll build your practical Python skills in a way that mirrors real-world development. Finally, you'll discover the key steps in building and using simple machine learning algorithms. By the end of this Python book, you'll have the knowledge, skills and confidence to creatively tackle your own ambitious projects with Python.
Table of Contents (13 chapters)

10. Data Analytics with pandas and NumPy

Activity 24: Data Analysis to Find the Outliers in Pay versus the Salary Report in the UK Statistics Dataset

Solution

  1. Begin with a new Jupyter Notebook.
  2. Copy the UK Statistics dataset file into the folder where you will be performing this activity.
  3. Import the necessary packages: pandas as pd, matplotlib.pyplot as plt, and seaborn as sns:
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline
    import seaborn as sns
    # Set up seaborn dark grid
    sns.set()
  4. Make sure the UKStatistics.csv file is in the same folder as your Jupyter Notebook, then read it into a DataFrame and store it in a variable. In this case, it would be as follows:
    statistics_df = pd.read_csv('UKStatistics.csv')
  5. Now, to preview the dataset, call .head() on the statistics_df variable. By default, this shows the first five rows rather than the entire dataset:
    statistics_df.head()

    The output will be as follows:

    Figure 10.54: Dataset output...
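
As a minimal sketch of the loading and preview steps above, the following block reads a small CSV into a DataFrame and previews it with .head(). It assumes nothing about the real UKStatistics.csv file: the csv_text string, its column names, and its values are invented for illustration, and sample_df is a hypothetical variable name.

```python
import io

import pandas as pd

# Hypothetical stand-in for UKStatistics.csv. The column names and values
# are invented for illustration and are not taken from the real dataset.
csv_text = """Name,Pay,Salary
A,100,1000
B,110,1100
C,120,1200
D,130,1300
E,140,1400
F,150,1500
G,160,1600
"""

# pd.read_csv accepts a file path or a file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))

# .head() returns the first five rows by default; pass n for a different count.
first_rows = sample_df.head()
first_two = sample_df.head(2)

# .shape reports (rows, columns) for the whole DataFrame, not just the preview.
print(sample_df.shape)
print(first_rows)
```

Comparing sample_df.shape with the number of rows in the preview is a quick way to confirm that .head() shows only a slice of the data.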