The Data Visualization Workshop

By: Mario Döbler, Tim Großmann


About the Book

Do you want to transform data into captivating images? Do you want to make it easy for your audience to process and understand the patterns, trends, and relationships hidden within your data?

The Data Visualization Workshop will guide you through the world of data visualization and help you to unlock simple secrets for transforming data into meaningful visuals with the help of exciting exercises and activities.

Starting with an introduction to data visualization, this book shows you how to first prepare raw data for visualization using NumPy and pandas operations. As you progress, you'll use plotting techniques, such as comparison and distribution, to identify relationships and similarities between datasets. You'll then work through practical exercises to simplify the process of creating visualizations using Python plotting libraries such as Matplotlib and Seaborn. If you've ever wondered how popular companies like Uber and Airbnb use geoplotlib for geographical visualizations, this book has got you covered, helping you analyze and understand the process effectively. Finally, you'll use the Bokeh library to create dynamic visualizations that can be integrated into any web page.

By the end of this workshop, you'll have learned how to present engaging mission-critical insights by creating impactful visualizations with real-world data. 

Audience

The Data Visualization Workshop is for beginners who want to learn data visualization, as well as developers and data scientists who are looking to enrich their practical data science skills. Prior knowledge of data analytics, data science, and visualization is not mandatory. Knowledge of Python basics and high-school-level math will help you grasp the concepts covered in this data visualization book more quickly and effectively.

About the Chapters

Chapter 1, The Importance of Data Visualization and Data Exploration, will introduce you to the basics of statistical analysis, along with basic operations for calculating the mean, median, and variance of real-world datasets.

Chapter 2, All You Need to Know about Plots, will explain the design practices for certain plots. You will design attractive, tangible visualizations and learn to identify the best plot type for a given dataset and scenario.

Chapter 3, A Deep Dive into Matplotlib, will teach you the fundamentals of Matplotlib and how to create visualizations using the built-in plots that are provided by the library. You will also practice how to customize your visualization plots and write mathematical expressions using TeX.

Chapter 4, Simplifying Visualizations Using Seaborn, will extend your knowledge of Matplotlib by explaining the advantages of Seaborn over Matplotlib and showing you how to design visually appealing and insightful plots efficiently.

Chapter 5, Plotting Geospatial Data, will teach you how to utilize geoplotlib to create stunning geographical visualizations, identify the different types of geospatial charts, and create complex visualizations using tile providers and custom layers.

Chapter 6, Making Things Interactive with Bokeh, will introduce Bokeh, which is used to create insightful, interactive web-based visualizations that can easily be integrated into your web page.

Chapter 7, Combining What We Have Learned, will apply the concepts from all the previous chapters, using three new datasets in combination with practical activities for Matplotlib, Seaborn, geoplotlib, and Bokeh.

Conventions

Code words in text, database table names, folder names, filenames, file extensions, path names, dummy URLs, user input, and Twitter handles are shown as follows:

"Note that by simply passing the axis parameter in the np.mean() call, we can define the dimension our data will be aggregated on. axis=0 aggregates down the rows, giving one value per column, and axis=1 aggregates across the columns, giving one value per row."

Words that you see on the screen (for example, in menus or dialog boxes) appear in the same format.

A block of code is set as follows:

# slicing an intersection of 4 elements (2x2) of the second and third rows and the second and third columns
subsection_2x2 = dataset[1:3, 1:3]
np.mean(subsection_2x2)
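
As a rough illustration of what the slicing snippet and the axis note above do, here is a minimal sketch on a small made-up array. The 4x4 dataset and the names col_means and row_means are assumptions for this example, not data or code from the book.

```python
import numpy as np

# Hypothetical 4x4 dataset holding the values 1..16, used only for illustration
dataset = np.arange(1, 17).reshape(4, 4)

# Indices 1 and 2 on both axes (zero-based) select a 2x2 block
subsection_2x2 = dataset[1:3, 1:3]
print(subsection_2x2.shape)      # (2, 2)
print(np.mean(subsection_2x2))   # 8.5

# axis=0 collapses the rows: one mean per column
col_means = np.mean(dataset, axis=0)

# axis=1 collapses the columns: one mean per row
row_means = np.mean(dataset, axis=1)

print(col_means)
print(row_means)
```

Because slice indices are zero-based and exclude the end value, 1:3 picks the second and third rows or columns, while 0:2 (or simply :2) would pick the first two.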

New terms and important words are shown like this:

"In this book, you will learn how to use Python in combination with various libraries, such as NumPy, pandas, Matplotlib, Seaborn, and geoplotlib, to create impactful data visualizations using real-world data."

Code Presentation

Lines of code that span multiple lines are split using a backslash ( \ ). When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line.

For example:

history = model.fit(X, y, epochs=100, batch_size=5, verbose=1, \
                    validation_split=0.2, shuffle=False)
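
The call above already sits inside parentheses, where Python continues a statement across lines on its own, so the backslash there is a presentation choice rather than a requirement. The following minimal sketch, with variable names invented for this example, shows both forms side by side.

```python
# Explicit continuation: the backslash joins the two physical lines
total_backslash = 1 + 2 + 3 + \
                  4 + 5

# Implicit continuation: inside (), [] or {} no backslash is needed
total_parens = (1 + 2 + 3 +
                4 + 5)

print(total_backslash, total_parens)  # 15 15
```

Outside of brackets, the backslash must be the very last character on the line, so a trailing space after it raises a syntax error.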

Comments are added into code to help explain specific bits of logic. Single-line comments are denoted using the # symbol, as follows:

# Print the sizes of the dataset
print("Number of Examples in the Dataset = ", X.shape[0])
print("Number of Features for each example = ", X.shape[1])

Multi-line comments are enclosed by triple quotes, as shown below:

"""
Define a seed for the random number generator to ensure the 
result will be reproducible
"""
seed = 1
np.random.seed(seed)
random.seed(seed)
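
To see what "reproducible" means in practice, here is a minimal sketch that seeds NumPy and the standard-library random module twice with the same value and compares the draws. Using these two modules is an assumption, since the snippet above does not show its imports, and the variable names are made up for this example.

```python
import random

import numpy as np

seed = 1

# First run: seed both generators, then draw some numbers
np.random.seed(seed)
random.seed(seed)
first_np = np.random.rand(3)
first_py = [random.random() for _ in range(3)]

# Second run: reseed with the same value and draw again
np.random.seed(seed)
random.seed(seed)
second_np = np.random.rand(3)
second_py = [random.random() for _ in range(3)]

print(np.array_equal(first_np, second_np))  # True
print(first_py == second_py)                # True
```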

Setting up Your Environment

Before we explore the book in detail, we need to set up specific software and tools. In the following section, we shall see how to do that.

Installing Python

The following sections will help you to install Python on Windows, Linux, and macOS.

Installing Python on Windows

Installing Python on Windows is done as follows:

  1. Find your desired version of Python on the Anaconda distribution page at https://www.anaconda.com/distribution/#windows.
  2. Ensure you select Python 3.7 from the download page.
  3. Ensure that you install a version relevant to the architecture of your system (either 32-bit or 64-bit). You can find out this information in the System Properties window of your OS.
  4. After you download the installer, simply double-click on the file and follow the on-screen instructions.

Installing Python on Linux

To install Python on Linux, you have a couple of good options:

  1. Open a terminal and verify that Python 3 is not already installed by running python3 --version.
  2. To install Python 3, run this:
    sudo apt-get update
    sudo apt-get install python3.7
  3. Alternatively, you can install Python with the Anaconda Linux distribution by downloading the installer from https://www.anaconda.com/distribution/#linux and following the instructions.

Installing Python on macOS

Similar to Linux, you have a couple of methods for installing Python on a Mac. To install Python on macOS, do the following:

  1. Open the Terminal for Mac by pressing CMD + Spacebar, type terminal in the open search box, and hit Enter.
  2. Install the Xcode Command Line Tools through the command line by running xcode-select --install.
  3. The easiest way to install Python 3 is using Homebrew, which is installed through the command line by running ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)".
  4. Add the Homebrew Python directory to your $PATH environment variable. Open your profile in the command line by running nano ~/.profile and insert export PATH="/usr/local/opt/python/libexec/bin:$PATH" at the bottom.
  5. The final step is to install Python. In the command line, run brew install python.
  6. You can also install Python using the Anaconda installer, available from https://www.anaconda.com/distribution/#macos.
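
Whichever route you took, you can confirm from within Python itself that the interpreter is recent enough. The sketch below is only an illustration: the helper name python_is_recent_enough is made up for this example, and treating anything from 3.7 upward as acceptable is an assumption based on the version named in the Windows steps.

```python
import sys

# The installation steps target Python 3.7; newer 3.x releases are assumed to work too
REQUIRED = (3, 7)


def python_is_recent_enough(version_info=sys.version_info, required=REQUIRED):
    """Return True if the given version is at least the required (major, minor)."""
    return tuple(version_info[:2]) >= required


print("Running Python", sys.version.split()[0])
print("Meets requirement:", python_is_recent_enough())
```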

Installing Libraries

pip comes pre-installed with Anaconda. Once Anaconda is installed on your machine, all the required libraries can be installed using pip, for example, pip install numpy. Alternatively, you can install all the required libraries using pip install -r requirements.txt. You can find the requirements.txt file at https://packt.live/3dgg8Hv.

The exercises and activities will be executed in Jupyter Notebooks. Jupyter is a Python library and can be installed in the same way as the other Python libraries – that is, with pip install jupyter, but fortunately, it comes pre-installed with Anaconda. To open a notebook, simply run the command jupyter notebook in the Anaconda Prompt.
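
After installing, a quick check from Python can tell you whether anything is still missing. This is a minimal sketch: the list contains only the libraries named in this preface (requirements.txt remains the authoritative list), and the helper missing_packages is a hypothetical name introduced for this example.

```python
import importlib.util

# Libraries named in this preface; requirements.txt is the authoritative list
LIBRARIES = ["numpy", "pandas", "matplotlib", "seaborn", "geoplotlib", "bokeh"]


def missing_packages(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(LIBRARIES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```

The check only looks the modules up without importing them, so it runs quickly and does not fail when a library is absent.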

Working with JupyterLab and Jupyter Notebook

You’ll be working on different exercises and activities in JupyterLab. These exercises and activities can be downloaded from the associated GitHub repository.

  1. Download the repository from here: https://github.com/PacktWorkshops/The-Data-Visualization-Workshop.
  2. You can either download it using GitHub Desktop or as a zipped folder by clicking on the green Clone or download button.
  3. You can open a Jupyter Notebook using the Anaconda Navigator by clicking the Launch button under the Jupyter Notebook icon.
  4. You can also open a Jupyter Notebook using the Anaconda Prompt. To do this, open the Anaconda Prompt and run the following command:
    jupyter notebook

    Jupyter Notebook will then be launched in your default browser.

  5. Once you have launched Jupyter Notebook, a list of all files and folders will be presented. You can open the Jupyter Notebook file you wish to work with by simply double-clicking it.

Importing Python Libraries

Every exercise and activity in this book will make use of various libraries. Importing libraries into Python is very simple, as shown in the following steps:

  1. To import libraries such as NumPy and pandas, run the following code. This will import the whole numpy library into your current file:
    import numpy            # import numpy
  2. In the first cells of the exercises and activities of this book, you will see the following code. It lets us use np instead of numpy in our code to call methods from numpy:
    import numpy as np      # import numpy and assign alias np
  3. Partial imports can be done as shown in the following code:
    from numpy import mean   # only import the mean function of numpy

    This only makes the mean function available by name; numpy itself is still loaded in full.
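
The three import styles above can be compared directly. The following minimal sketch, using a made-up list of values, shows that each style reaches the same function.

```python
import numpy
import numpy as np
from numpy import mean

values = [1, 2, 3, 4]

# All three names refer to the same function object
print(numpy.mean(values))  # 2.5
print(np.mean(values))     # 2.5
print(mean(values))        # 2.5

# The alias is the same module, and the partial import is the same function
print(np is numpy, mean is np.mean)  # True True
```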

Accessing the Code Files

You can find the complete code files of this book at https://packt.live/31USkof. You can also run many activities and exercises directly in your web browser by using the interactive lab environment at https://packt.live/37CIQ47.

We've tried to support interactive versions of all activities and exercises, but we recommend a local installation as well for instances where this support isn't available.

If you have any issues or questions about installation, please email us at [email protected].