The Data Visualization Workshop

By: Mario Döbler, Tim Großmann

Overview of this book

Do you want to transform data into captivating images? Do you want to make it easy for your audience to process and understand the patterns, trends, and relationships hidden within your data? The Data Visualization Workshop will guide you through the world of data visualization and help you to unlock simple secrets for transforming data into meaningful visuals with the help of exciting exercises and activities. Starting with an introduction to data visualization, this book shows you how to first prepare raw data for visualization using NumPy and pandas operations. As you progress, you’ll use plotting techniques, such as comparison and distribution, to identify relationships and similarities between datasets. You’ll then work through practical exercises to simplify the process of creating visualizations using Python plotting libraries such as Matplotlib and Seaborn. If you’ve ever wondered how popular companies like Uber and Airbnb use geoplotlib for geographical visualizations, this book has got you covered, helping you analyze and understand the process effectively. Finally, you’ll use the Bokeh library to create dynamic visualizations that can be integrated into any web page. By the end of this workshop, you’ll have learned how to present engaging mission-critical insights by creating impactful visualizations with real-world data.
7. Combining What We Have Learned

Activity 7.01: Implementing Matplotlib and Seaborn on the New York City Database


  1. Create an Activity7.01.ipynb Jupyter Notebook in the Chapter07/Activity7.01 folder to implement this activity. Import all the necessary libraries:
    # Import statements
    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib
    import matplotlib.pyplot as plt
    import squarify
  2. Use pandas to read both CSV files located in the Datasets folder:
    p_ny = pd.read_csv('../../Datasets/acs2017/pny.csv')
    h_ny = pd.read_csv('../../Datasets/acs2017/hny.csv')
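
The two `read_csv` calls above load every column of each file. If memory becomes a concern, one option is to load only the columns that are needed through the `usecols` parameter of `pd.read_csv`. The following is a minimal sketch under stated assumptions: the in-memory CSV is made-up stand-in data, and the column names and the `load_columns` helper are hypothetical, not taken from the activity's datasets.

```python
import io

import pandas as pd

# Made-up stand-in for a CSV file; the column names here are hypothetical
# and are not guaranteed to match the activity's datasets.
toy_csv = io.StringIO(
    "SERIALNO,PUMA,NP,EXTRA\n"
    "1,3701,2,a\n"
    "2,3805,1,b\n"
    "3,3902,4,c\n"
)


def load_columns(source, columns):
    """Read only the listed columns from a CSV path or file-like object."""
    return pd.read_csv(source, usecols=columns)


# The EXTRA column is skipped while parsing instead of being dropped afterwards.
toy = load_columns(toy_csv, ["SERIALNO", "PUMA", "NP"])
print(toy.shape)
print(list(toy.columns))
```

The same helper accepts a file path in place of the `StringIO` object, so it could be pointed at a real CSV once the required column names are known.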
  3. Use the given PUMA ranges to further divide the dataset into NYC districts (Bronx, Manhattan, Staten Island, Brooklyn, and Queens). A PUMA (Public Use Microdata Area) code, based on the 2010 census definition, identifies an area with a population of 100,000 or more:
    # PUMA ranges
    bronx = [3701, 3710]
    manhatten = [3801, 3810]
    staten_island = [3901, 3903]
    brooklyn =...