Azure Synapse Analytics Cookbook

By: Gaurav Agarwal, Meenakshi Muralidharan

Overview of this book

As data warehouse management becomes increasingly integral to successful organizations, choosing and running the right solution is more important than ever. Microsoft Azure Synapse is an enterprise-grade, cloud-based data warehousing platform, and this book holds the key to using Synapse to its full potential. If you want the skills and confidence to create a robust enterprise analytical platform, this cookbook is a great place to start. You'll learn and execute enterprise-level deployments on medium-to-large data platforms. Using the step-by-step recipes and accompanying theory covered in this book, you'll understand how to integrate various services with Synapse to make it a robust solution for all your data needs. Whether you're new to Azure Synapse or just getting started, you'll find the instructions you need to solve any problem you may face, including using Azure services for data visualization as well as for artificial intelligence (AI) and machine learning (ML) solutions. By the end of this Azure book, you'll have the skills you need to implement an enterprise-grade analytical platform, enabling your organization to explore and manage heterogeneous data workloads and employ various data integration services to solve real-time industry problems.
Table of Contents (11 chapters)

Visualizing data in a Synapse notebook

Let's now look at an aspect of data exploration that involves plotting visuals within the Synapse notebook. Pictures and graphs are usually easier to understand than data in rows and columns, especially when you are dealing with a very large dataset that may contain many key insights. To obtain data-driven insights, we identify the data points that lead to them and plot that data in the form of a visual.

This is exactly what we will be doing in this recipe, and you will learn how to do this within the notebook experience.

Getting ready

We will be leveraging the same data frame that we created in the Reading and writing data from ADLS Gen2 using PySpark recipe.

Basic knowledge of matplotlib, a Python library for creating static and interactive visuals, is required.

How to do it…

Let's get back to the same notebook, PySparkNotebook, that we published in the Reading and writing data from ADLS Gen2 using PySpark recipe:

  1. Import matplotlib.pyplot:
    import matplotlib.pyplot as plt

This is the visualization plotting library in Python, as shown in Figure 2.9:

Figure 2.9 – matplotlib import

  2. Load the entire data frame into pandas using the toPandas() function, and define the chart type that we want to plot. In our case, it will be a histogram, which will give us the distribution of the passenger count:
    # Collect the Spark data frame into a pandas data frame on the driver
    mydataframeplot = mydataframe1.toPandas()
    # Plot a 20-bin histogram of the passenger counts
    ax = mydataframeplot['passenger_count'].plot(kind='hist', bins=20, facecolor='orange')
    ax.set_title('Total Passenger distribution')
    ax.set_xlabel('No. of Passengers')
    ax.set_ylabel('Counts')
    # Use the plt alias imported in step 1 for the figure-level title
    plt.suptitle('Trend')
    plt.show()

Figure 2.10 shows the output:

Figure 2.10 – Plotting a histogram
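
To see what bins=20 means, here is a minimal sketch in plain Python of the counting that a histogram performs: the range between the smallest and largest value is split into equal-width bins, and each value is counted in the bin it falls into. The function name histogram_counts and the sample passenger counts are our own illustrations, not part of the recipe's dataset, and the sketch simply rejects the edge case where all values are identical:

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; this sketch needs a non-empty range")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The largest value would land one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    # bins + 1 edges delimit the bins from left to right.
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Hypothetical sample data standing in for the passenger_count column.
sample_passenger_counts = [1, 1, 2, 1, 3, 2, 6, 1, 5, 2]
counts, edges = histogram_counts(sample_passenger_counts, bins=5)
print(counts)  # [4, 3, 1, 0, 2]
print(edges)   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Each entry in counts corresponds to the height of one bar in the plotted histogram; with a larger number of bins, the same values are spread over narrower intervals.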

How it works…

This recipe leverages the Spark pool that you created to perform data exploration: Spark reads the data, toPandas() converts it into a pandas data frame, and matplotlib plots it inside the notebook, which shortens the path from raw data to useful insights. The notebook experience within Synapse makes it a one-stop shop where developers and data analysts can collaborate and perform their respective activities.