Azure Synapse Analytics Cookbook

By: Gaurav Agarwal, Meenakshi Muralidharan

Overview of this book

As data warehouse management becomes increasingly integral to successful organizations, choosing and running the right solution is more important than ever. Microsoft Azure Synapse is an enterprise-grade, cloud-based data warehousing platform, and this book holds the key to using Synapse to its full potential. If you want the skills and confidence to create a robust enterprise analytical platform, this cookbook is a great place to start. You'll learn how to execute enterprise-level deployments on medium-to-large data platforms. Using the step-by-step recipes and accompanying theory covered in this book, you'll understand how to integrate various services with Synapse to make it a robust solution for all your data needs. Whether you're new to Azure Synapse or just getting started with it, you'll find the instructions you need to solve any problem you may face, including using Azure services for data visualization as well as for artificial intelligence (AI) and machine learning (ML) solutions. By the end of this Azure book, you'll have the skills you need to implement an enterprise-grade analytical platform, enabling your organization to explore and manage heterogeneous data workloads and employ various data integration services to solve real-time industry problems.

Performing read-write operations to a Parquet file using Spark in Synapse

Apache Parquet is a columnar file format supported by most big data processing systems and is one of the most efficient formats for storing analytical data. It is used extensively across the Hadoop and big data ecosystem. Its main advantage is efficient data compression and encoding, which enhances performance when handling complex data in bulk.

Spark supports both reading and writing Parquet files, and because Parquet reduces the underlying data footprint, it requires fewer I/O operations and consumes less memory.
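As a minimal sketch of that effect (the path and column names here are placeholders, not part of the recipe): because Parquet stores each column separately, selecting only the columns you need lets Spark skip the rest of the file entirely.

# Selecting specific columns from a Parquet dataset; Spark's column pruning
# reads only those column chunks, reducing I/O and memory use.
# The path and column names are placeholders for illustration only.
df = spark.read.parquet("abfss://data@<storage_account>.dfs.core.windows.net/trips/")
subset_df = df.select("trip_distance", "fare_amount")
subset_df.show(5)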

In this section, we will learn how to read from and write to Parquet files; both operations take only a few lines of PySpark code.
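The following PySpark sketch shows the basic round trip as it might look in a Synapse notebook, where spark is the session the notebook provides. The abfss:// paths are placeholders for a container in the ADLS Gen2 account linked to your workspace, and the fare_amount column is assumed from the yellow taxi schema used in this recipe.

from pyspark.sql import functions as F

# Placeholder paths: point these at a container in the ADLS Gen2 account
# attached to your Synapse workspace.
source_path = "abfss://data@<storage_account>.dfs.core.windows.net/nyctaxi/yellow/"
target_path = "abfss://data@<storage_account>.dfs.core.windows.net/curated/yellow/"

# Read: the schema is taken from the Parquet file footers, so no schema
# definition is needed.
trips_df = spark.read.parquet(source_path)

# A simple transformation, assuming a fare_amount column as in the yellow
# taxi data: keep only trips with a positive fare.
valid_trips_df = trips_df.filter(F.col("fare_amount") > 0)

# Write the result back as Parquet, replacing anything already at the target path.
valid_trips_df.write.mode("overwrite").parquet(target_path)

The overwrite mode replaces whatever is already at the target path; use append instead if you want to add new files alongside the existing ones.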

Getting ready

We will be using a public dataset for our scenario. This dataset consists of New York yellow taxi trip data, which includes attributes such as trip distances, itemized fares, rate types, payment types, pick...