Learn Azure Synapse Data Explorer

By: Pericles (Peri) Rocha

Overview of this book

Large volumes of data are generated daily from applications, websites, IoT devices, and other free-text, semi-structured data sources. Azure Synapse Data Explorer helps you collect, store, and analyze such data, and work with other analytical engines, such as Apache Spark, to develop advanced data science projects and maximize the value you extract from data. This book offers a comprehensive view of Azure Synapse Data Explorer, exploring not only the core scenarios of Data Explorer but also how it integrates within Azure Synapse. From data ingestion to data visualization and advanced analytics, you’ll learn to take an end-to-end approach to maximize the value of unstructured data and drive powerful insights using data science capabilities. With real-world usage scenarios, you’ll discover how to identify key projects where Azure Synapse Data Explorer can help you achieve your business goals. Throughout the chapters, you'll also find out how to manage big data as part of a software as a service (SaaS) platform, as well as tune, secure, and serve data to end users. By the end of this book, you’ll have mastered the big data life cycle and you'll be able to implement advanced analytical scenarios from raw telemetry and log data.
Table of Contents (19 chapters)
Part 1: Introduction to Azure Synapse Data Explorer
Part 2: Working with Data
Part 3: Managing Azure Synapse Data Explorer

When to use Azure Synapse Data Explorer

By now, you should already understand that Azure Synapse Data Explorer is an analytical engine to process queries on unstructured, semi-structured, and structured data, with exceptionally large data volumes, low-latency ingestion, and blazing-fast queries. Data Explorer is not, however, the solution to every data problem. In some cases, you will be better off with a different solution. Let us look at some of the most common analytics scenarios and the most appropriate analytical store in each case, as follows:

  • Scenario: I need a classic data warehouse.

Recommendation: Do not use Azure Synapse Data Explorer. Use dedicated SQL pools in Azure Synapse, which are optimized for user queries in a typical star schema, even at large data volumes.

  • Scenario: My solution requires frequent updates on individual records, and singleton INSERT, UPDATE, and DELETE operations.

Recommendation: Do not use Azure Synapse Data Explorer. In such cases, a transactional, operational database will be a better solution. Consider options such as Azure SQL, SQL Server (on-premises, or in an Azure VM), MySQL, or even Cosmos DB for NoSQL scenarios.

  • Scenario: My solution needs to run on a cloud other than Microsoft Azure, or on-premises.

Recommendation: Do not use Azure Synapse Data Explorer, as it runs exclusively on Azure.

  • Scenario: My data demands constant transformation and long-running extract, transform, load (ETL)/extract, load, transform (ELT) processes.

Recommendation: Do not use Azure Synapse Data Explorer. Even though you have Synapse pipelines in your Synapse workspace, and you can constantly ingest data into Data Explorer pools, the core scenario for Data Explorer is to offer interactive analytics on big data. You are better off running your ETL/ELT pipelines on Azure Synapse pipelines, Azure Data Factory (ADF), Apache Spark, or even Azure Batch.

  • Scenario: I need to train large ML models several times throughout the day.

Recommendation: This may be a good scenario for Azure Synapse Data Explorer. In this case, you can prepare data or train models on Apache Spark for Azure Synapse, but note that you will miss out on the real-time data analysis that Data Explorer offers. Ideally, you want to use Data Explorer with data streaming in from devices and applications in real time, but this can still be a valuable scenario for Azure Synapse Data Explorer. It may be less valuable with the standalone Azure Data Explorer service, which does not benefit from the native, in-product integration with Apache Spark (even though a Spark connector is available for Azure Data Explorer users).

  • Scenario: I have a very small amount of data to analyze.

Recommendation: It depends. If your analysis requires full-text search or works with JSON documents, you may benefit from the indexing capabilities of Azure Synapse Data Explorer. It can also be a suitable alternative if you need to correlate this data with other data stored in Synapse SQL or in the data lake. If you are on a low budget and don’t need the added benefits of Azure Synapse, you may be better served with SQL Server, Azure Cognitive Search, or even Cosmos DB.

  • Scenario: I need to perform time-series analysis on metric data from sensors, social media, websites, financial transactions, or other fast streaming data.

Recommendation: You should use Azure Synapse Data Explorer. Data Explorer pools are optimized for application log and IoT device data, and can ingest data at high volumes while offering insights in near real time.

  • Scenario: I have data with diverse schemas, arriving at high volumes in near real time.

Recommendation: You should use Azure Synapse Data Explorer. Data Explorer pools are optimized for unstructured, semi-structured, and structured data and allow you to run interactive analytics on data of any shape.

  • Scenario: I need to correlate application logs or telemetry data from IoT devices with data sitting in a data warehouse and the data lake.

Recommendation: You should use Azure Synapse Data Explorer. By leveraging the SQL analytical pools in Azure Synapse (dedicated and serverless), you can use one tool to query all your data, regardless of the analytical store that holds it.

The rule of thumb is to think about Data Explorer pools when you are managing telemetry or log analytics data at scale. You should use it with Azure Synapse when you need to combine your analysis with data from other sources or use the added benefits of Azure Synapse in your project.
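
The scenario list above can be read as a small decision procedure. The following Python sketch is one illustrative way to encode it, not an official sizing or selection tool. The `Workload` class, its field names, and the `recommend` function are all hypothetical names introduced for this example. The sketch also simplifies the text: the small-data and ML-training scenarios both fall through to "it depends", and disqualifying traits always take precedence.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Illustrative workload traits; every field name here is hypothetical."""
    needs_classic_data_warehouse: bool = False
    needs_singleton_writes: bool = False
    must_run_outside_azure: bool = False
    dominated_by_long_etl: bool = False
    time_series_on_streaming_data: bool = False
    diverse_schema_at_high_volume: bool = False
    correlates_logs_with_warehouse_or_lake: bool = False


def recommend(w: Workload) -> str:
    """Return a coarse recommendation based on the scenario list above."""
    # Disqualifying traits are checked first: any one of them rules out Data Explorer.
    if w.needs_classic_data_warehouse:
        return "no: use dedicated SQL pools in Azure Synapse"
    if w.needs_singleton_writes:
        return "no: use a transactional, operational database"
    if w.must_run_outside_azure:
        return "no: Data Explorer runs exclusively on Azure"
    if w.dominated_by_long_etl:
        return "no: run ETL/ELT on pipelines, Spark, or Azure Batch"
    # Any of the core scenarios makes Data Explorer the recommended store.
    if (w.time_series_on_streaming_data
            or w.diverse_schema_at_high_volume
            or w.correlates_logs_with_warehouse_or_lake):
        return "yes: use Azure Synapse Data Explorer"
    # Simplification: small data volumes and ML training land here ("it depends").
    return "it depends: weigh budget, search needs, and real-time requirements"


print(recommend(Workload(time_series_on_streaming_data=True)))
print(recommend(Workload(needs_singleton_writes=True,
                         time_series_on_streaming_data=True)))
```

The first call describes a streaming time-series workload and yields the "yes" recommendation. The second adds a requirement for singleton writes, so the transactional-database recommendation wins. Checking the disqualifying traits first is a design choice of this sketch; a real evaluation would weigh the scenarios together rather than stop at the first match.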