
Learn Azure Synapse Data Explorer

By: Pericles (Peri) Rocha

Overview of this book

Large volumes of data are generated daily from applications, websites, IoT devices, and other free-text, semi-structured data sources. Azure Synapse Data Explorer helps you collect, store, and analyze such data, and work with other analytical engines, such as Apache Spark, to develop advanced data science projects and maximize the value you extract from data. This book offers a comprehensive view of Azure Synapse Data Explorer, exploring not only the core scenarios of Data Explorer but also how it integrates within Azure Synapse. From data ingestion to data visualization and advanced analytics, you’ll learn to take an end-to-end approach to maximize the value of unstructured data and drive powerful insights using data science capabilities. With real-world usage scenarios, you’ll discover how to identify key projects where Azure Synapse Data Explorer can help you achieve your business goals. Throughout the chapters, you'll also find out how to manage big data as part of a software as a service (SaaS) platform, as well as tune, secure, and serve data to end users. By the end of this book, you’ll have mastered the big data life cycle and you'll be able to implement advanced analytical scenarios from raw telemetry and log data.
Table of Contents (19 chapters)
Part 1 Introduction to Azure Synapse Data Explorer
Part 2 Working with Data
Part 3 Managing Azure Synapse Data Explorer

Speeding up queries using cache policies

As you saw in Chapter 1, Introducing Azure Synapse Data Explorer, Data Explorer pools can manage very large amounts of data. They separate the compute layer from the storage layer, so storage can scale massively regardless of how much compute you have allocated to your Data Explorer pool.

When dealing with large volumes of data, it’s useful to know which data must be readily available and which data can be archived. Archived data is still available, but it may be kept in a cheaper location that is slower to retrieve from. This is the concept of hot data and cold data. Frequently accessed data is designated as hot data and should be quick to retrieve. Data that is accessed less frequently but is still needed is designated as cold data. It can typically be kept in cheaper storage that is still reliable but slower to retrieve. This affects not only performance but also cost: cold storage...
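The hot/cold distinction is often expressed as an age-based rule: data younger than a chosen hot period is treated as hot, and anything older is treated as cold. Below is a minimal Python sketch of that rule. It assumes that age is measured from ingestion time and that data exactly at the boundary counts as hot. The names `CachePolicy`, `hot_period`, `classify`, and `split_hot_cold` are hypothetical illustrations, not a Data Explorer API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CachePolicy:
    # Hypothetical stand-in for a hot-cache period: data younger than
    # (or exactly as old as) this period is treated as hot.
    hot_period: timedelta


def classify(ingested_at: datetime, policy: CachePolicy, now: datetime) -> str:
    """Return 'hot' if the data's age is within the hot period, else 'cold'."""
    age = now - ingested_at
    # Assumption: the boundary is inclusive, so data exactly hot_period old is hot.
    return "hot" if age <= policy.hot_period else "cold"


def split_hot_cold(records, policy, now):
    """Partition (name, ingested_at) pairs into hot and cold name lists by age."""
    hot, cold = [], []
    for name, ingested_at in records:
        if classify(ingested_at, policy, now) == "hot":
            hot.append(name)
        else:
            cold.append(name)
    return hot, cold


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    policy = CachePolicy(hot_period=timedelta(days=7))
    # Hypothetical batches of log data, each tagged with its ingestion time.
    records = [
        ("logs_jan_30", now - timedelta(days=1)),
        ("logs_jan_24", now - timedelta(days=7)),
        ("logs_jan_10", now - timedelta(days=21)),
    ]
    hot, cold = split_hot_cold(records, policy, now)
    print("hot:", hot)    # hot: ['logs_jan_30', 'logs_jan_24']
    print("cold:", cold)  # cold: ['logs_jan_10']
```

In Data Explorer itself you do not classify data by hand. You set a caching policy on a database or table, and the engine decides what to keep in hot cache. This sketch only models the age-based idea behind that decision.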