Learn Azure Synapse Data Explorer

By: Pericles (Peri) Rocha

Overview of this book

Large volumes of data are generated daily from applications, websites, IoT devices, and other free-text, semi-structured data sources. Azure Synapse Data Explorer helps you collect, store, and analyze such data, and work with other analytical engines, such as Apache Spark, to develop advanced data science projects and maximize the value you extract from data. This book offers a comprehensive view of Azure Synapse Data Explorer, exploring not only the core scenarios of Data Explorer but also how it integrates within Azure Synapse. From data ingestion to data visualization and advanced analytics, you’ll learn to take an end-to-end approach to maximize the value of unstructured data and drive powerful insights using data science capabilities. With real-world usage scenarios, you’ll discover how to identify key projects where Azure Synapse Data Explorer can help you achieve your business goals. Throughout the chapters, you’ll also find out how to manage big data as part of a software as a service (SaaS) platform, as well as tune, secure, and serve data to end users. By the end of this book, you’ll have mastered the big data life cycle and you’ll be able to implement advanced analytical scenarios from raw telemetry and log data.
Table of Contents (19 chapters)
Part 1 Introduction to Azure Synapse Data Explorer
Part 2 Working with Data
Part 3 Managing Azure Synapse Data Explorer

The need for a fast and highly scalable data exploration service

Data warehouses and SQL-based databases have reached a level of maturity where the technologies are stable, widely available from a variety of vendors, and broadly adopted by enterprises. Structured data is stored efficiently, and queries are resolved using techniques such as indexing and materialized views, among others, to quickly retrieve the data requested by the user.

Unstructured data, however, does not have a predefined schema or structure. Storing it optimally is challenging because data pages cannot be calculated in advance the way they are in typical SQL databases. The same challenges apply to processing and querying unstructured data.

Application logs and IoT device data are good examples of unstructured data that is produced at low latency. They are text-heavy, but their text has no predefined size. An application log can contain not only clickstreams, user feedback, and error messages, but also dates and device identifiers (IDs). IoT device data may include facts and measures, such as a count of objects scanned, as well as barcode numbers, descriptive text, coordinates, and more.
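To make the idea of data without a predefined schema concrete, here is a minimal Python sketch. The log records and their field names (ts, device_id, event, barcode, and so on) are hypothetical and chosen only for illustration; they are not taken from any real application or from Azure Synapse Data Explorer itself. The sketch counts which fields appear in every record and which appear only in some.

```python
import json

# Hypothetical log lines; every field name and value here is illustrative only.
RAW_EVENTS = [
    '{"ts": "2023-01-05T10:15:00Z", "device_id": "scanner-01", '
    '"event": "click", "target": "checkout"}',
    '{"ts": "2023-01-05T10:15:02Z", "device_id": "scanner-01", '
    '"event": "error", "message": "Timeout while loading cart"}',
    '{"ts": "2023-01-05T10:15:09Z", "device_id": "scanner-02", '
    '"event": "scan", "barcode": "0123456789012", '
    '"objects_scanned": 3, "coords": [47.64, -122.13]}',
]


def field_coverage(raw_events):
    """Return a dict mapping each field name to the number of records that contain it."""
    counts = {}
    for line in raw_events:
        record = json.loads(line)
        for key in record:
            counts[key] = counts.get(key, 0) + 1
    return counts


coverage = field_coverage(RAW_EVENTS)
total = len(RAW_EVENTS)

# Fields present in all records versus fields present in only a subset.
shared = sorted(k for k, n in coverage.items() if n == total)
optional = sorted(k for k, n in coverage.items() if n < total)

print("Fields in every record:", shared)
print("Fields in only some records:", optional)
```

Only a few fields are common to all three records, while the rest vary by event type and have free-form lengths. A fixed-column table would need many nullable columns, or frequent schema changes, to hold records like these.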

This is all high-value data that companies now realize can be useful to improve products and respond quickly to market changes and user feedback. Therefore, being able to efficiently store, process, query, and maintain unstructured data is a real requirement for companies of all sizes. But managing big data by itself is not enough: we need the means to efficiently acquire, manage, explore, model, and serve data to end users. In short, we need to realize the full data life cycle to unlock insights and maximize the value of data. On top of that, we need to make sure that a company’s data, being such a valuable asset, is well protected from unauthorized access, and that the analytical environment meets the mission-critical requirements imposed by enterprises. Let us now look at how Azure Synapse helps address these needs.