Book Image

Azure Data Engineer Associate Certification Guide

By : Newton Alex
Book Image

Azure Data Engineer Associate Certification Guide

By: Newton Alex

Overview of this book

Azure is one of the leading cloud providers in the world, providing numerous services for data hosting and data processing. Most of the companies today are either cloud-native or are migrating to the cloud much faster than ever. This has led to an explosion of data engineering jobs, with aspiring and experienced data engineers trying to outshine each other. Gaining the DP-203: Azure Data Engineer Associate certification is a sure-fire way of showing future employers that you have what it takes to become an Azure Data Engineer. This book will help you prepare for the DP-203 examination in a structured way, covering all the topics specified in the syllabus with detailed explanations and exam tips. The book starts by covering the fundamentals of Azure, and then takes the example of a hypothetical company and walks you through the various stages of building data engineering solutions. Throughout the chapters, you'll learn about the various Azure components involved in building the data systems and will explore them using a wide range of real-world use cases. Finally, you’ll work on sample questions and answers to familiarize yourself with the pattern of the exam. By the end of this Azure book, you'll have gained the confidence you need to pass the DP-203 exam with ease and land your dream job in data engineering.
Table of Contents (23 chapters)
1
Part 1: Azure Basics
3
Part 2: Data Storage
10
Part 3: Design and Develop Data Processing (25-30%)
15
Part 4: Design and Implement Data Security (10-15%)
17
Part 5: Monitor and Optimize Data Storage and Data Processing (10-15%)
20
Part 6: Practice Exercises

Cleansing data

Cleansing data is an important activity in any data pipeline. As data flows in from various sources, there are chances that the data won't adhere to the schemas and there might be missing values, non-standard entries, duplicates, and so on. During the cleansing phase, we try to correct such anomalies. Let's look at a few common data cleansing techniques.

Handling missing/null values

We can handle missing or Null values in multiple ways. We can choose to filter out such rows, substitute missing values with default values, or substitute them with some meaningful values such as mean, median, average, and so on. Let's look at an example of how we can replace missing values with some default string such as 'NA'.

Substituting with default values

Substituting with default values can be achieved using the Derived columns transformation, as shown in the following screenshot:

Figure 8.17 – Substituting with...