CompTIA Data+: DAO-001 Certification Guide

By: Cameron Dodd

Overview of this book

The CompTIA Data+ certification exam not only helps validate the skill set required to enter one of the fastest-growing fields in the world, but is also starting to standardize the language and concepts within the field. However, there’s a lot of conflicting information and a lack of existing resources about the topics covered in this exam, and even professionals working in data analytics may need a study guide to help them pass on their first attempt. The CompTIA Data+ (DAO-001) Certification Guide will give you a solid understanding of how to prepare, analyze, and report data for better insights. You’ll begin with an introduction to the Data+ certification exam format, and then quickly dive into preparing data. You’ll learn about collecting, cleaning, and processing data, along with data wrangling and manipulation. As you progress, you’ll cover data analysis topics such as types of analysis, common techniques, hypothesis techniques, and statistical analysis, before tackling data reporting, common visualizations, and data governance. The knowledge you’ve gained throughout the book will be tested with the mock exams that appear in the final chapters. By the end of this book, you’ll be ready to pass the Data+ exam with confidence and take the next step in your career.
Table of Contents (24 chapters)
Part 1: Preparing Data
Part 2: Analyzing Data
Part 3: Reporting Data
Part 4: Mock Exams

Introducing the exam domains

The exam was designed by a group of subject matter experts with different specialties in the field of data science. Together, they agreed on a common body of knowledge that any early-career data analyst should have. They then categorized that knowledge into the following five domains:

  • Data Concepts and Environments
  • Data Mining
  • Data Analysis
  • Visualization
  • Data Governance, Quality, and Control

Data Concepts and Environments

The domains move through the data pipeline chronologically. The first domain, Data Concepts and Environments, is largely about how data is stored. This covers multiple levels, from different database types, structures, and schemas, through file types for specific kinds of data, and even into different variable types. This domain is a broad view of storage concepts mixed with the ability to identify what type of data you can expect from different storage solutions.

Data Mining

The name of this domain is a bit of a misnomer. Strictly speaking, data mining means exploring a huge dataset you already have in search of any insights that might be of interest, instead of answering specific questions. Data mining does rely on every concept contained within this domain, but so does regular data analysis. What this domain is actually about is every step after storing your data but before you run an analysis: collecting, querying, cleaning, and wrangling data. Effectively, these are the steps you need to take to get your data into a useful shape so you can analyze it.
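To make the collect, clean, and wrangle steps a little more concrete, here is a minimal sketch in Python using only the standard library. The dataset, the column names (name, region, sales), and the helper functions clean_rows and total_sales_by_region are all hypothetical and exist only for illustration; the exam itself is vendor-neutral and does not require any particular language or tool.

```python
import csv
import io

# Hypothetical raw export (an assumption for illustration): it contains
# stray whitespace, inconsistent casing, a missing value, and a duplicate.
RAW_CSV = """name,region,sales
 Alice ,north,100
Bob,NORTH,
alice,North,100
Cara,south,250
"""


def clean_rows(raw_text):
    """Collect rows from CSV text and clean them.

    Cleaning here means: trim whitespace, normalize casing,
    drop rows with a missing sales value, and remove duplicates.
    """
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        name = row["name"].strip().title()
        region = row["region"].strip().lower()
        sales_text = row["sales"].strip()
        if not sales_text:
            continue  # missing value: skip the row
        key = (name, region, sales_text)
        if key in seen:
            continue  # duplicate after normalization: skip the row
        seen.add(key)
        cleaned.append({"name": name, "region": region, "sales": int(sales_text)})
    return cleaned


def total_sales_by_region(rows):
    """Wrangle the cleaned rows into a summary shape: total sales per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals


rows = clean_rows(RAW_CSV)
totals = total_sales_by_region(rows)
print(rows)
print(totals)
```

Dropping rows with missing values is only one possible choice; depending on the question being asked, filling in or flagging those values may be more appropriate.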

Data Analysis

You have stored your data, you have pulled your data and made it pretty, and now it is time to do something with it. This domain is all about analyses. You will be expected to perform descriptive statistical analyses, understand the concepts behind inferential statistics, be able to pick appropriate types of analysis, and even know some common tools used in the field. Because the test is vendor-neutral, you don’t need to be able to use any of these tools; you just need to be able to identify them.
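As a rough illustration of what a descriptive statistical analysis involves, the sketch below summarizes a small list of numbers with Python's built-in statistics module. The scores are made-up values, and Python is just one convenient choice here, not a tool the exam requires.

```python
import statistics

# Hypothetical exam scores, invented purely for illustration.
scores = [72, 85, 85, 90, 68, 77, 95]

summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),        # arithmetic average
    "median": statistics.median(scores),    # middle value when sorted
    "mode": statistics.mode(scores),        # most frequent value
    "range": max(scores) - min(scores),     # spread between extremes
    "stdev": statistics.stdev(scores),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Measures such as the mean, median, and mode describe the center of the data, while the range and standard deviation describe how spread out it is.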

Visualization

It doesn’t matter how perfect your analyses are if you can’t communicate the results. What’s the point in coming up with an equation that solves world hunger if you can’t explain it to anyone else? To that end, the next domain is all about visualizations and reporting. This covers what information a report should include, what type of report is most appropriate, who should get a report, when reports should be delivered, the basics of report design, types of visualizations, and even the process of developing a dashboard.

Data Governance, Quality, and Control

The final domain is made up of larger concepts that span the entire life cycle of data analytics. A large part of it consists of policies. Some of the policies focus on protected data and how it can be handled legally, while others are more about how you can ensure the quality of your data. If your data is of low quality, you can’t trust anything it says, and if you are mishandling protected information, you could face legal penalties, so these are important factors to know. This domain also includes a short section on the concept of master data management, as an example of an ideal state.

Now that you know what domains will be covered on the certification exam, let’s talk about how the exam is structured.