Principles of Data Science - Third Edition

By: Sinan Ozdemir

Overview of this book

Principles of Data Science bridges mathematics, programming, and business analysis, empowering you to confidently pose and address complex data questions and construct effective machine learning pipelines. This book will equip you with the tools to transform abstract concepts and raw statistics into actionable insights. Starting with cleaning and preparation, you’ll explore effective data mining strategies and techniques before moving on to building a holistic picture of how every piece of the data science puzzle fits together. Throughout the book, you’ll discover statistical models with which you can control and navigate even the densest or the sparsest of datasets and learn how to create powerful visualizations that communicate the stories hidden in your data. With a focus on application, this edition covers advanced transfer learning and pre-trained models for NLP and vision tasks. You’ll get to grips with advanced techniques for mitigating algorithmic bias in data as well as models and addressing model and data drift. Finally, you’ll explore medium-level data governance, including data provenance, privacy, and deletion request handling. By the end of this data science book, you'll have learned the fundamentals of computational mathematics and statistics, all while navigating the intricacies of modern ML and large pre-trained models like GPT and BERT.
Table of Contents (18 chapters)

Types of Data

For our first step into the world of data science, let’s take a look at the various forms that data can take. In this chapter, we will explore three critical categorizations of data:

  • Structured versus unstructured data
  • Quantitative versus qualitative data
  • The four levels of data

We will dive further into each of these topics by showing examples of how data scientists look at and work with data. This chapter aims to familiarize us with the fundamental types of data so that when we eventually see our first dataset, we will know exactly how to dissect, diagnose, and analyze the contents to maximize our insight value and machine learning performance.
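
As a rough sketch of the first two categorizations listed above, the snippet below contrasts a small structured dataset (rows with named columns) with an unstructured one (free text), and then labels each structured column as quantitative or qualitative. The records, the column names, and the helper column_kind are all hypothetical and exist only for illustration. The rule "numeric means quantitative" is a simplifying assumption, not a definition taken from the book.

```python
# Hypothetical toy data; every name and value here is made up for illustration.

# Structured: each record follows the same row/column layout.
structured = [
    {"name": "Ana", "age": 34, "city": "Lima"},
    {"name": "Ben", "age": 27, "city": "Oslo"},
]

# Unstructured: the same facts as free text, with no fixed layout.
unstructured = "Ana, 34, wrote in from Lima; Ben (27) is based in Oslo."


def column_kind(rows, column):
    """Label a column 'quantitative' if every value is a number, else 'qualitative'.

    Assumption: being numeric is used as a stand-in for being quantitative.
    """
    values = [row[column] for row in rows]
    is_numeric = all(
        # bool is a subclass of int in Python, so exclude it explicitly.
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in values
    )
    return "quantitative" if is_numeric else "qualitative"


kinds = {col: column_kind(structured, col) for col in structured[0]}
print(kinds)
```

The numeric check is only a first pass. A column such as a postal code is stored as a number but describes a category, so deciding whether data is quantitative or qualitative still takes judgment about what the values mean.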

The first thing to note is my use of the word data. In the previous chapter, I defined data as merely a collection of information. That definition is deliberately loose: data can be separated into many different categories, and the definition has to cover all of them.

The next thing to remember while...