Book Image

Principles of Data Science - Third Edition

By : Sinan Ozdemir
Book Image

Principles of Data Science - Third Edition

By: Sinan Ozdemir

Overview of this book

Principles of Data Science bridges mathematics, programming, and business analysis, empowering you to confidently pose and address complex data questions and construct effective machine learning pipelines. This book will equip you with the tools to transform abstract concepts and raw statistics into actionable insights. Starting with cleaning and preparation, you’ll explore effective data mining strategies and techniques before moving on to building a holistic picture of how every piece of the data science puzzle fits together. Throughout the book, you’ll discover statistical models with which you can control and navigate even the densest or the sparsest of datasets and learn how to create powerful visualizations that communicate the stories hidden in your data. With a focus on application, this edition covers advanced transfer learning and pre-trained models for NLP and vision tasks. You’ll get to grips with advanced techniques for mitigating algorithmic bias in data as well as models and addressing model and data drift. Finally, you’ll explore medium-level data governance, including data provenance, privacy, and deletion request handling. By the end of this data science book, you'll have learned the fundamentals of computational mathematics and statistics, all while navigating the intricacies of modern ML and large pre-trained models like GPT and BERT.
Table of Contents (18 chapters)

Summary

In this chapter, we explored basic data science terminology and saw how even the term “data science” can be fraught with ambiguity and misconception. We also learned that coding, math, and domain expertise are the fundamental building blocks of data science. As we seek new and innovative ways to discover data trends, a beast lurks in the shadows. I’m not talking about the learning curve of mathematics or programming, nor am I referring to the surplus of data. The Industrial Age left us with an ongoing battle against pollution. The subsequent Information Age left behind a trail of big data. So, what dangers might the Data Age bring us?

The Data Age can lead to something much more sinister – the dehumanization of the individual through mass data and the rise of automated bias in machine learning systems.

More and more people are jumping headfirst into the field of data science, most with no prior experience in math or computer science, which, on the surface, is great. The average data scientist has access to millions of dating profiles’ data, tweets, online reviews, and much more to jump-start their education. However, if you jump into data science without the proper exposure to theory or coding practices, and without respect for the domain you are working in, you face the risk of oversimplifying the very phenomenon you are trying to model.

Now, it’s time to begin. In the next chapter, we will explore the different types of data that exist in the world, ranging from free-form text to highly structured row/column files. We will also look at the mathematical operations that are allowed for different types of data, as well as deduce insights based on the form the data comes in.