Principles of Data Science - Third Edition

By: Sinan Ozdemir

Overview of this book

Principles of Data Science bridges mathematics, programming, and business analysis, empowering you to confidently pose and address complex data questions and construct effective machine learning pipelines. This book will equip you with the tools to transform abstract concepts and raw statistics into actionable insights. Starting with cleaning and preparation, you’ll explore effective data mining strategies and techniques before moving on to building a holistic picture of how every piece of the data science puzzle fits together. Throughout the book, you’ll discover statistical models with which you can control and navigate even the densest or the sparsest of datasets and learn how to create powerful visualizations that communicate the stories hidden in your data. With a focus on application, this edition covers advanced transfer learning and pre-trained models for NLP and vision tasks. You’ll get to grips with advanced techniques for mitigating algorithmic bias in data as well as models and addressing model and data drift. Finally, you’ll explore medium-level data governance, including data provenance, privacy, and deletion request handling. By the end of this data science book, you'll have learned the fundamentals of computational mathematics and statistics, all while navigating the intricacies of modern ML and large pre-trained models like GPT and BERT.
Table of Contents (18 chapters)

Basic symbols and terminology

In the following section, we will review the mathematical concepts of vectors, matrices, arithmetic symbols, and linear algebra, as well as some more subtle notations used by data scientists.

Vectors and matrices

A vector is formally defined as an object with both magnitude and direction. That definition, however, is more abstract than we need here. For our purposes, a vector is simply a one-dimensional array representing a series of numbers. Put another way, a vector is a list of numbers.
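As a minimal sketch of this idea, assuming a plain Python list stands in for a vector (the variable names and values below are illustrative, not taken from the book), the "list of numbers" view and the "magnitude" view can be connected like this:

```python
import math

# A vector stored as a plain Python list of numbers (illustrative values).
x = [3, 6, 8]

# The number of components in the vector.
dimension = len(x)

# Magnitude (Euclidean length): the square root of the sum of squared components.
magnitude = math.sqrt(sum(component ** 2 for component in x))

print(dimension)
print(magnitude)
```

A plain list is enough to illustrate the concept; numerical libraries offer dedicated array types, but nothing in this sketch depends on them.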

It is generally represented using an arrow or bold font, as shown here:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><mover><mi>x</mi><mo stretchy="true">→</mo></mover><mtext>&#xA0;or&#xA0;</mtext><mi mathvariant="script">x</mi></mrow></mrow></math>

Vectors are broken into components, which are the individual members of the vector. We use index notation to denote the component that we are referring to, as illustrated here:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><mtext>If&#xA0;</mtext><mover><mi>x</mi><mo stretchy="true">→</mo></mover><mo>=</mo><mfenced open="(" close=")"><mtable columnwidth="auto" columnalign="center" rowspacing="1.0000ex 1.0000ex" rowalign="baseline baseline baseline"><mtr><mtd><mn>3</mn></mtd></mtr><mtr><mtd><mn>6</mn></mtd></mtr><mtr><mtd><mn>8</mn></mtd></mtr></mtable></mfenced><mtext>&#xA0;then&#xA0;</mtext><msub><mi mathvariant="script">x</mi><mn>1</mn></msub><mo>=</mo><mn>3</mn></mrow></mrow></math>

Note

In math, we generally refer to the first element as index 1, as opposed to computer science, where we generally refer to the first element as index 0. It is important to remember which index system you are using.
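The difference between the two index systems can be sketched as follows. This is a hedged illustration using a plain Python list for the vector above; the helper `component` is a hypothetical function introduced here only to show the conversion from 1-based to 0-based indexing.

```python
# The vector from the example above, stored as a plain Python list.
x = [3, 6, 8]

# In math notation, x_1 is the first component.
# In Python, the first component lives at index 0.
first = x[0]


def component(vector, i):
    """Return the i-th component using 1-based (math-style) indexing.

    Hypothetical helper for illustration only.
    """
    if not 1 <= i <= len(vector):
        raise IndexError("1-based index out of range")
    # Shift by one to translate the math index into a Python index.
    return vector[i - 1]


print(first)
print(component(x, 1))
```

Both calls refer to the same component; only the numbering convention differs. The explicit range check matters because Python would otherwise treat `vector[-1]` as the last element instead of raising an error.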

In Python...