Principles of Data Science - Second Edition

By: Sinan Ozdemir, Sunil Kakade, Marco Tibaldeschi

Overview of this book

Need to turn programming skills into effective data science skills? This book helps you connect mathematics, programming, and business analysis. You'll feel confident asking and answering complex, sophisticated questions of your data, turning abstract, raw statistics into actionable ideas. Going through the data science pipeline, you'll clean and prepare data and learn effective data mining strategies and techniques to gain a comprehensive view of how the data science puzzle fits together. You'll learn the fundamentals of computational mathematics and statistics, along with the pseudo-code used by data scientists and analysts. You'll also learn machine learning, discovering statistical models that help you control and navigate even the densest datasets, and powerful visualizations that communicate what your data means.

Summary

Probability as a field works to explain our random and chaotic world. Using the basic laws of probability, we can model real-life events that involve randomness. We can use random variables to represent quantities that may take on several values, and we can use their probability mass functions (for discrete variables) or probability density functions (for continuous variables) to compare product lines or interpret test results.
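
As a reminder of what a probability mass function looks like in practice, here is a minimal sketch for a binomial random variable, using only the Python standard library. The function name `binomial_pmf` and the values of `n` and `p` are illustrative assumptions, not taken from the book.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical setting: 10 independent trials, each succeeding with probability 0.3
n, p = 10, 0.3

# The PMF assigns a probability to every value the random variable can take
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)  # probabilities sum to 1 (up to floating-point rounding)
expected = sum(k * pk for k, pk in enumerate(pmf))  # mean of a binomial is n * p
```

Here `total` comes out at 1 and `expected` at `n * p = 3`, up to floating-point rounding, which is a quick sanity check that the list really describes a probability distribution.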

We have also seen some of the more complicated uses of probability in prediction. Random variables and Bayes' theorem are excellent tools for assigning probabilities to real-life situations. In later chapters, we will revisit Bayes' theorem and use it to build a powerful and fast machine learning algorithm called the Naïve Bayes algorithm. This algorithm captures the power of Bayesian thinking and applies it directly to the problem of predictive learning.
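
Bayes' theorem can be sketched in a few lines of Python. The example below is an illustration under assumed numbers (a 1% prior, a 95% true-positive rate, and a 5% false-positive rate); the function name `bayes_posterior` and all values are hypothetical and do not come from the book.

```python
def bayes_posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(H | E) using Bayes' theorem.

    prior               -- P(H), probability of the hypothesis before seeing evidence
    likelihood          -- P(E | H), probability of the evidence if H is true
    false_positive_rate -- P(E | not H), probability of the evidence if H is false
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: a rare condition and a fairly accurate test
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
```

With these assumed inputs, `posterior` is roughly 0.161: even after a positive result, the hypothesis remains unlikely because the prior is so small. This kind of updating is the idea that Naïve Bayes builds on.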

The next two chapters focus on statistical thinking. As in our treatment of probability, these chapters will use mathematical formulas to model real-world events. The main difference...