Principles of Data Science - Second Edition

By: Sinan Ozdemir, Sunil Kakade, Marco Tibaldeschi

Overview of this book

Need to turn programming skills into effective data science skills? This book helps you connect mathematics, programming, and business analysis. You’ll feel confident asking—and answering—complex, sophisticated questions of your data, making abstract and raw statistics into actionable ideas. Going through the data science pipeline, you'll clean and prepare data and learn effective data mining strategies and techniques to gain a comprehensive view of how the data science puzzle fits together. You’ll learn fundamentals of computational mathematics and statistics and pseudo-code used by data scientists and analysts. You’ll learn machine learning, discovering statistical models that help control and navigate even the densest datasets, and learn powerful visualizations that communicate what your data means.

Naive Bayes classification

Let's get right into it, beginning with Naive Bayes classification. This machine learning model relies heavily on results from the previous chapters, specifically Bayes' theorem:

$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}$$

Let's look more closely at each term in this formula:

  • P(H) is the probability of the hypothesis before we observe the data, called the prior probability, or just prior
  • P(H|D) is what we want to compute, the probability of the hypothesis after we observe the data, called the posterior
  • P(D|H) is the probability of the data under the given hypothesis, called the likelihood
  • P(D) is the probability of the data under any hypothesis, called the normalizing constant
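
To see how these four terms fit together, here is a minimal sketch in Python for a single yes/no hypothesis. The function name, parameter names, and the numbers are illustrative assumptions, not taken from the book. The normalizing constant P(D) is computed with the law of total probability, summing over the two cases "H is true" and "H is false".

```python
def posterior(prior, likelihood, likelihood_if_not_h):
    """Return P(H|D) for a binary hypothesis H using Bayes' theorem.

    prior               -- P(H), belief in H before seeing the data
    likelihood          -- P(D|H), probability of the data if H is true
    likelihood_if_not_h -- P(D|not H), probability of the data if H is false
    """
    # Normalizing constant P(D): the data can occur with H or without H
    evidence = likelihood * prior + likelihood_if_not_h * (1 - prior)
    # Bayes' theorem: posterior = likelihood * prior / normalizing constant
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only for illustration
p_h_given_d = posterior(prior=0.2, likelihood=0.9, likelihood_if_not_h=0.1)
print(round(p_h_given_d, 3))  # 0.692
```

With these made-up inputs, observing the data raises the probability of the hypothesis from the prior of 0.2 to a posterior of roughly 0.69.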

Naive Bayes classification is a classification model, and therefore a supervised model. Given this, what kind of data do we need?

  • Labeled data
  • Unlabeled data

(Insert Jeopardy music here)

If you answered labeled data, then you're well on your way to becoming a data scientist!
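
As a quick illustration of the difference, the sketch below shows a tiny labeled dataset next to an unlabeled one. The messages and the "spam"/"ham" labels are hypothetical examples made up for this sketch; the point is only that each labeled observation pairs its features with a known answer.

```python
# Hypothetical labeled data: each observation comes with its known class
labeled = [
    ("win a free prize now", "spam"),
    ("meeting moved to 3pm", "ham"),
    ("claim your free reward", "spam"),
]

# A supervised model learns from features and labels together
features = [text for text, _ in labeled]
labels = [label for _, label in labeled]

# Unlabeled data has features only; these are the cases we want to predict
unlabeled = ["free lunch tomorrow"]

print(len(features), len(labels))  # 3 3
```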

Suppose we have...