Data Science Algorithms in a Week

By: Dávid Natingga

Overview of this book

Machine learning applications are highly automated and self-modifying, and they continue to improve over time with minimal human intervention as they learn from more data. To address the complex nature of various real-world data problems, specialized machine learning algorithms have been developed that solve these problems effectively. Data science helps you gain new knowledge from existing data through algorithmic and statistical analysis.

This book addresses the problems related to accurate and efficient data classification and prediction. Over the course of 7 days, you will be introduced to seven algorithms, along with exercises that will help you learn different aspects of machine learning. You will see how to pre-cluster your data to optimize and classify it for large datasets. You will then find out how to predict data based on the existing trends in your datasets.

This book covers algorithms such as k-Nearest Neighbors, Naive Bayes, Decision Trees, Random Forest, k-Means, Regression, and Time-series. On completion of the book, you will understand which machine learning algorithm to pick for clustering, classification, or regression and which is best suited for your problem.

Medical test - basic application of Bayes' theorem

A patient takes a special cancer test which has the accuracy test_accuracy=99.9%: 99.9% of the patients who suffer from the special type of cancer test positive, and 99.9% of the patients who do not suffer from the cancer test negative.

Suppose that a patient is tested and scores positive on the test. What is the probability that the patient suffers from the special type of cancer?

Analysis:

We will use Bayes' theorem to find out the probability of the patient having the cancer:

P(cancer|test_positive)=(P(test_positive|cancer) * P(cancer))/P(test_positive)
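As a minimal sketch, the formula can be evaluated in Python. The function name posterior_positive and its parameter names are hypothetical, chosen only for illustration. The sketch expands P(test_positive) with the law of total probability, and it assumes that test_accuracy is both the rate of positive results among patients with the cancer and the rate of negative results among patients without it, so the false positive rate is 1 - test_accuracy. The prior of 1 in 100,000 is the prevalence figure that the analysis goes on to use.

```python
def posterior_positive(prior, true_positive_rate, false_positive_rate):
    """Return P(condition | test_positive) using Bayes' theorem.

    prior               -- P(condition), e.g. P(cancer)
    true_positive_rate  -- P(test_positive | condition)
    false_positive_rate -- P(test_positive | no condition)
    """
    # P(test_positive) by the law of total probability.
    p_test_positive = (true_positive_rate * prior
                       + false_positive_rate * (1.0 - prior))
    # Bayes' theorem: P(condition | test_positive).
    return true_positive_rate * prior / p_test_positive


# Illustrative inputs (see the assumptions stated above).
test_accuracy = 0.999
p_cancer = 1 / 100_000  # prior: 1 person in 100,000

p_cancer_given_positive = posterior_positive(
    prior=p_cancer,
    true_positive_rate=test_accuracy,
    # Assumption: the test errs equally often in both directions.
    false_positive_rate=1 - test_accuracy,
)
print(f"P(cancer|test_positive) = {p_cancer_given_positive:.6f}")
```

Keeping the prior and both test rates as separate parameters makes it easy to see how strongly the result depends on how rare the condition is.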

To know the prior probability that a patient has the cancer, we have to find out how frequently the cancer occurs among people. Say that we find out that 1 person in 100,000 suffers from this kind of cancer. Then P(cancer)=1/100,000. So, P...