Data Science Algorithms in a Week

By: Dávid Natingga

Overview of this book

Machine learning applications are highly automated and self-modifying, and they continue to improve over time with minimal human intervention as they learn from more data. To address the complex nature of various real-world data problems, specialized machine learning algorithms have been developed to solve them. Data science helps you gain new knowledge from existing data through algorithmic and statistical analysis.

This book addresses problems related to accurate and efficient data classification and prediction. Over the course of seven days, you will be introduced to seven algorithms, along with exercises that will help you learn different aspects of machine learning. You will see how to pre-cluster your data to optimize and classify it for large datasets. You will then find out how to predict data based on the existing trends in your datasets.

This book covers algorithms such as k-Nearest Neighbors, Naive Bayes, Decision Trees, Random Forest, k-Means, Regression, and Time-series analysis. On completion of the book, you will understand which machine learning algorithm to pick for clustering, classification, or regression and which is best suited for your problem.

Preface

What this book covers

What you need for this book

Who this book is for

Reader feedback

Customer support

Classification Using K Nearest Neighbors

Mary and her temperature preferences

Implementation of k-nearest neighbors algorithm

Map of Italy example - choosing the value of k

House ownership - data rescaling

Text classification - using non-Euclidean distances

Text classification - k-NN in higher-dimensions

Naive Bayes

Medical test - basic application of Bayes' theorem

Proof of Bayes' theorem and its extension

Playing chess - independent events

Implementation of naive Bayes classifier

Playing chess - dependent events

Gender classification - Bayes for continuous random variables

Decision Trees

Swim preference - representing data with decision tree

Information theory

ID3 algorithm - decision tree construction

Classifying with a decision tree

Playing chess - analysis with decision tree

Going shopping - dealing with data inconsistency

Random Forest

Overview of random forest algorithm

Swim preference - analysis with random forest

Implementation of random forest algorithm

Playing chess example

Going shopping - overcoming data inconsistency with randomness and measuring the level of confidence

Clustering into K Clusters

Household incomes - clustering into k clusters

Gender classification - clustering to classify

Implementation of the k-means clustering algorithm

House ownership - choosing the number of clusters

Document clustering - understanding the number of clusters k in a semantic context

Regression

Fahrenheit and Celsius conversion - linear regression on perfect data

Weight prediction from height - linear regression on real-world data

Gradient descent algorithm and its implementation

Flight time duration prediction from distance

Ballistic flight analysis - non-linear model

Time Series Analysis

Business profit - analysis of the trend

Electronics shop's sales - analysis of seasonality

Statistics

Bayesian Inference

Cross-validation

R Reference

Linear regression

Python Reference

Glossary of Algorithms and Methods in Data Science

Classifying with a decision tree

Once we have constructed a decision tree from the data with the attributes A₁, ..., A_m and the classes {c₁, ..., c_k}, we can use this decision tree to classify a new data item with the attributes A₁, ..., A_m into one of the classes {c₁, ..., c_k}.

Given a new data item to classify, we can think of each node, including the root, as a question about the data sample: what value does the sample have for the selected attribute A_i? Based on the answer, we select the corresponding branch of the decision tree and move to the next node. There, another question is answered about the data sample, and so on, until the data sample reaches a leaf node. Each leaf node has one of the classes {c₁, ..., c_k} associated with it; for example, c_i. The decision tree algorithm then classifies the data sample into the class c_i.
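The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's implementation: the nested-dictionary tree layout, the function name `classify`, and the small example tree with its attribute names and values are all hypothetical.

```python
# A minimal sketch of classifying with an already constructed decision tree.
# Assumed (hypothetical) representation:
#   internal node: {"attribute": name, "branches": {value: child_node, ...}}
#   leaf node:     {"label": class_name}


def classify(node, item):
    """Follow the branches matching the item's attribute values down to a leaf."""
    while "label" not in node:
        attribute = node["attribute"]   # the question asked at this node
        value = item[attribute]         # the item's answer to that question
        if value not in node["branches"]:
            raise ValueError(f"no branch for {attribute}={value!r}")
        node = node["branches"][value]  # move on to the next node
    return node["label"]                # the class associated with the leaf


# Made-up tree for illustration only; it is not derived from any dataset.
tree = {
    "attribute": "swimming_suit",
    "branches": {
        "none": {"label": "no"},
        "small": {"label": "no"},
        "good": {
            "attribute": "water_temperature",
            "branches": {
                "cold": {"label": "no"},
                "warm": {"label": "yes"},
            },
        },
    },
}

new_item = {"swimming_suit": "good", "water_temperature": "warm"}
print(classify(tree, new_item))
```

In this sketch, an attribute value that has no matching branch raises an error. How to handle values the tree has never seen is a design choice that the description above leaves open.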

...