Data Science Algorithms in a Week

By: Dávid Natingga

Overview of this book

Machine learning applications are highly automated and self-modifying; they continue to improve over time with minimal human intervention as they learn from more data. To address the complex nature of various real-world data problems, specialized machine learning algorithms have been developed that solve these problems effectively. Data science helps you gain new knowledge from existing data through algorithmic and statistical analysis.

This book addresses the problems of accurate and efficient data classification and prediction. Over the course of seven days, you will be introduced to seven algorithms, along with exercises that will help you learn different aspects of machine learning. You will see how to pre-cluster your data to optimize and classify it for large datasets. You will then find out how to predict data based on the existing trends in your datasets.

This book covers algorithms such as k-Nearest Neighbors, Naive Bayes, Decision Trees, Random Forest, k-Means, Regression, and Time series. On completing the book, you will understand which machine learning algorithm to pick for clustering, classification, or regression, and which is best suited to your problem.

Summary

A random forest is a set of decision trees, each constructed from a sample drawn randomly, with replacement, from the initial data. This process is called bootstrap aggregating (bagging); its purpose is to reduce the variance and bias of the classifications made by the random forest. The bias is further reduced during the construction of each decision tree by considering only a random subset of the variables at each branch of the tree.
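As a rough illustration of this construction, the following Python sketch builds such a forest using scikit-learn decision trees as the base learners (the function name, the number of trees, and the max_features setting are assumptions made for the example, not code from the book):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_random_forest(X, y, n_trees=10, seed=0):
    """Fit n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    forest = []
    for i in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement.
        idx = rng.integers(0, len(X), size=len(X))
        # max_features='sqrt' restricts every split to a random subset of
        # the variables, as described above (an assumed setting).
        tree = DecisionTreeClassifier(max_features='sqrt', random_state=seed + i)
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest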

Once a random forest is constructed, it classifies a data item by taking the majority vote among all the trees in the forest. The size of the majority also indicates the level of confidence that the answer is correct.
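A minimal sketch of this voting step, continuing the hypothetical forest built above (the forest_predict name and the (label, confidence) return value are illustrative choices, not the book's API):

from collections import Counter

def forest_predict(forest, x):
    """Classify one sample x by the majority vote of all trees in the forest."""
    votes = [tree.predict([x])[0] for tree in forest]
    label, count = Counter(votes).most_common(1)[0]
    # The share of agreeing trees serves as a rough confidence level.
    confidence = count / len(forest)
    return label, confidence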

Since a random forest consists of decision trees, it is a good choice for any problem where a decision tree performs well. Moreover, because a random forest reduces the bias and variance present in a single decision tree classifier, it tends to produce more accurate classifications than an individual tree.