Data Science Algorithms in a Week

By: Dávid Natingga

Overview of this book

Machine learning applications are highly automated and self-modifying, and they continue to improve over time with minimal human intervention as they learn from more data. To address the complex nature of various real-world data problems, specialized machine learning algorithms have been developed to solve them. Data science helps you gain new knowledge from existing data through algorithmic and statistical analysis.

This book addresses problems related to accurate and efficient data classification and prediction. Over the course of seven days, you will be introduced to seven algorithms, along with exercises that will help you learn different aspects of machine learning. You will see how to pre-cluster your data to optimize classification on large datasets. You will then find out how to predict data based on the existing trends in your datasets.

This book covers the following algorithms: k-Nearest Neighbors, Naive Bayes, Decision Trees, Random Forest, k-Means, Regression, and Time-series analysis. On completion of the book, you will understand which machine learning algorithm to pick for clustering, classification, or regression, and which is best suited to your problem.

Preface

What this book covers

What you need for this book

Who this book is for

Reader feedback

Customer support

Classification Using K Nearest Neighbors

Mary and her temperature preferences

Implementation of k-nearest neighbors algorithm

Map of Italy example - choosing the value of k

House ownership - data rescaling

Text classification - using non-Euclidean distances

Text classification - k-NN in higher-dimensions

Naive Bayes

Medical test - basic application of Bayes' theorem

Proof of Bayes' theorem and its extension

Playing chess - independent events

Implementation of naive Bayes classifier

Playing chess - dependent events

Gender classification - Bayes for continuous random variables

Decision Trees

Swim preference - representing data with decision tree

Information theory

ID3 algorithm - decision tree construction

Classifying with a decision tree

Playing chess - analysis with decision tree

Going shopping - dealing with data inconsistency

Random Forest

Overview of random forest algorithm

Swim preference - analysis with random forest

Implementation of random forest algorithm

Playing chess example

Going shopping - overcoming data inconsistency with randomness and measuring the level of confidence

Clustering into K Clusters

Household incomes - clustering into k clusters

Gender classification - clustering to classify

Implementation of the k-means clustering algorithm

House ownership - choosing the number of clusters

Document clustering - understanding the number of clusters k in a semantic context

Regression

Fahrenheit and Celsius conversion - linear regression on perfect data

Weight prediction from height - linear regression on real-world data

Gradient descent algorithm and its implementation

Flight time duration prediction from distance

Ballistic flight analysis - non-linear model

Time Series Analysis

Business profit - analysis of the trend

Electronics shop's sales - analysis of seasonality

Statistics

Bayesian Inference

Cross-validation

R Reference

Linear regression

Python Reference

Glossary of Algorithms and Methods in Data Science

Implementation of naive Bayes classifier

We implement a program that uses Bayes' theorem to calculate the probability that a data item belongs to a certain class:

# source_code/2/naive_bayes.py 
# A program that reads the CSV file with the data and returns
# the Bayesian probability for the unknown value denoted by ? to
# belong to a certain class.
# An input CSV file should be of the following format:
# 1. items in a row should be separated by a comma ','
# 2. the first row should be a heading containing a name for each
# column of the data.
# 3. the remaining rows should contain the data itself - rows with
# complete data and rows with incomplete data.
# A row with complete data is a row that has a non-empty and
# non-question-mark value for each column. A row with incomplete data is
# a row whose last column has the value of a question mark ?.
# Please...
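
The input format described in the header comment above can be processed with a short program. The following is a minimal sketch, not the book's naive_bayes.py: the function name `naive_bayes_probabilities` and the sample table are hypothetical, the CSV is passed in as a string rather than read from a file, and no smoothing is applied, so a feature value never seen together with a class gives that class a probability of zero.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data in the described format: a heading row,
# complete rows, and one incomplete row whose last column is '?'.
SAMPLE = """Weather,Wind,Play
Sunny,Weak,Yes
Sunny,Strong,Yes
Rainy,Weak,Yes
Rainy,Weak,No
Sunny,Strong,No
Sunny,Weak,?
"""


def naive_bayes_probabilities(csv_text):
    """For each row ending in '?', return a dict mapping class -> probability."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    data = rows[1:]  # skip the heading row
    complete = [r for r in data if r[-1] != "?"]
    incomplete = [r for r in data if r[-1] == "?"]

    # Number of complete rows per class (last column).
    class_counts = Counter(r[-1] for r in complete)
    # Number of complete rows per (column index, value, class) combination.
    feature_counts = Counter()
    for r in complete:
        for col, value in enumerate(r[:-1]):
            feature_counts[(col, value, r[-1])] += 1

    results = []
    for r in incomplete:
        scores = {}
        for cls, n_cls in class_counts.items():
            score = n_cls / len(complete)  # prior P(class)
            for col, value in enumerate(r[:-1]):
                # P(value | class), assuming features are independent given the class
                score *= feature_counts[(col, value, cls)] / n_cls
            scores[cls] = score
        total = sum(scores.values())
        # Normalize so the class probabilities sum to 1 (all zeros if total is 0).
        results.append(
            {cls: (s / total if total else 0.0) for cls, s in scores.items()}
        )
    return results


for probabilities in naive_bayes_probabilities(SAMPLE):
    print(probabilities)
```

For the sample row (Sunny, Weak), the unnormalized scores are 3/5 · 2/3 · 2/3 = 4/15 for Yes and 2/5 · 1/2 · 1/2 = 1/10 for No, which normalize to 8/11 and 3/11 respectively.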