Machine Learning Fundamentals

By: Hyatt Saleh

Overview of this book

As machine learning algorithms become popular, new tools that optimize these algorithms are also being developed. Machine Learning Fundamentals explains how to use the syntax of scikit-learn. You'll study the difference between supervised and unsupervised models, as well as the importance of choosing the appropriate algorithm for each dataset. You'll apply unsupervised clustering algorithms to real-world datasets to discover patterns and profiles, and explore the process of solving an unsupervised machine learning problem. The focus of the book then shifts to supervised learning algorithms. You'll learn to implement different supervised algorithms and develop neural network structures using the scikit-learn package. You'll also learn how to perform coherent result analysis to improve the performance of the algorithm by tuning hyperparameters. By the end of this book, you will have gained all the skills required to start programming machine learning algorithms.

Preface

Introduction to Scikit-Learn

Supervised and Unsupervised Learning

Summary

Unsupervised Learning: Real-Life Applications

Introduction

Clustering

Exploring a Dataset: Wholesale Customers Dataset

Evaluating the Performance of Clusters

Summary

Supervised Learning: Key Steps

Introduction

Model Validation and Testing

Evaluation Metrics

Error Analysis

Summary

Supervised Learning Algorithms: Predict Annual Income

Introduction

Exploring the Dataset

Naïve Bayes Algorithm

Decision Tree Algorithm

Support Vector Machine Algorithm

Error Analysis

Summary

Artificial Neural Networks: Predict Annual Income

Introduction

Artificial Neural Networks

Applying an Artificial Neural Network

Performance Analysis

Summary

Building Your Own Program

Introduction

Program Definition

Saving and Loading a Trained Model

Interacting with a Trained Model

Summary

Appendix

Chapter 1: Introduction to scikit-learn

Chapter 2: Unsupervised Learning: Real-life Applications

Chapter 3: Supervised Learning: Key Steps

Chapter 4: Supervised Learning Algorithms: Predict Annual Income

Chapter 5: Artificial Neural Networks: Predict Annual Income

Chapter 6: Building Your Own Program

Chapter 4: Supervised Learning Algorithms: Predict Annual Income

Activity 11: Training a Naïve Bayes Model for our Census Income Dataset

Before working on step 1, make sure that the data has been preprocessed, as follows:

```
import pandas as pd

data = pd.read_csv("datasets/census_income_dataset.csv")
# Drop the five columns considered irrelevant for the study
data = data.drop(["fnlwgt", "education", "relationship", "sex", "race"], axis=1)
```

After reading the dataset, the five variables considered irrelevant for the study are removed.

Next, the remaining qualitative variables are converted into their numerical form via the following code:

```
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()

features_to_convert = ["workclass", "marital-status", "occupation", "native-country", "target"]

# Refit the encoder on each qualitative column and replace it with integer codes
for i in features_to_convert:
    data[i] = enc.fit_transform(data[i].astype('str'))
```
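To see what this conversion does, here is a minimal sketch on a made-up column (the values are hypothetical and not read from the census file). LabelEncoder assigns integer codes following the sorted order of the distinct strings. Because the same encoder is refitted for every column in the loop, only the mapping of the last column remains available in `enc.classes_` afterwards.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column, used only for illustration
toy = pd.DataFrame({"workclass": ["Private", "State-gov", "Private", "Local-gov"]})

enc = LabelEncoder()
toy["workclass_code"] = enc.fit_transform(toy["workclass"].astype("str"))

# classes_ holds the distinct labels in sorted order; position i maps to code i
mapping = {str(label): code for code, label in enumerate(enc.classes_)}
print(mapping)
print(toy)
```

If the original labels are needed later, one option is to keep a separate encoder per column instead of reusing a single one.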

Once this is complete, you can begin with the steps of the activity:

Using the preprocessed Census Income Dataset, separate the features from the target by creating the variables X and Y:
```
X = data.drop("target", axis=1)
Y = data["target"]
```
Note that there are several ways to achieve the separation of X and Y. Use the one that you feel most comfortable with. However, take into account that X should contain the features for all instances, while Y should contain the class label of all instances.
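As an illustration of two equivalent ways to do this separation, the following sketch uses a tiny made-up DataFrame (the column values are hypothetical). The second option assumes that the target is the last column, which may not hold for every dataset.

```python
import pandas as pd

# Hypothetical miniature dataset: two features plus a "target" column
data = pd.DataFrame({
    "age": [39, 50, 38],
    "hours-per-week": [40, 13, 40],
    "target": [0, 0, 1],
})

# Option 1: drop the target column by name
X = data.drop("target", axis=1)
Y = data["target"]

# Option 2: select by position, assuming the target is the last column
X_alt = data.iloc[:, :-1]
Y_alt = data.iloc[:, -1]

# X keeps one row per instance; Y is one-dimensional
print(X.shape, Y.shape)
```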

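Divide the dataset into training, validation, and testing sets, holding out approximately 10% of the instances for each of the validation and testing sets:

```
from sklearn.model_selection import train_test_split

# First split: hold out 10% of all instances as the testing set
X_new, X_test, Y_new, Y_test = train_test_split(X, Y, test_size=0.1, random_state=101)

# Second split: 0.1111 of the remaining 90% is about 10% of the full dataset,
# which gives the shapes listed below
X_train, X_dev, Y_train, Y_dev = train_test_split(X_new, Y_new, test_size=0.1111, random_state=101)
```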

The final shape of all sets must match the following values:

```
X_train = (26048, 9)
Y_train = (26048,)
X_dev = (3256, 9)
Y_dev = (3256,)
X_test = (3257, 9)
Y_test = (3257,)
```
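These sizes can be checked without the real data. The sketch below uses placeholder arrays of zeros with the same dimensions (32,561 instances, which is 26,048 + 3,256 + 3,257, and 9 features). It assumes a first split of 10% and a second split of 0.1111 of the remainder, which is the fraction that produces the sizes listed above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same dimensions as the preprocessed data
X = np.zeros((32561, 9))
Y = np.zeros(32561)

# First split: 10% of all instances go to the testing set
X_new, X_test, Y_new, Y_test = train_test_split(X, Y, test_size=0.1, random_state=101)

# Second split: 0.1111 of the remaining instances go to the validation set
X_train, X_dev, Y_train, Y_dev = train_test_split(
    X_new, Y_new, test_size=0.1111, random_state=101
)

for name, arr in [("X_train", X_train), ("Y_train", Y_train),
                  ("X_dev", X_dev), ("Y_dev", Y_dev),
                  ("X_test", X_test), ("Y_test", Y_test)]:
    print(name, arr.shape)
```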

Import the Gaussian Naïve Bayes class, and then use the fit method to train the model over the training sets (X_train and Y_train):
```
from sklearn.naive_bayes import GaussianNB

model_NB = GaussianNB()
model_NB.fit(X_train,Y_train) 
```
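For a sense of what `fit` stores, the following sketch trains GaussianNB on synthetic, well-separated two-class data (hypothetical values, not the census data). After fitting, `theta_` holds the per-class mean of each feature, and `predict_proba` returns the class probabilities behind each prediction.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data: class 0 centered at 0, class 1 centered at 5
rng = np.random.default_rng(0)
X_toy = np.vstack([
    rng.normal(0, 1, size=(50, 2)),
    rng.normal(5, 1, size=(50, 2)),
])
Y_toy = np.array([0] * 50 + [1] * 50)

model = GaussianNB()
model.fit(X_toy, Y_toy)

# One row per class, one column per feature
print(model.theta_.shape)

# Points near each class center are assigned to that class
preds = model.predict([[0, 0], [5, 5]])
print(preds)

# Class probabilities for a single instance
print(model.predict_proba([[0, 0]]).round(3))
```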
Finally, perform a prediction using the model that you trained previously for a new instance with the following values for each feature: 39, 6, 13, 4, 0, 2174, 0, 40, 38.
Using the following code, the prediction for the individual should be equal to zero, which means that the individual most likely has an income below or equal to 50K:
```
# The instance must contain one value per feature (nine in total)
pred_1 = model_NB.predict([[39,6,13,4,0,2174,0,40,38]])
print(pred_1)
```
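The double brackets matter because `predict` expects a two-dimensional input of shape (number of instances, number of features). A small sketch of two equivalent ways to build that input for the nine-value instance:

```python
import numpy as np

# The nine feature values of the new instance, in column order
instance = [39, 6, 13, 4, 0, 2174, 0, 40, 38]

# Option 1: wrap the list in another list to create a single row
as_nested_list = np.array([instance])

# Option 2: reshape a flat array into one row with as many columns as needed
as_reshaped = np.array(instance).reshape(1, -1)

print(as_nested_list.shape, as_reshaped.shape)
```

Both arrays have one row and nine columns, so either can be passed to `predict`.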

Activity 12: Training a Decision Tree Model for our Census Income Dataset

The shape of the previously created subsets must be as follows:

```
X_train = (26048, 9)
Y_train = (26048,)
X_dev = (3256, 9)
Y_dev = (3256,)
X_test = (3257, 9)
Y_test = (3257,)
```

Using the preprocessed Census Income Dataset that was previously split into the different subsets, import the DecisionTreeClassifier class, and then use the fit method to train the model over the training sets (X_train and Y_train):
```
from sklearn.tree import DecisionTreeClassifier

model_tree = DecisionTreeClassifier()
model_tree.fit(X_train,Y_train) 
```
Finally, perform a prediction using the model that you trained before for a new instance with the following values for each feature: 39, 6, 13, 4, 0, 2174, 0, 40, 38.
Using the following code, the prediction for the individual should be equal to zero, which means that the individual most likely has an income below or equal to 50K:
```
pred_2 = model_tree.predict([[39,6,13,4,0,2174,0,40,38]])
print(pred_2)
```
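A decision tree trained without a fixed `random_state` can differ slightly between runs, because scikit-learn permutes the features at each split and ties between equally good splits may be broken differently. The sketch below, on synthetic data (hypothetical values), shows that fixing `random_state` makes two trainings produce the same tree. The value 101 is an arbitrary choice.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with nine integer features; the label depends on the first two
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 10, size=(200, 9))
Y_toy = (X_toy[:, 0] + X_toy[:, 1] > 9).astype(int)

# Two trees trained with the same seed on the same data
tree_a = DecisionTreeClassifier(random_state=101).fit(X_toy, Y_toy)
tree_b = DecisionTreeClassifier(random_state=101).fit(X_toy, Y_toy)

# Compare the feature chosen at every node of both trees
same_structure = np.array_equal(tree_a.tree_.feature, tree_b.tree_.feature)
print(same_structure, tree_a.get_depth())
```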

Activity 13: Training an SVM Model for our Census Income Dataset

The shape of the previously created subsets must be as follows:

```
X_train = (26048, 9)
Y_train = (26048,)
X_dev = (3256, 9)
Y_dev = (3256,)
X_test = (3257, 9)
Y_test = (3257,)
```

Using the preprocessed Census Income Dataset that was previously split into the different subsets, import the SVC class, and then use the fit method to train the model over the training sets (X_train and Y_train):
```
from sklearn.svm import SVC

model_svm = SVC()
model_svm.fit(X_train,Y_train)
```
Finally, perform a prediction using the model that you trained before for a new instance with the following values for each feature: 39, 6, 13, 4, 0, 2174, 0, 40, 38.
Using the following code, the prediction for the individual should be equal to zero, which means that the individual most likely has an income below or equal to 50K:
```
pred_3 = model_svm.predict([[39,6,13,4,0,2174,0,40,38]])
print(pred_3)
```
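Once all three models are trained, one possible next step is to compare them on the validation set. The following sketch does so with accuracy as the metric. It uses synthetic data from `make_classification` as a stand-in for the census data, so the scores it prints say nothing about the real dataset. The dictionary keys are names chosen for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the census data: 9 features, 2 classes
X, Y = make_classification(n_samples=500, n_features=9, random_state=101)
X_train, X_dev, Y_train, Y_dev = train_test_split(
    X, Y, test_size=0.2, random_state=101
)

models = {
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=101),
    "svm": SVC(),
}

# Train each model on the training set and score it on the validation set
scores = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    scores[name] = accuracy_score(Y_dev, model.predict(X_dev))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```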
