Book Image

Machine Learning Fundamentals

By : Hyatt Saleh
Book Image

Machine Learning Fundamentals

By: Hyatt Saleh

Overview of this book

As machine learning algorithms become popular, new tools that optimize these algorithms are also developed. Machine Learning Fundamentals explains you how to use the syntax of scikit-learn. You'll study the difference between supervised and unsupervised models, as well as the importance of choosing the appropriate algorithm for each dataset. You'll apply unsupervised clustering algorithms over real-world datasets, to discover patterns and profiles, and explore the process to solve an unsupervised machine learning problem. The focus of the book then shifts to supervised learning algorithms. You'll learn to implement different supervised algorithms and develop neural network structures using the scikit-learn package. You'll also learn how to perform coherent result analysis to improve the performance of the algorithm by tuning hyperparameters. By the end of this book, you will have gain all the skills required to start programming machine learning algorithms.
Table of Contents (9 chapters)
Machine Learning Fundamentals
Preface

DBSCAN Algorithm


The density-based spatial clustering of applications with noise (DBSCAN) algorithm groups together points that are close to each other (with many neighbors) and marks those points that are further away with no close neighbors as outliers.

According to this, and as its name states, the algorithm classifies data points based on the density of all data points in the data space.

Understanding the Algorithm

The DBSCAN algorithm requires two main parameters: epsilon and the minimum number of observations.

Epsilon, also known as eps, is the maximum distance that defines the radius within which the algorithm searches for neighbors. The minimum number of observations, on the other hand, refers to the number of data points required to form a high density area (min_samples). However, the latter is optional in scikit-learn as the default value is set to 5:

Figure 2.13: An illustration of how the DBSCAN algorithm classifies data into clusters

In the preceding diagram, the blue dots are assigned...