Machine Learning Fundamentals

By: Hyatt Saleh

Overview of this book

As machine learning algorithms become more popular, new tools that optimize these algorithms are also being developed. Machine Learning Fundamentals explains how to use the syntax of scikit-learn. You'll study the difference between supervised and unsupervised models, as well as the importance of choosing the appropriate algorithm for each dataset. You'll apply unsupervised clustering algorithms to real-world datasets to discover patterns and profiles, and explore the process of solving an unsupervised machine learning problem. The focus of the book then shifts to supervised learning algorithms. You'll learn to implement different supervised algorithms and develop neural network structures using the scikit-learn package. You'll also learn how to perform coherent result analysis to improve the performance of the algorithm by tuning hyperparameters. By the end of this book, you will have gained all the skills required to start programming machine learning algorithms.

Clustering


Clustering is a type of unsupervised machine learning technique in which the objective is to arrive at conclusions based on the patterns found within unlabeled input data. This technique is mainly used to find meaning in the structure of large datasets in order to support decisions.

For instance, given a large list of restaurants in a city, it would be useful to segment the market into subgroups based on the type of food, number of clients, and style of experience, so that each cluster can be offered a service tailored to its specific needs.

Moreover, clustering algorithms divide the data points into n clusters so that the data points in the same cluster have similar features, whereas they differ greatly from the data points in other clusters.
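
To make the idea of grouping similar points concrete, here is a minimal sketch of a k-means style procedure written with the Python standard library only. It is an illustration under stated assumptions, not the book's implementation: the function name kmeans, the toy two-feature points, and the choice of the first k points as starting centroids are all hypothetical. In practice, scikit-learn's sklearn.cluster.KMeans would normally be used instead.

```python
import math


def kmeans(points, k, iterations=10):
    """Toy k-means sketch: returns (centroids, labels)."""
    # Assumption: the first k points serve as the initial centroids,
    # which keeps the example deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Hypothetical data: two visibly separate groups of 2-feature points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Points that share a label end up close to the same centroid and far from the other one, which is the "similar within a cluster, different across clusters" property described above.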

Clustering Types

Clustering algorithms can group data points using a methodology that is either hard or soft. The former assigns each data point entirely to a single cluster, whereas the latter calculates for each data point the probability...
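
The contrast between the two methodologies can be sketched in a few lines of standard-library Python. This is only an illustration: the helper names hard_assign and soft_assign, the fixed centroids, and the exp(-distance²) weighting used to turn distances into membership scores that sum to 1 are assumptions made for the example, not a method taken from the text. With scikit-learn, a soft assignment of this kind is typically obtained from a model such as sklearn.mixture.GaussianMixture through its predict_proba method.

```python
import math


def hard_assign(point, centroids):
    """Hard clustering: return the index of the single nearest centroid."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def soft_assign(point, centroids):
    """Soft clustering sketch: return one membership score per cluster.

    Assumption: scores are exp(-squared distance), normalized to sum to 1.
    """
    weights = [math.exp(-math.dist(point, c) ** 2) for c in centroids]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical cluster centers and data points.
centroids = [(0.0, 0.0), (4.0, 0.0)]
near_first = (1.5, 0.0)
midway = (2.0, 0.0)

hard = hard_assign(near_first, centroids)
soft = soft_assign(near_first, centroids)
soft_midway = soft_assign(midway, centroids)

print(hard)
print(soft)
print(soft_midway)
```

The hard assignment commits the point to one cluster, while the soft assignment keeps a score for every cluster; a point that is equidistant from both centroids receives equal scores.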