#### Overview of this book

As machine learning algorithms become popular, new tools that optimize these algorithms are also being developed. Machine Learning Fundamentals explains how to use the syntax of scikit-learn. You'll study the difference between supervised and unsupervised models, as well as the importance of choosing the appropriate algorithm for each dataset. You'll apply unsupervised clustering algorithms to real-world datasets to discover patterns and profiles, and explore the process of solving an unsupervised machine learning problem. The focus of the book then shifts to supervised learning algorithms. You'll learn to implement different supervised algorithms and develop neural network structures using the scikit-learn package. You'll also learn how to perform coherent result analysis to improve the performance of the algorithm by tuning hyperparameters. By the end of this book, you will have gained all the skills required to start programming machine learning algorithms.

## k-means Algorithm

The k-means algorithm is used for data without a labeled class. It divides the data into K subgroups. Data points are assigned to each group based on similarity, as explained before, which for this algorithm is measured by the distance from the center (centroid) of the cluster. The final output of the algorithm is the assignment of each data point to a cluster, together with the centroid of each cluster, which can be used to assign new data to the same clusters.

The centroid of each cluster represents a collection of features that can be used to define the nature of the data points that belong there.
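As a minimal sketch of these two outputs, the following uses scikit-learn's `KMeans` class. The data is synthetic and purely illustrative (two blobs generated with NumPy), and the choices of `n_clusters=2`, `n_init=10`, and `random_state=0` are assumptions made for the example, not values taken from the book.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic, purely illustrative data: two well-separated blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
])

# K (n_clusters) is chosen by the user; 2 matches how the data was generated.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)

labels = model.labels_              # cluster index assigned to each training point
centroids = model.cluster_centers_  # one centroid (feature vector) per cluster

# New points are labeled with the cluster whose centroid is nearest.
new_points = np.array([[0.2, -0.1], [4.8, 5.3]])
new_labels = model.predict(new_points)
```

Each row of `centroids` can be read as the "typical" feature values of the corresponding cluster, which is what makes the centroid useful for describing the points that belong to it.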

### Understanding the Algorithm

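The k-means algorithm works through an iterative process that involves the following steps:

1. Initialization: K centroids are generated, either from initial estimates or by choosing them at random from the data points.

2. Assignment: each data point is assigned to the cluster whose centroid is nearest, which minimizes the squared Euclidean distance between the point and its centroid.

3. Update: each centroid is recalculated as the mean of all the data points assigned to its cluster.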

Figure 2.6: A formula minimizing the Euclidean distance
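The iterative process can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the book's or scikit-learn's implementation: it assumes the usual three-step formulation (initialize the centroids, assign each point to the nearest centroid by squared Euclidean distance, recompute each centroid as the mean of its points) and stops either after a fixed number of iterations or when no point changes cluster. The function name `kmeans`, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1 (initialization): pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):  # run at most max_iter iterations
        # Step 2 (assignment): squared Euclidean distance to every centroid,
        # then pick the nearest centroid for each point.
        sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dist.argmin(axis=1)
        # Stop early once no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3 (update): move each centroid to the mean of its points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy data (hypothetical): three points near (1, 1) and three near (8, 8).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
centroids, labels = kmeans(X, k=2)
```

Choosing the initial centroids from the data points is only one option. Results can depend on the initialization, which is why library implementations typically run the algorithm several times with different starting centroids.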

Steps 2 and 3 are repeated until a stopping criterion is met. The criterion can be one of the following:

• The defined number of iterations is reached.

• The data points do not change from cluster...