Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Machine Learning Fundamentals
  • Table Of Contents Toc
Machine Learning Fundamentals

Machine Learning Fundamentals

By : Hyatt Saleh
5 (3)
close
close
Machine Learning Fundamentals

Machine Learning Fundamentals

5 (3)
By: Hyatt Saleh

Overview of this book

As machine learning algorithms become popular, new tools that optimize these algorithms are also developed. Machine Learning Fundamentals explains you how to use the syntax of scikit-learn. You'll study the difference between supervised and unsupervised models, as well as the importance of choosing the appropriate algorithm for each dataset. You'll apply unsupervised clustering algorithms over real-world datasets, to discover patterns and profiles, and explore the process to solve an unsupervised machine learning problem. The focus of the book then shifts to supervised learning algorithms. You'll learn to implement different supervised algorithms and develop neural network structures using the scikit-learn package. You'll also learn how to perform coherent result analysis to improve the performance of the algorithm by tuning hyperparameters. By the end of this book, you will have gain all the skills required to start programming machine learning algorithms.
Table of Contents (8 chapters)
close
close

Scikit-Learn API

The objective of the scikit-learn API is to provide an efficient and unified syntax to make machine learning accessible to non-machine learning experts, as well as to facilitate and popularize its use among several industries.

How Does It Work?

Although it has many collaborators, the scikit-learn API was built and has been updated by considering a set of principles that prevent framework code proliferation, where different codes perform similar functionalities. On the contrary, it promotes simple conventions and consistency. Due to this, the scikit-learn API is consistent among all models, and once the main functionalities have been learned, it can be widely used.

The scikit-learn API is divided into three complementary interfaces that share a common syntax and logic: the estimator, the predictor, and the transformer. The estimator interface is used for creating models and fitting the data into them; the predictor, as the name suggests, is used to make predictions based on the models trained before; and finally, the transformer is used for converting data.

Estimator

This is considered to be the core of the entire API, as it is the interface in charge of fitting the models to the input data. It works by initializing the model to be used, and then applying a fit() method that triggers the learning process to build a model based on the data.

The fit() method receives as arguments the training data, in two separate variables, the features matrix, and the target matrix (conventionally called X_train and Y_train). For unsupervised models, the method only takes in the first argument (X_train).

This method creates the model trained to the input data, which can later be used for predicting.

Some models take other arguments besides the training data, which are also called hyperparameters. These hyperparameters are initially set to their default values, but can be tuned to improve the performance of the model, which will be discussed in further sections.

The following is an example of a model being trained:

from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, Y_train)

First, it is required that you import the type of algorithm to be used from scikit-learn, for example, a Gaussian Naïve Bayes algorithm for classification. It is always a good practice to import only the algorithm to be used, and not the entire library, as this will ensure that your code runs faster.

Note

To find out the syntax to import a different model, use the documentation of scikit-learn. Go to the following link, click over the algorithm that you wish to implement, and you will find the instructions there: http://scikit-learn.org/stable/user_guide.html.

The second line of code oversees the initialization of the model and stores it in a variable. Lastly, the model is fit to the input data.

In addition to this, the estimator also offers other complementary tasks, as follows:

  • Feature extraction, which involves transforming input data into numerical features that can be used for machine learning purposes
  • Feature selection, which selects the features in your data that most contribute to the prediction output of the model
  • Dimensionality reduction, which takes higher-dimensional data and converts it into a lower dimension

Predictor

As explained previously, the predictor takes the model created by the estimator and extends it to perform predictions on unseen data. In general terms, for supervised models, it feeds the model a new set of data, usually called X_test, to get a corresponding target or label based on the parameters learned during the training of the model.

Moreover, some unsupervised models can also benefit from the predictor. While this method does not output a specific target value, it can be useful to assign a new instance to a cluster.

Following the preceding example, the implementation of the predictor can be seen as follows:

Y_pred = model.predict(X_test)

We apply the predict() method to the previously trained model, and input the new data as an argument to the method.

In addition to predicting, the predictor can also implement methods that are in charge of quantifying the confidence of the prediction, also called the performance of the model. These confidence functions vary from model to model, but their main objective is to determine how far the prediction is from reality. This is done by taking an X_test with its corresponding Y_test and comparing it to the predictions made with the same X_test.

Transformer

As we saw previously, data is usually transformed before being fed to a model. Considering this, the API contains a transform() method that allows you to perform some preprocessing techniques.

It can be used both as a starting point to transform the input data of the model (X_train), as well as further along to modify data that will be fed to the model for predictions. This latter application is crucial to get accurate results, as it ensures that the new data follows the same distribution as the data used to train the model.

The following is an example of a transformer that normalizes the values of the training data:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)

As you can see, after importing and initializing the transformer, it needs to be fit to the data to then effectively transform it:

X_test = scaler.transform(X_test)

The advantage of the transformer is that once it has been applied to the training dataset, it stores the values used for transforming the training data; this can be used to transform the test dataset to the same distribution.

In conclusion, we discussed one of the main benefits of using scikit-learn, which is its API. This API follows a consistent structure that makes it easy for non-experts to apply machine learning algorithms.

To model an algorithm on scikit-learn, the first step is to initialize the model class and fit it to the input data using an estimator, which is usually done by calling the fit() method of the class. Finally, once the model has been trained, it is possible to predict new values using the predictor by calling the predict() method of the class.

Additionally, scikit-learn also has a transformer interface that allows you to transform data as needed. This is useful for performing preprocessing methods over the training data, which can then be also used to transform the testing data to follow the same distribution.

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Machine Learning Fundamentals
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon