-
Book Overview & Buying
-
Table Of Contents
Machine Learning Fundamentals
By :
Created in 2007 by David Cournapeau as part of a Google Summer of Code project, scikit-learn is an open source Python library made to facilitate the process of building models based on built-in machine learning and statistical algorithms, without the need for hard-coding. The main reasons for its popular use are its complete documentation, its easy-to-use API, and the many collaborators who work every day to improve the library.
You can find the documentation for scikit-learn at the following link: http://scikit-learn.org.
Scikit-learn is mainly used to model data, and not as much to manipulate or summarize data. It offers its users an easy-to-use, uniform API to apply different models, with little learning effort, and no real knowledge of the math behind it, required.
Some of the math topics that you need to know about to understand the models are linear algebra, probability theory, and multivariate calculus. For more information on these models, visit: https://towardsdatascience.com/the-mathematics-of-machine-learning-894f046c568.
The models available under the scikit-learn library fall into two categories: supervised and unsupervised, both of which will be explained in depth in later sections. This category classification will help to determine which model to use for a particular dataset to get the most information out of it.
Besides its main use for interpreting data to train models, scikit-learn is also used to do the following:
Although scikit-learn is considered the preferred Python library for beginners in the world of machine learning, there are several large companies around the world using it, as it allows them to improve their product or services by applying the models to already existing developments. It also permits them to quickly implement tests over new ideas.
You can visit the following website to find out which companies are using scikit-learn and what are they using it for: http://scikit-learn.org/stable/testimonials/testimonials.html.
In conclusion, scikit-learn is an open source Python library that uses an API to apply most machine learning tasks (both supervised or unsupervised) to data problems. Its main use is for modeling data; nevertheless, it should not be limited to that, as the library also allows users to predict outcomes based on the model being trained, as well as to analyze the performance of the model.
The following is a list of the main advantages of using scikit-learn for machine learning purposes:
The following is a list of the main disadvantages of using scikit-learn for machine learning purposes:
In general terms, scikit-learn is an excellent beginner's library as it requires little effort to learn its use and has many complementary materials thought to facilitate its application. Due to the contributions of several collaborators, the library stays up to date and is applicable to most current data problems.
On the other hand, it is a fairly simple library, not fit for more complex data problems such as deep learning. Likewise, it is not recommended for users who wish to take its abilities to a higher level by playing with the different parameters that are available in each model.
Change the font size
Change margin width
Change background colour