This includes algorithms for the most common machine learning tasks, such as classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
Scikit-learn comes with several real-world data sets for us to practice with. Let's take a look at one of these—the Iris data set:
from sklearn import datasets
iris = datasets.load_iris()
iris_X = iris.data
iris_y = iris.target
iris_X.shape
(150, 4)
The data set contains 150 samples of three types of irises (Setosa, Versicolor, and Virginica), each with four features. We can get a description of the data set:
iris.DESCR
We can see that the four attributes, or features, are sepal length, sepal width, petal length, and petal width, all in centimeters. Each sample is associated with one of three classes: Setosa, Versicolor, and Virginica. These are represented by 0, 1, and 2, respectively.
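To check the feature order and the label encoding directly, without reading through the full description, a minimal sketch along these lines may help. It assumes scikit-learn and NumPy are installed; the variable names feature_names, target_names, and class_counts are introduced here only for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Column order of iris.data, as reported by the data set itself
feature_names = list(iris.feature_names)

# Class names; the position of each name is its integer label in iris.target
target_names = list(iris.target_names)

# Number of samples carrying each integer label (0, 1, 2)
class_counts = np.bincount(iris.target)

print(feature_names)
print(target_names)
print(class_counts)
```

The position of each entry in target_names matches the integer label used in iris.target, so target_names[0] is the class encoded as 0.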
Let's look at a simple classification problem using this data. We want to predict the type of iris based on its features: the...