Python Machine Learning Cookbook

By : Prateek Joshi, Vahid Mirjalili

Overview of this book

Machine learning is becoming increasingly pervasive in the modern data-driven world. It is used extensively across many fields such as search engines, robotics, self-driving cars, and more. With this book, you will learn how to perform various machine learning tasks in different environments. We’ll start by exploring a range of real-life scenarios where machine learning can be used, and look at various building blocks. Throughout the book, you’ll use a wide variety of machine learning algorithms to solve real-world problems and use Python to implement these algorithms. You’ll discover how to deal with various types of data and explore the differences between machine learning paradigms such as supervised and unsupervised learning. We also cover a range of regression techniques, classification algorithms, predictive modeling, data visualization techniques, recommendation engines, and more with the help of real-world examples.

Building a polynomial regressor

One of the main constraints of a linear regression model is that it tries to fit a linear function to the input data. The polynomial regression model overcomes this constraint by allowing the function to be a polynomial, which can improve accuracy when the underlying relationship is curved.

Getting ready

Let's consider the following figure:

We can see that there is a natural curve to the pattern of datapoints. A linear model is unable to capture this. Let's see what a polynomial model would look like:

The dotted line represents the linear regression model, and the solid line represents the polynomial regression model. The curviness of the polynomial model is controlled by the degree of the polynomial. As the degree increases, the model can follow the training data more closely. However, a higher degree also adds complexity, which makes the model slower to train and evaluate. This is a trade-off: you have to decide how accurate you want the model to be, given your computational constraints.
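To make the effect of the degree concrete, here is a minimal sketch that fits polynomial regressors of increasing degree to a curved pattern. The data is synthetic and hypothetical (a noisy cubic generated with NumPy), not the recipe's input file, and the helper name training_mse is introduced only for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data: a curved (cubic) pattern with a little noise
rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.5, size=60)

def training_mse(degree):
    """Fit a polynomial regressor of the given degree; return its training MSE."""
    features = PolynomialFeatures(degree=degree).fit_transform(X)
    model = LinearRegression().fit(features, y)
    return mean_squared_error(y, model.predict(features))

# Compare a straight line (degree 1) with curvier models
mse_by_degree = {degree: training_mse(degree) for degree in (1, 2, 3)}
for degree, mse in mse_by_degree.items():
    print("degree", degree, "training MSE:", round(mse, 3))
```

On data like this, the degree 3 model should show a clearly lower training error than the straight line. Note that this measures the fit on the training data only; it says nothing about how the model behaves on unseen data.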

How to do it…

  1. Add the following lines to

    from sklearn.preprocessing import PolynomialFeatures
    polynomial = PolynomialFeatures(degree=3)
  2. We initialized a polynomial of degree 3 in the previous line. Now we have to represent the datapoints in terms of the polynomial's terms, that is, the powers and products of the input features:

    X_train_transformed = polynomial.fit_transform(X_train)

    Here, X_train_transformed represents the same input in the polynomial form.

  3. Let's consider the first datapoint in our file and check whether it can predict the right output:

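    # linear_model (from sklearn) and linear_regressor are assumed to be defined earlier in the file
    datapoint = [[0.39, 2.78, 7.11]]  # 2D input: one sample with three features
    poly_datapoint = polynomial.transform(datapoint)  # reuse the already fitted transformer
    poly_linear_model = linear_model.LinearRegression()
    poly_linear_model.fit(X_train_transformed, y_train)
    print("\nLinear regression:", linear_regressor.predict(datapoint)[0])
    print("\nPolynomial regression:", poly_linear_model.predict(poly_datapoint)[0])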

    The values in the variable datapoint are the values in the first line in the input data file. We are still fitting a linear regression model here. The only difference is in the way in which we represent the data. If you run this code, you will see the following output:

    Linear regression: -11.0587294983
    Polynomial regression: -10.9480782122

    As you can see, the polynomial regression prediction is close to the actual output value for this datapoint. If we want it to get closer, we need to increase the degree of the polynomial.

  4. Let's make it 10 and see what happens:

    polynomial = PolynomialFeatures(degree=10)

    You should see something like the following:

    Polynomial regression: -8.20472183853

Now, you can see that the predicted value is much closer to the actual output value.
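To see what the polynomial representation from step 2 actually contains, the following sketch expands a tiny input and counts the columns produced for a three-feature datapoint. The two-feature sample values are hypothetical and chosen only so the result is easy to check by hand:

```python
from sklearn.preprocessing import PolynomialFeatures

# Tiny illustrative input (hypothetical values): one sample with features a=2, b=3
sample = [[2.0, 3.0]]
expanded = PolynomialFeatures(degree=2).fit_transform(sample)
# Columns are: 1, a, b, a^2, a*b, b^2
print(expanded)

# Count the expanded columns for a three-feature datapoint like the one in step 3
datapoint = [[0.39, 2.78, 7.11]]
feature_counts = {}
for degree in (3, 10):
    poly = PolynomialFeatures(degree=degree).fit(datapoint)
    feature_counts[degree] = poly.n_output_features_
print(feature_counts)
```

For three input features, degree 3 produces 20 columns and degree 10 produces 286. This growth is one reason a higher degree makes the model more expensive to fit, which is the trade-off described in the Getting ready section.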