scikit-learn Cookbook

By: Trent Hauck

Overview of this book

Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility, and within the Python data space, scikit-learn is the unequivocal choice for machine learning. Its consistent API and broad set of features help with a wide range of machine learning problems.

The book starts by walking through different methods to prepare your data, be it a dataset with missing values or text columns whose categories need to be turned into indicator variables. After the data is ready, you'll learn different techniques aligned with different objectives, be it a dataset with known outcomes such as sales by state, or more complicated problems such as clustering similar customers. Finally, you'll learn how to polish your algorithm to ensure that it's both accurate and resilient to new datasets.

Fitting a line through data


Now, we get to do some modeling! It's best to start simple, so we'll look at linear regression first. Linear regression is one of the oldest and most fundamental models: a straight line through data.

Getting ready

The boston dataset is well suited to experimenting with regression. It contains the median home price for several areas in Boston, along with other factors that might affect housing prices, such as the crime rate.

First, import the datasets module; then we can load the dataset:

>>> from sklearn import datasets
>>> # load_boston was removed in scikit-learn 1.2; this call requires an earlier release
>>> boston = datasets.load_boston()
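
The loaders in the datasets module return a dictionary-like object whose data and target attributes hold the feature matrix and the outcome to predict. The sketch below shows that structure. It is only an illustration: load_diabetes is swapped in here as an assumption, because it ships with current scikit-learn releases, and the variable names X and y are not from the recipe.

```python
from sklearn import datasets

# Assumption: load_diabetes is a stand-in regression dataset, not the
# boston data used in the recipe.
diabetes = datasets.load_diabetes()

# The loader returns a Bunch, a dict-like object with attribute access.
X = diabetes.data      # feature matrix, one row per sample
y = diabetes.target    # continuous outcome, one value per sample

print(X.shape, y.shape)
print(diabetes.feature_names)
```

The same two attributes, data and target, are what a regression model is later fitted on.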

How to do it...

Using linear regression in scikit-learn is quite simple. The API for linear regression is essentially the same one you're now familiar with from the previous chapter.

First, import the LinearRegression class and create an instance:

>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()
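
To make the shared API concrete, here is a minimal sketch of how such an object is typically used: fit estimates the line, and predict applies it. The toy data is hypothetical (points placed exactly on y = 3x + 2) and is not the boston data from the recipe.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: points that lie exactly on y = 3x + 2.
X = np.arange(10, dtype=float).reshape(-1, 1)  # shape (n_samples, n_features)
y = 3.0 * X.ravel() + 2.0

lr = LinearRegression()
lr.fit(X, y)                 # estimate slope (coef_) and intercept (intercept_)
predictions = lr.predict(X)  # fitted values for the training points

print(lr.coef_, lr.intercept_)
```

Because the toy points sit exactly on a line, the fitted slope and intercept should recover 3 and 2 up to floating-point error; real data will leave residuals.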

Now, it's as easy...