Python Machine Learning Blueprints: Intuitive data projects you can relate to

By: Alexander T. Combs

Overview of this book

Machine learning is transforming the way we understand and interact with the world around us. But how well do you really understand it? How confident are you working with the tools and models that drive it? Python Machine Learning Blueprints puts your skills and knowledge to the test, guiding you through the development of machine learning applications and algorithms with real-world examples that demonstrate how to put concepts into practice. You’ll learn how to use clustering techniques to discover bargain airfares and apply linear regression to find yourself a cheap apartment, and much more. Everything you learn is backed by a real-world example, whether it’s data manipulation or statistical modeling. That way you’re never left floundering in theory; you’ll simply be collecting and analyzing data in a way that makes a real impact.

Python Machine Learning Blueprints

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

The Python Machine Learning Ecosystem

The data science/machine learning workflow

Python libraries and functions

Setting up your machine learning environment

Summary

Build an App to Find Underpriced Apartments

Sourcing the apartment listing data

Inspecting and preparing the data

Modeling the data

Summary

Build an App to Find Cheap Airfares

Sourcing airfare pricing data

Retrieving the fare data with advanced web scraping techniques

Parsing the DOM to extract pricing data

Sending real-time alerts using IFTTT

Putting it all together

Summary

Forecast the IPO Market using Logistic Regression

The IPO market

Feature engineering

Binary classification

Feature importance

Summary

Create a Custom Newsfeed

Creating a supervised training set with the Pocket app

Using the embed.ly API to download story bodies

Natural language processing basics

Support vector machines

IFTTT integration with feeds, Google Sheets, and e-mail

Setting up your daily personal newsletter

Summary

Predict whether Your Content Will Go Viral

What does research tell us about virality?

Sourcing shared counts and content

Exploring the features of shareability

Building a predictive content scoring model

Summary

Forecast the Stock Market with Machine Learning

Types of market analysis

What does research tell us about the stock market?

How to develop a trading strategy

Summary

Build an Image Similarity Engine

Machine learning on images

Working with images

Finding similar images

Understanding deep learning

Building an image similarity engine

Summary

Build a Chatbot

The Turing test

The history of chatbots

The design of chatbots

Building a chatbot

Summary

Build a Recommendation Engine

Collaborative filtering

Content-based filtering

Hybrid systems

Building a recommendation engine

Summary

The data science/machine learning workflow

Building machine learning applications, while similar in many respects to the standard engineering paradigm, differs in one crucial way: the need to work with data as a raw material. The success of a data project depends in large part on the quality of the data you acquire and on how it is handled. And because working with data falls into the domain of data science, it is helpful to understand the data science workflow:

The process proceeds through these six steps in the following order: acquisition, inspection and exploration, cleaning and preparation, modeling, evaluation, and finally deployment. There is often the need to circle back to prior steps, such as when inspecting and preparing the data or when evaluating and modeling, but the process at a high level can be described as shown in the preceding diagram.

Let's now discuss each step in detail.

Acquisition

Data for machine learning applications can come from any number of sources; it may be e-mailed as a CSV file, it may come from pulling down server logs, or it may require building a custom web scraper. The data may also come in any number of formats. In most cases, it will be text-based data, but as we'll see, machine learning applications may just as easily be built utilizing images or even video files. Regardless of the format, once the data is secured, it is crucial to understand what's in the data—as well as what isn't.
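As a minimal sketch of the simplest acquisition case mentioned above, a CSV file, the following reads rows with Python's standard `csv` module. The column names and values are hypothetical and stand in for a real file, and `load_rows` is an illustrative helper name rather than anything from the book.

```python
import csv
import io

# Hypothetical listing data; in practice this would be a file on disk,
# opened with something like open("listings.csv", newline="").
RAW_CSV = """listing_id,price,bedrooms
101,2400,1
102,3150,2
103,,1
"""


def load_rows(source):
    """Read CSV text from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))


rows = load_rows(io.StringIO(RAW_CSV))
columns = list(rows[0].keys())

print(columns)
print(len(rows), "rows loaded")
```

Note that every value arrives as a string, and the third row has an empty price. Noticing what is missing is part of understanding what is and isn't in the data.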

Inspection and exploration

Once the data has been acquired, the next step is to inspect and explore it. At this stage, the primary goal is to sanity-check the data, and the best way to accomplish this is to look for things that are either impossible or highly unlikely. As an example, if the data has a unique identifier, check that each value does indeed appear only once; if the data is price-based, check whether it is always positive; and whatever the data type, check the most extreme cases. Do they make sense? A good practice is to run some simple statistical tests on the data and visualize it. Additionally, it is likely that some data is missing or incomplete. It is critical to take note of this during this stage, as it will need to be addressed later during the cleaning and preparation stage. Models are only as good as the data that goes into them, so it is crucial to get this step right.
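The checks described above can be sketched in a few lines of standard-library Python. The records below are made up for illustration (one duplicated identifier, one negative price, one missing price), and `inspect_records` is a hypothetical helper name.

```python
from collections import Counter
import statistics

# Hypothetical records with deliberate problems to detect.
records = [
    {"listing_id": 101, "price": 2400.0},
    {"listing_id": 102, "price": 3150.0},
    {"listing_id": 102, "price": -50.0},   # duplicate ID, impossible price
    {"listing_id": 104, "price": None},    # missing price
    {"listing_id": 105, "price": 2900.0},
]


def inspect_records(records):
    """Return a small report of sanity-check findings."""
    id_counts = Counter(r["listing_id"] for r in records)
    prices = [r["price"] for r in records if r["price"] is not None]
    return {
        # Identifiers that occur more than once are not unique.
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        # Missing values must be handled later, during cleaning.
        "missing_prices": len(records) - len(prices),
        # Prices should always be positive.
        "non_positive_prices": [p for p in prices if p <= 0],
        # Extreme cases and a simple summary statistic.
        "min_price": min(prices),
        "max_price": max(prices),
        "median_price": statistics.median(prices),
    }


report = inspect_records(records)
for name, value in report.items():
    print(name, value)
```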

Cleaning and preparation

When all the data is in order, the next step is to place it in a format that is amenable to modeling. This stage encompasses a number of processes such as filtering, aggregating, imputing, and transforming. The actions required depend heavily on the type of data as well as on the library and algorithm used. For example, with natural-language-based text, the transformations required will be very different from those required for time series data. We'll see a number of examples of these types of transformations throughout the book.
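To make three of these processes concrete, here is a small sketch of filtering, imputing, and transforming on made-up records. The specific choices (dropping non-positive prices, filling gaps with the median, adding a log-scaled price) are assumptions for illustration only; the right choices depend on the data and the model.

```python
import math
import statistics

# Hypothetical records: one impossible price and one missing price.
raw_records = [
    {"listing_id": 101, "price": 2400.0},
    {"listing_id": 102, "price": 3150.0},
    {"listing_id": 103, "price": -50.0},
    {"listing_id": 104, "price": None},
    {"listing_id": 105, "price": 2900.0},
]


def clean(records):
    """Filter bad rows, impute missing prices, and add a log-scaled price."""
    # Filtering: drop records whose price is present but not positive.
    kept = [r for r in records if r["price"] is None or r["price"] > 0]

    # Imputing: fill missing prices with the median of the observed prices.
    observed = [r["price"] for r in kept if r["price"] is not None]
    fill_value = statistics.median(observed)

    cleaned = []
    for r in kept:
        price = fill_value if r["price"] is None else r["price"]
        # Transforming: a log scale compresses large price differences.
        cleaned.append({**r, "price": price, "log_price": math.log(price)})
    return cleaned


cleaned = clean(raw_records)
for row in cleaned:
    print(row)
```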

Modeling

Once the data preparation is complete, the next phase is modeling. In this phase, an appropriate algorithm is selected and a model is trained on the data. There are a number of best practices to adhere to during this stage, and we will discuss them in detail, but the basic steps involve splitting the data into training, validation, and testing sets. Splitting up the data may seem illogical, especially when more data typically yields better models, but as we'll see, it gives us better feedback on how the model will perform in the real world and helps us avoid the cardinal sin of modeling: overfitting.
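One way to picture the split is the sketch below, which shuffles the data and carves it into three non-overlapping sets. The 60/20/20 proportions, the fixed seed, and the function name `train_val_test_split` are assumptions chosen for illustration, not recommendations from the text.

```python
import random


def train_val_test_split(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


data = list(range(100))
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))
```

The model is fit only on the training set; the held-out sets then show how it behaves on data it has never seen, which is where overfitting becomes visible.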

Evaluation

Once the model is built and making predictions, the next step is to understand how well the model does that. This is the question that evaluation seeks to answer. There are a number of ways to measure the performance of a model, and again the choice largely depends on the type of data and the model used, but on the whole, we are seeking to answer the question of how close the model's predictions are to the actual values. There is an array of confusing-sounding terms such as root mean-square error, Euclidean distance, and F1 score, but in the end, they are all just measures of the distance between the actual value and the estimated prediction.
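The three terms named above can be written out directly, which may make them less intimidating. This is a plain-Python sketch with made-up numbers; the F1 version assumes binary labels encoded as 0 and 1. Strictly speaking, F1 is computed from counts of correct and incorrect labels rather than from a geometric distance, but it serves the same purpose of summarizing how far predictions are from the truth.

```python
import math


def rmse(actual, predicted):
    """Root mean-square error: square root of the average squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def euclidean_distance(u, v):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def f1_score(actual, predicted):
    """Harmonic mean of precision and recall for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical regression targets and predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]
regression_error = rmse(y_true, y_pred)
vector_distance = euclidean_distance(y_true, y_pred)

# Hypothetical binary labels and predictions.
labels_true = [1, 0, 1, 1, 0]
labels_pred = [1, 1, 1, 0, 0]
classification_f1 = f1_score(labels_true, labels_pred)

print(regression_error, vector_distance, classification_f1)
```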

Deployment

Once the model's performance is satisfactory, the next step is deployment. This can take a number of forms depending on the use case, but common scenarios include utilization as a feature within another larger application, a bespoke web application, or even just a simple cron job.
