Books


This book focused on the practical side of machine learning. We did not present the thinking behind the algorithms or the theory that justifies them. If you are interested in that aspect of machine learning, then we recommend Pattern Recognition and Machine Learning, C. Bishop, Springer. This is a classic introductory text in the field. It will teach you the nitty-gritty of most of the algorithms we used in this book.

If you want to move beyond an introduction and learn all the gory mathematical details, then Machine Learning: A Probabilistic Perspective, K. Murphy, The MIT Press, is an excellent option. It is very recent (published in 2012) and contains the cutting edge of ML research. This 1,100-page book can also serve as a reference, as very little of machine learning has been left out.

Q&A sites

The following are two machine learning Q&A websites:

  • MetaOptimize (http://metaoptimize.com/qa) is a machine learning Q&A website where many very knowledgeable researchers and practitioners interact

  • Cross Validated (http://stats.stackexchange.com) is a general statistics Q&A site, which often features machine learning questions as well

As mentioned at the beginning of the book, if you have questions about specific parts of the book, feel free to ask them at TwoToReal (http://www.twotoreal.com). We try to jump in and help as quickly and as well as we can.

Blogs

The following is an obviously non-exhaustive list of blogs that are interesting to someone working on machine learning:

  • Machine Learning Theory at http://hunch.net

    • This is a blog by John Langford, the brain behind Vowpal Wabbit (http://hunch.net/~vw/), but guest posts also appear.

    • The average pace is approximately one post per month. The posts are more theoretical and also offer additional value in the form of brain teasers.

  • Text and data mining by practical means at http://textanddatamining.blogspot.de

    • The average pace is one post per month; the posts are very practical and always take surprising approaches

  • http://blog.echen.me

    • The average pace is one per month, providing more applied topics

  • Machined Learnings at http://www.machinedlearnings.com

    • The average pace is one per month, providing more applied topics; often revolving around learning big data

  • FlowingData at http://flowingdata.com

    • The average pace is one per day, with the posts revolving more around statistics

  • Normal Deviate at http://normaldeviate.wordpress.com

    • The average pace is one per month, covering theoretical discussions of practical problems. Although it is more of a statistics blog, the posts often intersect with machine learning.

  • Simply Statistics at http://simplystatistics.org

    • There are several posts per month, focusing on statistics and big data

  • Statistical Modeling, Causal Inference, and Social Science at http://andrewgelman.com

    • There is one post per day, and the posts are often funny reads in which the author points out flaws in the popular media's use of statistics

Data sources

If you want to play around with algorithms, you can obtain many datasets from the Machine Learning Repository at the University of California, Irvine (UCI). You can find it at http://archive.ics.uci.edu/ml.
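
As a minimal sketch of loading one of these datasets with pandas (assuming you have internet access and that the classic Iris dataset is still hosted under the path used below), you could do something like the following:

    # Load the Iris dataset straight from the UCI repository into a DataFrame.
    # The exact URL path and column names are assumptions; check the dataset's
    # description page on the repository before relying on them.
    import pandas as pd

    url = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
    columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
    iris = pd.read_csv(url, header=None, names=columns)

    print(iris.head())
    print(iris["species"].value_counts())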

Getting competitive

An excellent way to learn more about machine learning is by trying out a competition! Kaggle (http://www.kaggle.com) is a marketplace of ML competitions and has already been mentioned in the introduction. On the website, you will find several different competitions with different structures and often cash prizes.

The supervised learning competitions almost always follow the same format (a sketch of the typical workflow appears after this list):

  • You (and every other competitor) are given access to labeled training data and testing data (without labels).

  • Your task is to submit predictions for the testing data.

  • When the competition closes, whoever has the best accuracy wins. The prizes range from glory to cash.
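
The following is a minimal sketch of that workflow using pandas and scikit-learn. The file names train.csv and test.csv, the column names id and target, and the choice of a random forest are hypothetical placeholders; every competition defines its own files, columns, and evaluation metric:

    # Train on the labeled data, predict on the unlabeled test data, and write
    # the predictions to a submission file. File and column names are made up.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    train = pd.read_csv("train.csv")   # labeled training data
    test = pd.read_csv("test.csv")     # testing data without labels

    features = [c for c in train.columns if c not in ("id", "target")]

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(train[features], train["target"])

    submission = pd.DataFrame({"id": test["id"],
                               "target": clf.predict(test[features])})
    submission.to_csv("submission.csv", index=False)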

Of course, winning something is nice, but you can gain a lot of useful experience just by participating. So, stay tuned, especially after a competition is over, when participants start sharing their approaches in the forums. Most of the time, winning is not about developing a new algorithm; it is about cleverly preprocessing, normalizing, and combining the existing methods.
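
As a rough illustration of that last point, here is a minimal sketch (assuming a recent version of scikit-learn, and using synthetic stand-in data rather than any real competition data) of normalizing features and combining two existing classifiers in a simple voting ensemble:

    from sklearn.datasets import make_classification
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in data; replace with your own feature matrix and labels.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Normalize the features before the linear model, then combine it with a
    # random forest in a soft-voting ensemble of existing methods.
    lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    ensemble = VotingClassifier(estimators=[("lr", lr), ("rf", rf)], voting="soft")

    print(cross_val_score(ensemble, X, y, cv=5).mean())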