Python Machine Learning

By: Sebastian Raschka

Overview of this book

Machine learning and predictive analytics are transforming the way businesses and other organizations operate. Being able to understand trends and patterns in complex data is critical to success, becoming one of the key strategies for unlocking growth in a challenging contemporary marketplace. Python can help you deliver key insights into your data – its unique capabilities as a language let you build sophisticated algorithms and statistical models that can reveal new perspectives and answer key questions that are vital for success. Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world’s leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you’ll soon be able to answer some of the most important questions facing you and your organization.

Credits

Foreword

About the Author

About the Reviewers

www.PacktPub.com

Preface

Giving Computers the Ability to Learn from Data

Building intelligent machines to transform data into knowledge

The three different types of machine learning

An introduction to the basic terminology and notations

A roadmap for building machine learning systems

Using Python for machine learning

Summary

Training Machine Learning Algorithms for Classification

Artificial neurons – a brief glimpse into the early history of machine learning

Implementing a perceptron learning algorithm in Python

Adaptive linear neurons and the convergence of learning

Summary

A Tour of Machine Learning Classifiers Using Scikit-learn

Choosing a classification algorithm

First steps with scikit-learn

Modeling class probabilities via logistic regression

Maximum margin classification with support vector machines

Solving nonlinear problems using a kernel SVM

Decision tree learning

K-nearest neighbors – a lazy learning algorithm

Summary

Building Good Training Sets – Data Preprocessing

Dealing with missing data

Handling categorical data

Partitioning a dataset in training and test sets

Bringing features onto the same scale

Selecting meaningful features

Assessing feature importance with random forests

Summary

Compressing Data via Dimensionality Reduction

Unsupervised dimensionality reduction via principal component analysis

Supervised data compression via linear discriminant analysis

Using kernel principal component analysis for nonlinear mappings

Summary

Learning Best Practices for Model Evaluation and Hyperparameter Tuning

Streamlining workflows with pipelines

Using k-fold cross-validation to assess model performance

Debugging algorithms with learning and validation curves

Fine-tuning machine learning models via grid search

Looking at different performance evaluation metrics

Summary

Combining Different Models for Ensemble Learning

Learning with ensembles

Implementing a simple majority vote classifier

Evaluating and tuning the ensemble classifier

Bagging – building an ensemble of classifiers from bootstrap samples

Leveraging weak learners via adaptive boosting

Summary

Applying Machine Learning to Sentiment Analysis

Obtaining the IMDb movie review dataset

Introducing the bag-of-words model

Training a logistic regression model for document classification

Working with bigger data – online algorithms and out-of-core learning

Summary

Embedding a Machine Learning Model into a Web Application

Serializing fitted scikit-learn estimators

Setting up a SQLite database for data storage

Developing a web application with Flask

Turning the movie classifier into a web application

Deploying the web application to a public server

Summary

Predicting Continuous Target Variables with Regression Analysis

Introducing a simple linear regression model

Exploring the Housing Dataset

Implementing an ordinary least squares linear regression model

Fitting a robust regression model using RANSAC

Evaluating the performance of linear regression models

Using regularized methods for regression

Turning a linear regression model into a curve – polynomial regression

Summary

Working with Unlabeled Data – Clustering Analysis

Grouping objects by similarity using k-means

Organizing clusters as a hierarchical tree

Locating regions of high density via DBSCAN

Summary

Training Artificial Neural Networks for Image Recognition

Modeling complex functions with artificial neural networks

Classifying handwritten digits

Training an artificial neural network

Developing your intuition for backpropagation

Debugging neural networks with gradient checking

Convergence in neural networks

Other neural network architectures

A few last words about neural network implementation

Summary

Parallelizing Neural Network Training with Theano

Building, compiling, and running expressions with Theano

Choosing activation functions for feedforward neural networks

Training neural networks efficiently using Keras

Summary

Index

Assessing feature importance with random forests

In the previous sections, you learned how to use L1 regularization with logistic regression to zero out irrelevant features, and how to use the SBS algorithm for feature selection. Another useful way to select relevant features from a dataset is to use a random forest, an ensemble technique that we introduced in Chapter 3, A Tour of Machine Learning Classifiers Using Scikit-learn. With a random forest, we can measure the importance of a feature as the impurity decrease it produces, averaged over all decision trees in the forest, without making any assumption about whether our data is linearly separable. Conveniently, the random forest implementation in scikit-learn already collects the feature importances for us, so we can access them via the feature_importances_ attribute after fitting a RandomForestClassifier. By executing the following code, we will now train a forest of 10,000 trees on the Wine dataset and rank the 13 features by their respective importance...
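The book's own listing is not reproduced in this excerpt, so the following is only a minimal sketch of the idea under a few stated assumptions: it loads the copy of the Wine dataset bundled with scikit-learn via load_wine (the book may obtain the data differently), it uses a 70/30 stratified split chosen here for illustration, and it trains 500 trees rather than 10,000 to keep the run short. The variable names are illustrative as well.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: use scikit-learn's bundled Wine data (178 samples, 13 features).
wine = load_wine()
X, y = wine.data, wine.target
feat_labels = np.array(wine.feature_names)

# Assumption: a 70/30 stratified split; the importances are computed
# from the training portion only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Assumption: 500 trees instead of the 10,000 mentioned in the text,
# purely to shorten the run time of this sketch.
forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X_train, y_train)

# feature_importances_ holds one impurity-based score per feature,
# normalized so that the scores sum to 1.
importances = forest.feature_importances_

# Feature indices sorted from most to least important.
indices = np.argsort(importances)[::-1]

for rank, idx in enumerate(indices, start=1):
    print(f"{rank:2d}) {feat_labels[idx]:<30} {importances[idx]:.4f}")
```

Because the scores are normalized to sum to 1, each value can be read as that feature's share of the total impurity decrease in the fitted forest. The exact ranking may differ from the book's, since it depends on the split, the number of trees, and the random seed.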
