Python Machine Learning By Example - Second Edition

By: Yuxi (Hayden) Liu

Overview of this book

Interest in machine learning (ML) has surged because ML revolutionizes automation: it learns patterns in data and uses them to make predictions and decisions. This book serves as your entry point to the field. Python Machine Learning By Example begins with an introduction to important ML concepts and their implementations using Python libraries. Each chapter walks you through an industry-adopted application. You'll implement ML techniques in areas such as exploratory data analysis, feature engineering, and natural language processing (NLP) in a clear and easy-to-follow way. With the help of this extended and updated edition, you'll learn how to tackle data-driven problems and implement your solutions with the powerful yet simple Python language and popular Python packages and tools such as TensorFlow, scikit-learn, gensim, and Keras. To aid your understanding of popular ML algorithms, the book covers interesting, easy-to-follow examples such as news topic modeling and classification, spam email detection, stock price forecasting, and more. By the end of the book, you'll have a broad picture of the ML ecosystem and will be well versed in the best practices of applying ML techniques to new opportunities.
Table of Contents (15 chapters)

1. Section 1: Fundamentals of Machine Learning
3. Section 2: Practical Python Machine Learning By Example
12. Section 3: Python Machine Learning Best Practices

Programming in PySpark

This section provides a quick introduction to programming with Python in Spark. We will start with the basic data structures in Spark.

The Resilient Distributed Dataset (RDD) is the primary data structure in Spark. An RDD is a distributed collection of objects and has the following three main features:

  • Resilient: When any node fails, the affected partitions are reassigned to healthy nodes, which makes Spark fault-tolerant
  • Distributed: Data resides on one or more nodes in a cluster and can be operated on in parallel
  • Dataset: A collection of partitioned data with their values or metadata

The RDD was the main data structure in Spark before version 2.0. Since then, it has been superseded by the DataFrame, which is also a distributed collection of data, but one organized into named columns. The DataFrame utilizes the optimized execution engine of Spark SQL. Therefore...