Chapter 11: Validating Machine Learning | Machine Learning For Dummies

Machine Learning For Dummies

By: John Paul Mueller, Luca Massaron

Overview of this book

Machine learning can be a mind-boggling concept for the masses, but those in the trenches of computer programming know just how invaluable it is. Without machine learning, fraud detection, web search results, real-time ads on web pages, credit scoring, automation, and email spam filtering wouldn’t be possible, and these are just a few of its capabilities. Written by two data science experts, Machine Learning For Dummies offers a much-needed entry point for anyone looking to use machine learning to accomplish practical tasks. In the initial chapters, the book introduces you to the world of machine learning, artificial intelligence, and big data, and prepares you to use R and Python for machine learning tasks. Next, you’ll learn how to use math in machine learning and get started with linear models and neural networks. In the final chapters, you’ll process images and text, and discover packages and techniques to improve your machine learning models. By the end of this book, you’ll be able to understand and implement machine learning seamlessly.

Introduction

About This Book

Foolish Assumptions

Icons Used in This Book

Beyond the Book

Where to Go from Here

Part 1: Introducing How Machines Learn

Chapter 1: Getting the Real Story about AI

Moving beyond the Hype

Dreaming of Electric Sheep

Overcoming AI Fantasies

Considering the Relationship between AI and Machine Learning

Considering AI and Machine Learning Specifications

Defining the Divide between Art and Engineering

Chapter 2: Learning in the Age of Big Data

Defining Big Data

Considering the Sources of Big Data

Specifying the Role of Statistics in Machine Learning

Understanding the Role of Algorithms

Defining What Training Means

Chapter 3: Having a Glance at the Future

Creating Useful Technologies for the Future

Discovering the New Work Opportunities with Machine Learning

Avoiding the Potential Pitfalls of Future Technologies

Part 2: Preparing Your Learning Tools

Chapter 4: Installing an R Distribution

Choosing an R Distribution with Machine Learning in Mind

Installing R on Windows

Installing R on Linux

Installing R on Mac OS X

Downloading the Datasets and Example Code

Chapter 5: Coding in R Using RStudio

Understanding the Basic Data Types

Working with Vectors

Organizing Data Using Lists

Working with Matrices

Interacting with Multiple Dimensions Using Arrays

Creating a Data Frame

Performing Basic Statistical Tasks

Chapter 6: Installing a Python Distribution

Choosing a Python Distribution with Machine Learning in Mind

Installing Python on Linux

Installing Python on Mac OS X

Installing Python on Windows

Downloading the Datasets and Example Code

Chapter 7: Coding in Python Using Anaconda

Working with Numbers and Logic

Creating and Using Strings

Interacting with Dates

Creating and Using Functions

Using Conditional and Loop Statements

Storing Data Using Sets, Lists, and Tuples

Defining Useful Iterators

Indexing Data Using Dictionaries

Storing Code in Modules

Chapter 8: Exploring Other Machine Learning Tools

Meeting the Precursors SAS, Stata, and SPSS

Learning in Academia with Weka

Accessing Complex Algorithms Easily Using LIBSVM

Running As Fast As Light with Vowpal Wabbit

Visualizing with Knime and RapidMiner

Dealing with Massive Data by Using Spark

Part 3: Getting Started with the Math Basics

Chapter 9: Demystifying the Math Behind Machine Learning

Working with Data

Exploring the World of Probabilities

Describing the Use of Statistics

Chapter 10: Descending the Right Curve

Interpreting Learning As Optimization

Exploring Cost Functions

Descending the Error Curve

Updating by Mini-Batch and Online

Chapter 11: Validating Machine Learning

Checking Out-of-Sample Errors

Getting to Know the Limits of Bias

Keeping Model Complexity in Mind

Keeping Solutions Balanced

Training, Validating, and Testing

Resorting to Cross-Validation

Looking for Alternatives in Validation

Optimizing Cross-Validation Choices

Avoiding Sample Bias and Leakage Traps

Chapter 12: Starting with Simple Learners

Discovering the Incredible Perceptron

Growing Greedy Classification Trees

Taking a Probabilistic Turn

Part 4: Learning from Smart and Big Data

Chapter 13: Preprocessing Data

Gathering and Cleaning Data

Repairing Missing Data

Transforming Distributions

Creating Your Own Features

Compressing Data

Delimiting Anomalous Data

Chapter 14: Leveraging Similarity

Measuring Similarity between Vectors

Using Distances to Locate Clusters

Tuning the K-Means Algorithm

Searching for Classification by K-Nearest Neighbors

Leveraging the Correct K Parameter

Chapter 15: Working with Linear Models the Easy Way

Starting to Combine Variables

Mixing Variables of Different Types

Switching to Probabilities

Guessing the Right Features

Learning One Example at a Time

Chapter 16: Hitting Complexity with Neural Networks

Learning and Imitating from Nature

Struggling with Overfitting

Introducing Deep Learning

Chapter 17: Going a Step beyond Using Support Vector Machines

Revisiting the Separation Problem: A New Approach

Explaining the Algorithm

Applying Nonlinearity

Illustrating Hyper-Parameters

Classifying and Estimating with SVM

Chapter 18: Resorting to Ensembles of Learners

Leveraging Decision Trees

Working with Almost Random Guesses

Boosting Smart Predictors

Averaging Different Predictors

Part 5: Applying Learning to Real Problems

Chapter 19: Classifying Images

Working with a Set of Images

Extracting Visual Features

Recognizing Faces Using Eigenfaces

Classifying Images

Chapter 20: Scoring Opinions and Sentiments

Introducing Natural Language Processing

Understanding How Machines Read

Using Scoring and Classification

Chapter 21: Recommending Products and Movies

Realizing the Revolution

Downloading Rating Data

Leveraging SVD

Part 6: The Part of Tens

Chapter 22: Ten Machine Learning Packages to Master

Cloudera Oryx

CUDA-Convnet

ConvNetJS

e1071

gbm

Gensim

glmnet

randomForest

SciPy

XGBoost

Chapter 23: Ten Ways to Improve Your Machine Learning Models

Studying Learning Curves

Using Cross-Validation Correctly

Choosing the Right Error or Score Metric

Searching for the Best Hyper-Parameters

Testing Multiple Models

Averaging Models

Stacking Models

Applying Feature Engineering

Selecting Features and Examples

Looking for More Data

About the Author

Advertisement Page

Connect with Dummies

End User License Agreement
