Book Image

Mastering Machine Learning with scikit-learn

By : Gavin Hackeling
Book Image

Mastering Machine Learning with scikit-learn

By: Gavin Hackeling

Overview of this book

<p>This book examines machine learning models including logistic regression, decision trees, and support vector machines, and applies them to common problems such as categorizing documents and classifying images. It begins with the fundamentals of machine learning, introducing you to the supervised-unsupervised spectrum, the uses of training and test data, and evaluating models. You will learn how to use generalized linear models in regression problems, as well as solve problems with text and categorical features.</p> <p>You will be acquainted with the use of logistic regression, regularization, and the various loss functions that are used by generalized linear models. The book will also walk you through an example project that prompts you to label the most uncertain training examples. You will also use an unsupervised Hidden Markov Model to predict stock prices.</p> <p>By the end of the book, you will be an expert in scikit-learn and will be well versed in machine learning.</p>
Table of Contents (17 chapters)
Mastering Machine Learning with scikit-learn
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Performing Principal Component Analysis


There are several terms that we must define before discussing how principal component analysis works.

Variance, Covariance, and Covariance Matrices

Recall that variance is a measure of how a set of values are spread out. Variance is calculated as the average of the squared differences of the values and mean of the values, as per the following equation:

Covariance is a measure of how much two variables change together; it is a measure of the strength of the correlation between two sets of variables. If the covariance of two variables is zero, the variables are uncorrelated. Note that uncorrelated variables are not necessarily independent, as correlation is only a measure of linear dependence. The covariance of two variables is calculated using the following equation:

If the covariance is nonzero, the sign indicates whether the variables are positively or negatively correlated. When two variables are positively correlated, one increases as the other increases...