Book Image

Advanced Machine Learning with Python

Book Image

Advanced Machine Learning with Python

Overview of this book

Designed to take you on a guided tour of the most relevant and powerful machine learning techniques in use today by top data scientists, this book is just what you need to push your Python algorithms to maximum potential. Clear examples and detailed code samples demonstrate deep learning techniques, semi-supervised learning, and more - all whilst working with real-world applications that include image, music, text, and financial data. The machine learning techniques covered in this book are at the forefront of commercial practice. They are applicable now for the first time in contexts such as image recognition, NLP and web search, computational creativity, and commercial/financial data modeling. Deep Learning algorithms and ensembles of models are in use by data scientists at top tech and digital companies, but the skills needed to apply them successfully, while in high demand, are still scarce. This book is designed to take the reader on a guided tour of the most relevant and powerful machine learning techniques. Clear descriptions of how techniques work and detailed code examples demonstrate deep learning techniques, semi-supervised learning and more, in real world applications. We will also learn about NumPy and Theano. By this end of this book, you will learn a set of advanced Machine Learning techniques and acquire a broad set of powerful skills in the area of feature selection & feature engineering.
Table of Contents (17 chapters)
Advanced Machine Learning with Python
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Chapter Code Requirements
Index

Chapter 1. Unsupervised Machine Learning

In this chapter, you will learn how to apply unsupervised learning techniques to identify patterns and structure within datasets.

Unsupervised learning techniques are a valuable set of tools for exploratory analysis. They bring out patterns and structure within datasets, which yield information that may be informative in itself or serve as a guide to further analysis. It's critical to have a solid set of unsupervised learning tools that you can apply to help break up unfamiliar or complex datasets into actionable information.

We'll begin by reviewing Principal Component Analysis (PCA), a fundamental data manipulation technique with a range of dimensionality reduction applications. Next, we will discuss k-means clustering, a widely-used and approachable unsupervised learning technique. Then, we will discuss Kohenen's Self-Organizing Map (SOM), a method of topological clustering that enables the projection of complex datasets into two dimensions.

Throughout the chapter, we will spend some time discussing how to effectively apply these techniques to make high-dimensional datasets readily accessible. We will use the UCI Handwritten Digits dataset to demonstrate technical applications of each algorithm. In the course of discussing and applying each technique, we will review practical applications and methodological questions, particularly regarding how to calibrate and validate each technique as well as which performance measures are valid. To recap, then, we will be covering the following topics in order:

  • Principal component analysis

  • k-means clustering

  • Self-organizing maps