Principles of Data Science - Second Edition

By: Sinan Ozdemir, Sunil Kakade, Marco Tibaldeschi

Overview of this book

Need to turn programming skills into effective data science skills? This book helps you connect mathematics, programming, and business analysis. You'll feel confident asking and answering complex, sophisticated questions of your data, turning abstract and raw statistics into actionable ideas. Going through the data science pipeline, you'll clean and prepare data and learn effective data mining strategies and techniques to gain a comprehensive view of how the data science puzzle fits together. You'll learn the fundamentals of computational mathematics and statistics, along with the pseudo-code used by data scientists and analysts. You'll also learn machine learning, discovering statistical models that help you control and navigate even the densest datasets, and learn powerful visualizations that communicate what your data means.

Unsupervised learning

It's time to look at some examples of unsupervised learning, given that we have spent the majority of this book on supervised learning models.

When to use unsupervised learning

There are many times when unsupervised learning can be appropriate. Some very common examples include the following:

  • There is no clear response variable: there is nothing that we are explicitly trying to predict or correlate with other variables.
  • We want to extract structure from data where no apparent structure or patterns exist (this can also be approached as a supervised learning problem).
  • We use an unsupervised concept called feature extraction. Feature extraction is the process of creating new features from existing ones. These new features can be even stronger than the original features.
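
To make the idea of feature extraction concrete, here is a minimal sketch that builds one new feature from two existing, correlated columns. It uses principal component analysis, which is one common feature extraction technique and is chosen here only for illustration. The function name first_principal_component and the sample values are hypothetical, and the code relies on the Python standard library only, so it handles exactly two columns.

```python
import math

def first_principal_component(rows):
    """Return (direction, scores) for a list of (x, y) pairs.

    direction is the unit vector along which the data varies the most;
    scores is the new single feature: each centered row projected onto it.
    """
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    centered = [(r[0] - mean_x, r[1] - mean_y) for r in rows]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # For a 2x2 symmetric matrix, the axis of largest variance has this angle.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # Project every centered row onto that axis to get the extracted feature.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores

# Hypothetical data: two columns that rise together.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
direction, new_feature = first_principal_component(data)
print(direction)
print(new_feature)
```

No response variable is involved at any point: the new feature is computed from the existing columns alone, which is what makes this an unsupervised step. In practice a library implementation would normally be used instead of this hand-written two-column version.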

The first of these tends to be the most common reason that data scientists choose to use unsupervised learning. It arises frequently when we are working with data and are not explicitly trying to predict any of the columns, and we merely...