Principles of Data Science - Second Edition

By: Sinan Ozdemir, Sunil Kakade, Marco Tibaldeschi

Overview of this book

Need to turn programming skills into effective data science skills? This book helps you connect mathematics, programming, and business analysis. You’ll feel confident asking—and answering—complex, sophisticated questions of your data, making abstract and raw statistics into actionable ideas. Going through the data science pipeline, you'll clean and prepare data and learn effective data mining strategies and techniques to gain a comprehensive view of how the data science puzzle fits together. You’ll learn fundamentals of computational mathematics and statistics and pseudo-code used by data scientists and analysts. You’ll learn machine learning, discovering statistical models that help control and navigate even the densest datasets, and learn powerful visualizations that communicate what your data means.

Logistic regression

Our first classification model is called logistic regression. I can already hear the questions you have in your head: what makes it logistic? Why is it called regression if you claim that this is a classification algorithm? All in good time, my friend.

Logistic regression is a generalization of the linear regression model that was adapted to fit classification problems. In linear regression, we use a set of quantitative feature variables to predict a continuous response variable. In logistic regression, we use a set of quantitative feature variables to predict the probabilities of class membership. These probabilities can then be mapped to class labels, hence predicting a class for each observation.
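To make the last step concrete, here is a minimal sketch of mapping predicted probabilities to class labels. The probabilities are made-up illustrative values, not output from a fitted model, and the 0.5 cutoff and the helper name `probabilities_to_labels` are assumptions for illustration (0.5 is a common default, not a requirement).

```python
def probabilities_to_labels(probabilities, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold, else class 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities of belonging to class 1
predicted_probabilities = [0.08, 0.47, 0.50, 0.93]

predicted_labels = probabilities_to_labels(predicted_probabilities)
print(predicted_labels)
```

Raising or lowering the threshold changes how readily an observation is assigned to class 1, which is why it helps to keep the probabilities around rather than only the labels.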

When performing linear regression, we use the following function to make our line of best fit:

$$y = \beta_0 + \beta_1 x$$

Here, y is our response variable (the thing we wish to predict), our beta represents our model parameters, and x represents our input variable (a single one in this case, but it can...
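As a rough sketch of the single-input linear regression case, the snippet below fits an intercept and a slope by ordinary least squares and then uses them to predict. The function names `fit_line` and `predict`, the names `beta_0` and `beta_1`, and the toy data are assumptions chosen for illustration; the data is constructed to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (beta_0, beta_1) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta_1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    beta_0 = mean_y - beta_1 * mean_x
    return beta_0, beta_1


def predict(x, beta_0, beta_1):
    """Predict a continuous response for a single input value."""
    return beta_0 + beta_1 * x


# Toy data lying exactly on the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

beta_0, beta_1 = fit_line(xs, ys)
prediction = predict(5.0, beta_0, beta_1)
print(beta_0, beta_1, prediction)
```

Note that the prediction is an unbounded continuous number, which suits a quantitative response but is not directly usable as a probability of class membership.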