2. Exploratory Data Analysis and Visualization | The Supervised Learning Workshop

The Supervised Learning Workshop - Second Edition

By: Blaine Bateman, Ashish Ranjan Jha, Benjamin Johnston, Ishita Mathur, Tiffany Ford, Sukanya Mandal, Ashish Pratik Patil

Overview of this book

Would you like to understand how and why machine learning techniques and data analytics are spearheading enterprises globally? From analyzing bioinformatics to predicting climate change, machine learning plays an increasingly pivotal role in our society. Although the real-world applications may seem complex, this book simplifies supervised learning for beginners with a step-by-step interactive approach. Working with real-time datasets, you’ll learn how supervised learning, when used with Python, can produce efficient predictive models. Starting with the fundamentals of supervised learning, you’ll quickly move on to automating manual tasks and assessing data using Jupyter and Python libraries like pandas. Next, you’ll use data exploration and visualization techniques to develop powerful supervised learning models, before learning how to distinguish variables and represent their relationships using scatter plots, heatmaps, and box plots. After using regression and classification models on real-time datasets to predict future outcomes, you’ll grasp advanced ensemble techniques such as boosting and random forests. Finally, you’ll learn the importance of model evaluation in supervised learning and study metrics for evaluating regression and classification tasks. By the end of this book, you’ll have the skills you need to work on your real-life supervised learning Python projects.

Preface

About the Book

1. Fundamentals

Introduction

Data Quality Considerations

Summary

2. Exploratory Data Analysis and Visualization

Introduction

Exploratory Data Analysis (EDA)

Summary Statistics and Central Values

Missing Values

Distribution of Values

Relationships within the Data

Summary

3. Linear Regression

Introduction

Regression and Classification Problems

Linear Regression

Multiple Linear Regression

Summary

4. Autoregression

Introduction

Autoregression Models

Summary

5. Classification Techniques

Introduction

Ordinary Least Squares as a Classifier

Logistic Regression

Classification Using K-Nearest Neighbors

Classification Using Decision Trees

Artificial Neural Networks

Summary

6. Ensemble Modeling

Introduction

One-Hot Encoding

Overfitting and Underfitting

Bagging

Bootstrapping

Boosting

Stacking

Summary

7. Model Evaluation

Introduction

Importing the Modules and Preparing Our Dataset

Evaluation Metrics

Splitting a Dataset

Performance Improvement Tactics

Summary

Appendix

1. Fundamentals

2. Exploratory Data Analysis and Visualization

3. Linear Regression

4. Autoregression

5. Classification Techniques

6. Ensemble Modeling

7. Model Evaluation

Introduction

Say we have a problem statement that involves predicting whether a particular earthquake caused a tsunami. How do we decide what model to use? What do we know about the data we have? Nothing! But if we don't know and understand our data, chances are we'll end up building a model that's not very interpretable or reliable. In data science, it's important to have a thorough understanding of the data we're dealing with in order to generate highly informative features and, consequently, to build accurate and powerful models. To acquire this understanding, we perform an exploratory analysis to see what the data can tell us about the relationships between the features and the target variable (the value we are trying to predict using the other variables). Getting to know our data will also help us interpret the model we build and identify ways to improve its accuracy. The approach is to let the data reveal its own structure or model, which often yields new and unsuspected insights.
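As a rough illustration of what a first look at such data might involve, the sketch below uses pandas on a handful of made-up earthquake records. The values and the column names (`magnitude`, `depth_km`, `tsunami`) are assumptions chosen for this example and are not taken from the book's dataset. It computes summary statistics, counts missing values, and compares feature averages across the two classes of the target variable.

```python
import pandas as pd

# Made-up illustrative records; the column names (magnitude, depth_km,
# tsunami) are assumptions for this sketch, not the book's dataset.
quakes = pd.DataFrame(
    {
        "magnitude": [5.1, 6.8, 7.4, None, 8.1, 5.6],
        "depth_km": [10.0, 35.0, 20.0, 15.0, None, 60.0],
        "tsunami": [0, 0, 1, 0, 1, 0],  # target variable: 1 = tsunami occurred
    }
)

# Summary statistics (count, mean, quartiles, ...) for each numeric column.
summary = quakes.describe()

# Number of missing values in each column.
missing_counts = quakes.isnull().sum()

# Mean of each feature within each target class (missing values are skipped).
feature_means_by_target = quakes.groupby("tsunami").mean()

print(summary)
print(missing_counts)
print(feature_means_by_target)
```

Even on a toy table like this, the per-class means hint at whether a feature separates the two outcomes, and the missing-value counts show which columns would need imputation before modeling.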

We begin with a brief introduction to exploratory data analysis and then move on to summary statistics and central values. The chapter also shows how to find and visualize missing values and describes imputation strategies for handling them. The remainder of the chapter focuses on visualization: specifically, how to create plots such as scatter plots, histograms, pie charts, heatmaps, pairplots, and more. Let us begin with exploratory data analysis.
