13. Imbalanced Datasets | The Data Science Workshop

The Data Science Workshop

By: Anthony So, Thomas Joseph, Robert Thas John, Andrew Worsley, Dr. Samuel Asare, Ivan Liu, Tiffany Ford, Barbora Stetinova, Pritesh Tiwari

Overview of this book

You already know you want to learn data science, and a smarter way to learn it is by doing. The Data Science Workshop focuses on building up your practical skills so that you can understand how to develop simple machine learning models in Python, or even build an advanced model for detecting potential bank fraud. You'll learn from real examples that lead to real results.

Throughout The Data Science Workshop, you'll take an engaging step-by-step approach to understanding data science. You won't have to sit through any unnecessary theory. If you're short on time, you can jump into a single exercise each day or spend an entire weekend training a model using scikit-learn. It's your choice. Learning on your terms, you'll build up and reinforce key skills in a way that feels rewarding.

Every physical print copy of The Data Science Workshop unlocks access to the interactive edition. With videos detailing all exercises and activities, you'll always have a guided solution. You can also benchmark yourself against assessments, track progress, and receive content updates. You'll even earn a secure credential that you can share and verify online upon completion. It's a premium learning experience that's included with your printed copy. To redeem, follow the instructions located at the start of your data science book.

Fast-paced and direct, The Data Science Workshop is the ideal companion for data science beginners. You'll learn about machine learning algorithms the way a data scientist does, picking them up as you work. This process means that your new skills stick, embedded as best practice: a solid foundation for the years ahead.

Preface

About the Book

1. Introduction to Data Science in Python

Introduction

Application of Data Science

Overview of Python

Python for Data Science

Scikit-Learn

Summary

2. Regression

Introduction

Simple Linear Regression

Multiple Linear Regression

Conducting Regression Analysis Using Python

Multiple Regression Analysis

Assumptions of Regression Analysis

Explaining the Results of Regression Analysis

Summary

3. Binary Classification

Introduction

Understanding the Business Context

Feature Engineering

Data-Driven Feature Engineering

Correlation Matrix and Visualization

Summary

4. Multiclass Classification with RandomForest

Introduction

Training a Random Forest Classifier

Evaluating the Model's Performance

Maximum Depth

Minimum Sample in Leaf

Maximum Features

Summary

5. Performing Your First Cluster Analysis

Introduction

Clustering with k-means

Interpreting k-means Results

Choosing the Number of Clusters

Initializing Clusters

Calculating the Distance to the Centroid

Standardizing Data

Summary

6. How to Assess Performance

Introduction

Splitting Data

Assessing Model Performance for Regression Models

Assessing Model Performance for Classification Models

The Confusion Matrix

Receiver Operating Characteristic Curve

Area Under the ROC Curve

Saving and Loading Models

Summary

7. The Generalization of Machine Learning Models

Introduction

Overfitting

Underfitting

Data

Random State

Cross-Validation

cross_val_score

LogisticRegressionCV

Hyperparameter Tuning with GridSearchCV

Hyperparameter Tuning with RandomizedSearchCV

Model Regularization with Lasso Regression

Ridge Regression

Summary

8. Hyperparameter Tuning

Introduction

What Are Hyperparameters?

Finding the Best Hyperparameterization

Tuning Using Grid Search

GridSearchCV

Random Search

Summary

9. Interpreting a Machine Learning Model

Introduction

Linear Model Coefficients

RandomForest Variable Importance

Variable Importance via Permutation

Partial Dependence Plots

Local Interpretation with LIME

Summary

10. Analyzing a Dataset

Introduction

Exploring Your Data

Analyzing Your Dataset

Analyzing the Content of a Categorical Variable

Summarizing Numerical Variables

Visualizing Your Data

Boxplots

Summary

11. Data Preparation

Introduction

Handling Row Duplication

Converting Data Types

Handling Incorrect Values

Handling Missing Values

Summary

12. Feature Engineering

Introduction

Merging Datasets

Binning Variables

Manipulating Dates

Performing Data Aggregation

Summary

13. Imbalanced Datasets

Introduction

Understanding the Business Context

Challenges of Imbalanced Datasets

Strategies for Dealing with Imbalanced Datasets

Generating Synthetic Samples

Summary

14. Dimensionality Reduction

Introduction

Creating a High-Dimensional Dataset

Strategies for Addressing High-Dimensional Datasets

Comparing Different Dimensionality Reduction Techniques

Summary

15. Ensemble Learning

Introduction

Ensemble Learning

Simple Methods for Ensemble Learning

Summary

16. Machine Learning Pipelines

Introduction

Pipelines

Automating ML Workflows Using Pipeline

ML Pipeline with Processing and Dimensionality Reduction

ML Pipeline for Modeling and Prediction

ML Pipeline for Spot-Checking Multiple Models

ML Pipelines for Identifying the Best Parameters for a Model

Applying Pipelines to a Dataset

Summary

17. Automated Feature Engineering

Introduction

Feature Engineering

Featuretools on a New Dataset

Summary
