11. Data Preparation | The Data Science Workshop


The Data Science Workshop - Second Edition

By: Anthony So, Thomas Joseph, Robert Thas John, Andrew Worsley, Dr. Samuel Asare


Overview of this book

Where there’s data, there’s insight. With so much data being generated, there is immense scope to extract meaningful information that’ll boost business productivity and profitability. By learning to convert raw data into game-changing insights, you’ll open new career paths and opportunities. The Data Science Workshop begins by introducing different types of projects and showing you how to incorporate machine learning algorithms in them. You’ll learn to select a relevant metric and assess the performance of your model. To tune the hyperparameters of an algorithm and improve its accuracy, you’ll get hands-on with approaches such as grid search and random search. Next, you’ll learn dimensionality reduction techniques to handle many variables at once, before exploring how to use model ensembling techniques and create new features to enhance model performance. To help you automatically create new features that improve your model, the book demonstrates how to use an automated feature engineering tool. You’ll also learn how to use an orchestration and scheduling workflow to deploy machine learning models in batch. By the end of this book, you’ll have the skills to start working on data science projects confidently.

Preface

About the Book

1. Introduction to Data Science in Python

Introduction

Application of Data Science

Overview of Python

Python for Data Science

Scikit-Learn

Summary

2. Regression

Introduction

Simple Linear Regression

Multiple Linear Regression

Conducting Regression Analysis Using Python

Multiple Regression Analysis

Assumptions of Regression Analysis

Explaining the Results of Regression Analysis

Summary

3. Binary Classification

Introduction

Understanding the Business Context

Feature Engineering

Data-Driven Feature Engineering

Correlation Matrix and Visualization

Summary

4. Multiclass Classification with RandomForest

Introduction

Training a Random Forest Classifier

Evaluating the Model's Performance

Maximum Depth

Minimum Sample in Leaf

Maximum Features

Summary

5. Performing Your First Cluster Analysis

Introduction

Clustering with k-means

Interpreting k-means Results

Choosing the Number of Clusters

Initializing Clusters

Calculating the Distance to the Centroid

Standardizing Data

Summary

6. How to Assess Performance

Introduction

Splitting Data

Assessing Model Performance for Regression Models

Assessing Model Performance for Classification Models

The Confusion Matrix

Receiver Operating Characteristic Curve

Area Under the ROC Curve

Saving and Loading Models

Summary

7. The Generalization of Machine Learning Models

Introduction

Overfitting

Underfitting

Data

Random State

Cross-Validation

cross_val_score

LogisticRegressionCV

Hyperparameter Tuning with GridSearchCV

Hyperparameter Tuning with RandomizedSearchCV

Model Regularization with Lasso Regression

Ridge Regression

Summary

8. Hyperparameter Tuning

Introduction

What Are Hyperparameters?

Finding the Best Hyperparameterization

Tuning Using Grid Search

GridSearchCV

Random Search

Summary

9. Interpreting a Machine Learning Model

Introduction

Linear Model Coefficients

RandomForest Variable Importance

Variable Importance via Permutation

Partial Dependence Plots

Local Interpretation with LIME

Summary

10. Analyzing a Dataset

Introduction

Exploring Your Data

Analyzing Your Dataset

Analyzing the Content of a Categorical Variable

Summarizing Numerical Variables

Visualizing Your Data

Boxplots

Summary

11. Data Preparation

Introduction

Handling Row Duplication

Converting Data Types

Handling Incorrect Values

Handling Missing Values

Summary

12. Feature Engineering

Introduction

13. Imbalanced Datasets

Introduction

Understanding the Business Context

Challenges of Imbalanced Datasets

Strategies for Dealing with Imbalanced Datasets

Generating Synthetic Samples

Summary

14. Dimensionality Reduction

Introduction

Creating a High-Dimensional Dataset

Strategies for Addressing High-Dimensional Datasets

Comparing Different Dimensionality Reduction Techniques

Summary

15. Ensemble Learning

Introduction

Ensemble Learning

Simple Methods for Ensemble Learning

Advanced Techniques for Ensemble Learning

Summary
