Chapter 2: Pre-Model Workflow and Data Preprocessing

scikit-learn Cookbook - Third Edition

By: John Sukup

Overview of this book

Trusted by data scientists, machine learning (ML) engineers, and software developers alike, scikit-learn offers a versatile, user-friendly framework for implementing a wide range of ML algorithms, enabling the efficient development and deployment of predictive models in real-world applications. This third edition of scikit-learn Cookbook will help you master ML through real-world examples and scikit-learn 1.5 features. The updated edition takes you from the fundamentals of ML and data preprocessing, through advanced algorithms and techniques, to deploying and optimizing ML models in production. Along the way, you'll work through practical, step-by-step recipes that cover everything from feature engineering and model selection to hyperparameter tuning and model evaluation, all using scikit-learn. By the end of this book, you'll have the knowledge and skills to confidently build, evaluate, and deploy sophisticated ML models with scikit-learn, ready to tackle a wide range of data-driven challenges.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Share Your Thoughts

Free Benefits with Your Book

Chapter 1: Common Conventions and API Elements of scikit-learn

Technical requirements

Introduction to scikit-learn’s design philosophy

Understanding estimators

Transformers and the transform() method

Handling custom estimators and transformers

Pipelines and workflow automation

Common attributes and methods

Hyperparameter tuning with search methods

Working with metadata: Tags and more

Best practices for API usage

Chapter 2: Pre-Model Workflow and Data Preprocessing

Technical requirements

The impact of raw data on model performance

Handling missing data

Scaling techniques

Encoding categorical variables

Introduction to pipelines in scikit-learn

Feature engineering

Practical exercises on data preprocessing

Chapter 3: Dimensionality Reduction Techniques

Technical requirements

Introduction to dimensionality reduction

Transforming datasets with PCA

Maximizing class separability with LDA

t-SNE and data visualization

Impact on model performance

Practical exercises in dimensionality reduction

Chapter 4: Building Models with Distance Metrics and Nearest Neighbors

Technical requirements

Introduction to distance metrics

Understanding KNNs

Distance metrics overview

Hyperparameter tuning in KNN

Evaluating KNN performance

Practical exercises with KNN models

Chapter 5: Linear Models and Regularization

Technical requirements

Introduction to linear models

Ridge and Lasso regression

ElasticNet and regularization

Regularization theory and practice

Regression and regularization

Practical exercises with regularization techniques

Chapter 6: Advanced Logistic Regression and Extensions

Technical requirements

Overview of logistic regression

Multiclass classification techniques

Regularization in logistic regression

Multilabel classification concepts

Model evaluation metrics

Practical exercises with advanced logistic regression

Chapter 7: Support Vector Machines and Kernel Methods

Technical requirements

Introduction to SVMs

Kernel functions and their applications

Tuning SVM parameters

SVMs in high-dimensional spaces

Evaluating SVM models

Practical exercises with SVMs

Chapter 8: Tree-Based Algorithms and Ensemble Methods

Technical requirements

Introduction to decision trees

Random forests and bagging

Gradient boosting machines

Hyperparameter tuning for trees and ensembles

Comparing ensemble methods

Practical exercises with tree-based models

Chapter 9: Text Processing and Multiclass Classification

Technical requirements

Introduction to text processing

Text vectorization techniques

Feature extraction from text

Implementing text classification models

Multiclass classification strategies

Evaluating text models

Practical exercises in text processing

Chapter 10: Clustering Techniques

Technical requirements

Introduction to clustering

K-means clustering

Hierarchical clustering

Density-based clustering with DBSCAN

Cluster evaluation metrics

Choosing the right clustering algorithm

Advanced clustering techniques

Practical exercises with clustering models

Chapter 11: Novelty and Outlier Detection

Technical requirements

Introduction to outlier and novelty detection

Understanding Isolation Forest

One-Class SVM for novelty detection

Detecting outliers with LOF

Evaluating outlier detection models

Handling detected outliers

Choosing the right detection technique

Practical exercises in novelty and outlier detection

Chapter 12: Cross-Validation and Model Evaluation Techniques

Technical requirements

Introduction to cross-validation

Advanced cross-validation methods

Implementing cross-validation in scikit-learn

Model selection techniques

Evaluating model generalizability

Practical exercises in cross-validation and evaluation

Chapter 13: Deploying scikit-learn Models in Production

Technical requirements

Overview of model deployment

Serialization and persistence techniques

Scaling models for production

Monitoring and updating deployed models

Managing the model life cycle

Setting up deployment pipelines

Practical exercises in model deployment

Chapter 14: Unlock Your Exclusive Benefits

Unlock this Book’s Free Benefits in 3 Easy Steps

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Practical exercises on data preprocessing

How to do it…