Machine Learning with LightGBM and Python

By: Andrich van Wyk

Overview of this book

Machine Learning with LightGBM and Python is a comprehensive guide to learning the basics of machine learning and progressing to building scalable machine learning systems that are ready for release. This book will get you acquainted with the high-performance gradient-boosting LightGBM framework and show you how it can be used to solve various machine-learning problems to produce highly accurate, robust, and predictive solutions. Starting with simple machine learning models in scikit-learn, you’ll explore the intricacies of gradient boosting machines and LightGBM. You’ll be guided through various case studies to better understand the data science processes and learn how to practically apply your skills to real-world problems. As you progress, you’ll elevate your software engineering skills by learning how to build and integrate scalable machine-learning pipelines to process data, train models, and deploy them to serve secure APIs using Python tools such as FastAPI. By the end of this book, you’ll be well equipped to use various state-of-the-art tools that will help you build production-ready systems, including FLAML for AutoML, PostgresML for operating ML pipelines using Postgres, high-performance distributed training and serving via Dask, and creating and running models in the cloud with AWS SageMaker.
Table of Contents (17 chapters)
Part 1: Gradient Boosting and LightGBM Fundamentals
Part 2: Practical Machine Learning with LightGBM
Part 3: Production-ready Machine Learning with LightGBM

What this book covers

Chapter 1, Introducing Machine Learning, starts our journey into ML, viewing it through the lens of software engineering. We will elucidate vital concepts central to the field, such as models, datasets, and the various learning paradigms, ensuring clarity with a hands-on example using decision trees.

Chapter 2, Ensemble Learning – Bagging and Boosting, delves into ensemble learning, focusing on bagging and boosting techniques applied to decision trees. We will explore algorithms such as random forests, gradient-boosted decision trees, and more advanced concepts such as Dropout meets Additive Regression Trees (DART).

Chapter 3, An Overview of LightGBM in Python, examines LightGBM, an advanced gradient-boosting framework with tree-based learners. Highlighting its unique innovations and enhancements to ensemble learning, we will guide you through its Python APIs. A comprehensive modeling example using LightGBM, enriched with advanced validation and optimization techniques, sets the stage for a deeper dive into data science and production ML systems.

Chapter 4, Comparing LightGBM, XGBoost, and Deep Learning, pits LightGBM against two prominent tabular data modeling methods – XGBoost and deep neural networks (DNNs), specifically TabTransformer. We will assess each method’s complexity, performance, and computational cost through evaluations of two datasets. The essence of this chapter is ascertaining LightGBM’s competitiveness in the broader ML landscape, rather than an in-depth study of XGBoost or DNNs.

Chapter 5, LightGBM Parameter Optimization with Optuna, focuses on the pivotal task of hyperparameter optimization, introducing the Optuna framework as a potent solution. Covering various optimization algorithms and strategies to prune the hyperparameter space, this chapter guides you through a hands-on example of refining LightGBM parameters using Optuna.

Chapter 6, Solving Real-World Data Science Problems with LightGBM, methodically breaks down the data science process, applying it to two distinct case studies – a regression and a classification problem. The chapter illuminates each step of the data science life cycle. You will experience hands-on modeling with LightGBM, paired with comprehensive theory. This chapter also serves as a blueprint for data science projects using LightGBM.

Chapter 7, AutoML with LightGBM and FLAML, delves into automated machine learning (AutoML), emphasizing its significance in simplifying and expediting data engineering and model development. We will introduce FLAML, a notable library that automates model selection and fine-tuning with efficient hyperparameter algorithms. Through a practical case study, you will witness FLAML’s synergy with LightGBM and the transformative Zero-Shot AutoML functionality, which renders the tuning process obsolete.

Chapter 8, Machine Learning Pipelines and MLOps with LightGBM, moves on from modeling intricacies to the world of production ML. It introduces you to ML pipelines, ensuring consistent data processing and model building, and ventures into MLOps, a fusion of DevOps and ML, which is vital to deploying resilient ML systems.

Chapter 9, LightGBM MLOps with AWS SageMaker, steers our journey toward Amazon SageMaker, Amazon Web Services’ comprehensive suite to craft and maintain ML solutions. We will deepen our understanding of ML pipelines by delving into advanced areas such as bias detection, explainability in models, and the nuances of automated, scalable deployments.

Chapter 10, LightGBM Models with PostgresML, introduces PostgresML, a distinct MLOps platform and a PostgreSQL database extension that facilitates ML model development and deployment directly via SQL. This approach, while contrasting with the scikit-learn programming style that we’ve embraced, showcases the benefits of database-level ML, particularly regarding data movement efficiencies and faster inference.

Chapter 11, Distributed and GPU-Based Learning with LightGBM, delves into the expansive realm of training LightGBM models, leveraging distributed computing clusters and GPUs. By harnessing distributed computing, you will understand how to substantially accelerate training workloads and manage datasets that exceed a single machine’s memory capacity.