Machine Learning Engineering with Python - Second Edition

By : Andrew P. McMahon

Overview of this book

The Second Edition of Machine Learning Engineering with Python is the practical guide that MLOps and ML engineers need to build solutions to real-world problems. It will provide you with the skills you need to stay ahead in this rapidly evolving field. The book takes an examples-based approach to help you develop your skills and covers the technical concepts, implementation patterns, and development methodologies you need. You'll explore the key steps of the ML development lifecycle and create your own standardized "model factory" for training and retraining of models. You'll learn to employ concepts like CI/CD and how to detect different types of drift. Get hands-on with the latest in deployment architectures and discover methods for scaling up your solutions. This edition goes deeper in all aspects of ML engineering and MLOps, with emphasis on the latest open-source and cloud-based technologies. This includes a completely revamped approach to advanced pipelining and orchestration techniques. With a new chapter on deep learning, generative AI, and LLMOps, you will learn to use tools like LangChain, PyTorch, and Hugging Face to leverage LLMs for supercharged analysis. You will explore AI assistants like GitHub Copilot to become more productive, then dive deep into the engineering considerations of working with deep learning.

Setting up our tools

To prepare for the work in the rest of this chapter, and indeed the rest of the book, it will be helpful to set up some tools. At a high level, we need the following:

Somewhere to code
Something to track our code changes
Something to help manage our tasks
Somewhere to provision infrastructure and deploy our solution

Let’s look at how to approach each of these in turn:

Somewhere to code: First, although Jupyter Notebook is of course the weapon of choice for many data scientists, once you begin to make the move toward ML engineering, it will be important to have an integrated development environment (IDE) to hand. An IDE is an application that comes with a series of built-in tools and capabilities to help you develop the best software that you can. PyCharm is an excellent example for Python developers and comes with a wide variety of plugins, add-ons, and integrations useful to ML engineers. You can download the Community Edition from JetBrains at https://www.jetbrains.com/pycharm/. Another popular development tool is the lightweight but powerful source code editor VS Code. Once you have successfully installed PyCharm, you can create a new project or open an existing one from the Welcome to PyCharm window, as shown in Figure 2.1:

Figure 2.1: Opening or creating your PyCharm project.

Something to track code changes: Next on the list is a code version control system. In this book, we will use GitHub, but there are a variety of solutions, all freely available, that are based on the same underlying open-source Git technology. Later sections will discuss how to use these as part of your development workflow, but first, if you do not have a version control system set up, you can navigate to github.com and create a free account. Follow the instructions on the site to create your first repository, and you will be shown a screen that looks something like Figure 2.2. To make your life easier later, you should select Add a README file and Add .gitignore (then select the Python template). The README file gives you an initial Markdown file in which to describe your project. The .gitignore file tells Git which files and directories to leave untracked, typically generated or local files that are not important for version control. It is up to you whether you want the repository to be public or private and what license you wish to use. The repository for this book uses the MIT license:

Figure 2.2: Setting up your GitHub repository.
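To make the role of the .gitignore file more concrete, here is a minimal sketch of how ignore patterns decide which paths are left out of version control. The pattern list and the is_ignored helper are illustrative assumptions, not part of Git or of the template GitHub generates, and the matching is only a rough approximation: Git's real rules also cover negation, anchoring, and `**` wildcards.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Sequence

# Illustrative patterns only (an assumption for this sketch); the Python
# template that GitHub generates is considerably longer.
EXAMPLE_PATTERNS = ("__pycache__/", "*.pyc", ".venv/", ".env")


def is_ignored(path: str, patterns: Sequence[str] = EXAMPLE_PATTERNS) -> bool:
    """Roughly approximate .gitignore matching for a forward-slash path.

    This is a simplification: it does not handle negation ("!"),
    anchored patterns ("/build"), or "**" wildcards.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore anything that lives under a
            # directory whose name matches.
            name = pattern.rstrip("/")
            if any(fnmatch(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatch(part, pattern) for part in parts):
            # Plain pattern: match a file or directory name at any depth.
            return True
    return False
```

For example, is_ignored("src/__pycache__/model.pyc") evaluates to True, while is_ignored("src/train.py") evaluates to False, which is the behavior you want: source code is tracked, generated bytecode is not.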

Once you have set up your IDE and version control system, you need to make them talk to each other by using the Git plugins provided with PyCharm. This is as simple as navigating to VCS | Enable Version Control Integration and selecting Git. You can edit the version control settings by navigating to File | Settings | Version Control; see Figure 2.3:

Figure 2.3: Configuring version control with PyCharm.
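If you want to confirm from Python that a project folder really is under Git control, the following sketch is one way to do it. It assumes the git command-line client is installed and on your PATH, and the is_git_work_tree helper name is hypothetical; it simply wraps the standard `git rev-parse --is-inside-work-tree` command.

```python
import subprocess
from pathlib import Path


def is_git_work_tree(project_dir: str = ".") -> bool:
    """Return True if project_dir is inside a Git working tree."""
    if not Path(project_dir).is_dir():
        return False
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=project_dir,
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit just means "not a repository"
        )
    except FileNotFoundError:
        # The git executable is not installed or not on PATH.
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"
```

Calling is_git_work_tree() from the root of your PyCharm project should return True once version control integration has been enabled for it.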

Something to help manage our tasks: You are now ready to write Python and track your code changes, but are you ready to manage or participate in a complex project with other team members? For this, it is often useful to have a solution where you can track tasks, issues, bugs, user stories, and other documentation and items of work. It also helps if this has good integration points with the other tools you will use. In this book, we will use Jira as an example of this. If you navigate to https://www.atlassian.com/software/jira, you can create a free cloud Jira account and then follow the interactive tutorial within the solution to set up your first board and create some tasks. Figure 2.4 shows the task board for this book project, called Machine Learning Engineering in Python (MEIP):

Figure 2.4: The task board for this book in Jira.

Somewhere to provision infrastructure and deploy our solution: Everything that you have just installed and set up is tooling that will really help take your workflow and software development practices to the next level. The last piece of the puzzle is having the tools, technologies, and infrastructure available for deploying the end solution. The management of computing infrastructure for applications was (and often still is) the province of dedicated infrastructure teams, but with the advent of public clouds, there has been real democratization of this capability for people working across the spectrum of software roles. In particular, modern ML engineering is very dependent on the successful implementation of cloud technologies, usually through the main public cloud providers such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP). This book will use tools found in the AWS ecosystem, but all of the tools and techniques you will find here have equivalents in the other clouds.

The flip side of the democratization of capabilities that the cloud brings is that teams who own the deployment of their solutions have to gain new skills and understanding. I am a strong believer in the principle that “you build it, you own it, you run it” as far as possible, but this means that as an ML engineer, you will have to be comfortable with a host of potential new tools and principles, as well as owning the performance of your deployed solution. With great power comes great responsibility and all that. In Chapter 5, Deployment Patterns and Tools, we will dive into this topic in detail.

Let’s talk through setting this up.

Setting up an AWS account

As previously stated, you don’t have to use AWS, but that’s what we’re going to use throughout this book. Once it’s set up here, you can use it for everything we’ll do:

To set up an AWS account, navigate to aws.amazon.com and select Create Account. You will have to add some payment details but everything we mention in this book can be explored through the free tier of AWS, where you do not incur a cost below a certain threshold of consumption.
Once you have created your account, you can navigate to the AWS Management Console, where you can see all the services that are available to you (see Figure 2.5):

Figure 2.5: The AWS Management Console.
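Once the account exists, a quick way to check that you can reach it from Python is to ask AWS who you are. The sketch below makes two assumptions that this section does not cover: that the third-party boto3 package is installed, and that credentials for your account have already been configured locally. The whoami helper is a hypothetical name; the underlying get_caller_identity call belongs to the AWS Security Token Service (STS) client.

```python
from typing import Any, Dict, Optional


def whoami(sts_client: Optional[Any] = None) -> Dict[str, str]:
    """Return the AWS account ID and ARN for the active credentials.

    A client can be passed in explicitly (useful for testing); otherwise
    a default STS client is created with boto3.
    """
    if sts_client is None:
        # Imported lazily so the module loads even without boto3.
        # Assumption: boto3 is installed and credentials are configured.
        import boto3

        sts_client = boto3.client("sts")
    identity = sts_client.get_caller_identity()
    return {"account": identity["Account"], "arn": identity["Arn"]}
```

Running whoami() with working credentials should return the ID of the account you just created; an authentication error at this point usually means the credentials have not been set up yet.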

With our AWS account ready to go, let’s look at the four steps that cover the whole process.
