The Machine Learning Solutions Architect Handbook

By: David Ping

Overview of this book

When equipped with a highly scalable machine learning (ML) platform, organizations can quickly scale the delivery of ML products for faster business value realization. There is a huge demand for skilled ML solutions architects in different industries, and this handbook will help you master the design patterns, architectural considerations, and the latest technology insights you’ll need to become one. You’ll start by understanding ML fundamentals and how ML can be applied to solve real-world business problems. Once you've explored a few leading problem-solving ML algorithms, this book will help you tackle data management and get the most out of ML libraries such as TensorFlow and PyTorch. Using open source technology such as Kubernetes/Kubeflow to build a data science environment and ML pipelines will be covered next, before moving on to building an enterprise ML architecture using Amazon Web Services (AWS). You’ll also learn about security and governance considerations, advanced ML engineering techniques, and how to apply bias detection, explainability, and privacy in ML model development. By the end of this book, you’ll be able to design and build an ML platform to support common use cases and architecture patterns like a true professional.

ML solutions architecture

When I first worked as an ML solutions architect with companies on ML projects, the focus was mainly on data science and modeling. Both the problem scope and the number of models were small. Most of the problems could be solved using simple ML techniques. The datasets were also small and did not require a large infrastructure for model training. The scope of the ML initiative at these companies was limited to a few data scientists or teams. As an ML architect back then, I mostly needed data science skills and general cloud architecture knowledge to work on those projects.

Over the last several years, ML initiatives at many companies have become far more complex and have come to involve many more functions and people. I've found myself talking to business executives more often about ML strategy and the organizational design needed to enable broad adoption across their enterprise. I have been asked to help design more complex ML platforms, using a wide range of technologies, for large enterprises whose many business units must meet stringent security and compliance requirements. There have been more architecture and process discussions around ML workflow orchestration and operations in recent years than ever before. More and more companies are looking to train very large ML models on terabytes of training data. The number of ML models trained and deployed by some companies has grown from a few dozen to tens of thousands in just a couple of years. Sophisticated and security-sensitive customers have also been looking for guidance on ML privacy, model explainability, and data and model bias. As a practitioner in ML solutions architecture, I've found that the skills and knowledge required to be effective in this function have changed drastically.

So, where does ML solutions architecture fit in this complex business, data, science, and technology Venn diagram? Based on my years of experience working with companies of different sizes and in different industries, I see ML solutions architecture as an overarching discipline that helps connect the various pieces of an ML initiative covering everything from the business requirements to the technology. An ML solutions architect interacts with different business and technology partners, comes up with ML solutions for the business problems, and designs the technology platforms to run the ML solutions.

From a specific function perspective, ML solutions architecture covers the following areas:

Figure 1.7 – ML solutions architecture coverage

Let's take a look at each of these elements:

Business understanding: Business problem understanding and transformation using AI and ML
Identification and verification of ML techniques: Selecting and validating ML techniques for solving specific ML problems
System architecture of the ML technology platform: Design and implementation of the system architecture for ML technology platforms
ML platform automation: Technical design of ML platform automation
Security and compliance: Security, compliance, and audit considerations for the ML platform and ML models

Business understanding and ML transformation

The goal of the business workflow analysis is to identify inefficiencies in the workflows and determine if ML can be applied to help eliminate pain points, improve efficiency, or even create new revenue opportunities.

For example, when you analyze a call center operation, you want to identify pain points such as long customer waiting times, knowledge gaps among customer service agents, the inability to extract customer insights from call recordings, and the inability to target customers for incremental services and products. After you have identified these pain points, you want to find out what data is available and which business metrics to improve. Based on the pain points and the availability of data, you can come up with hypotheses on potential ML solutions, such as a virtual assistant to handle common customer inquiries, audio-to-text transcription to enable text analysis of call recordings, and intent detection for product cross-sell and up-sell.

Sometimes, a business process must be modified before an ML solution can deliver on the established business goals. Using the same call center example, suppose there is a business need to do more product cross-sell or up-sell based on the insights generated from call recording analytics, but no business process exists to act on those insights and target customers. In that case, an automated targeted marketing process or a proactive outreach process run by sales professionals should be introduced.

Identification and verification of ML techniques

Once a list of ML options is identified, determine whether the underlying ML assumptions need to be validated. This could involve simple Proof of Concept (POC) modeling to validate the available dataset and modeling approach, a technology POC using pre-built AI services, or testing of ML frameworks. For example, you might want to test the feasibility of text transcription from audio files using an existing transcription service, or build a custom propensity model for a new product conversion from a marketing campaign. ML solutions architecture does not focus on the research and development of new machine learning algorithms, which is usually the job of applied data scientists and research data scientists.

Instead, ML solutions architecture focuses on identifying and applying ML algorithms to solve different ML problems such as predictive analytics, computer vision, and natural language processing. Also, the goal of any modeling task here is not to build production-quality models, which is usually the responsibility of full-time applied data scientists, but rather to validate the approach for further experimentation.

System architecture design and implementation

The most important aspect of ML solutions architecture is the technical architecture design of the ML platform. The platform needs to provide the technical capabilities to support the different phases of the ML life cycle and the different personas involved, such as data scientists and ops engineers. Specifically, an ML platform needs to have the following core functions:

Data exploration and experimentation: Data scientists use the ML platform for data exploration, experimentation, model building, and model evaluation. The ML platform needs to provide capabilities such as data science development tools for model authoring and experimentation, data wrangling tools for data exploration and preparation, source code control for code management, and a package repository for library package management.
Data management and large-scale data processing: Data scientists or data engineers will need the technical capability to store, access, and process large amounts of data for cleansing, transformation, and feature engineering.
Model training infrastructure management: The ML platform will need to provide model training infrastructure for different types of model training using different types of computing resources, storage, and networking configurations. It also needs to support different types of ML libraries or frameworks, such as scikit-learn, TensorFlow, and PyTorch.
Model hosting/serving: The ML platform will need to provide the technical capability to host and serve models to generate predictions in real time, in batch, or both.
Model management: Trained ML models will need to be managed and tracked for easy access and lookup, with relevant metadata.
Feature management: Common and reusable features will need to be managed and served for model training and model serving purposes.
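To make the model management function above more concrete, here is a minimal sketch of a model registry that tracks versions with metadata for lookup. It is an illustration only, not how any particular platform implements it: the `ModelVersion` and `ModelRegistry` names, the metadata fields, and the example URIs are all assumptions, and a real platform would persist this information in a database rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    """One tracked model version plus the metadata used to look it up."""
    name: str
    version: int
    artifact_uri: str          # hypothetical location of the trained artifact
    framework: str             # e.g. "scikit-learn", "tensorflow", "pytorch"
    metrics: Dict[str, float]  # evaluation metrics recorded at training time
    registered_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class ModelRegistry:
    """In-memory registry (assumption: a real one would use durable storage)."""

    def __init__(self) -> None:
        self._models: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str, artifact_uri: str, framework: str,
                 metrics: Dict[str, float]) -> ModelVersion:
        # Versions are numbered 1, 2, 3, ... per model name.
        versions = self._models.setdefault(name, [])
        model_version = ModelVersion(
            name, len(versions) + 1, artifact_uri, framework, dict(metrics)
        )
        versions.append(model_version)
        return model_version

    def latest(self, name: str) -> Optional[ModelVersion]:
        # Most recently registered version, or None if the name is unknown.
        versions = self._models.get(name, [])
        return versions[-1] if versions else None

    def best(self, name: str, metric: str) -> Optional[ModelVersion]:
        # Version with the highest value for the given metric, if any has it.
        candidates = [v for v in self._models.get(name, []) if metric in v.metrics]
        return max(candidates, key=lambda v: v.metrics[metric], default=None)
```

For example, after registering two versions of a hypothetical `churn` model, `registry.latest("churn")` returns the newest version, while `registry.best("churn", "auc")` returns whichever version recorded the highest AUC. The two can differ, which is one reason to keep metrics alongside each version.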

ML platform workflow automation

A key aspect of ML platform design is workflow automation and continuous integration/continuous deployment (CI/CD). ML is a multi-step workflow that includes data processing, model training, model validation, and model hosting, and these steps need to be automated. Automated infrastructure provisioning and self-service is another aspect of automation design. Key components of workflow automation include the following:

Pipeline design and management: The ability to create different automation pipelines for various tasks, such as model training and model hosting.
Pipeline execution and monitoring: The ability to run different pipelines and monitor the pipeline execution status for the entire pipeline and each of the steps.
Model monitoring configuration: The ability to monitor the model in production for various metrics, such as data drift (where the distribution of the data seen in production deviates from the distribution of the data used for model training), model drift (where the performance of the model in production degrades compared with its training results), and bias (where the ML model replicates or amplifies bias toward certain individuals).
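As an illustration of the data drift check described above, the following sketch compares the distribution of one numeric feature in training data against production data using the Population Stability Index (PSI). PSI is one common choice among several and is not prescribed by this chapter. The function names, the equal-width binning, the `eps` smoothing constant, and the alert threshold in the usage note are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def _bin_fractions(values: Sequence[float], edges: List[float]) -> List[float]:
    """Return the fraction of values in each bin defined by interior cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # The bin index is the number of cut points strictly below the value.
        idx = sum(1 for e in edges if v > e)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def population_stability_index(train: Sequence[float], prod: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a training sample and a production sample of one feature.

    Assumes both samples are non-empty. Bins are equal-width over the
    training range; production values outside that range fall into the
    first or last bin.
    """
    lo, hi = min(train), max(train)
    width = (hi - lo) / bins
    edges = [lo + width * i for i in range(1, bins)]
    p = _bin_fractions(train, edges)  # expected (training) fractions
    q = _bin_fractions(prod, edges)   # actual (production) fractions
    # eps avoids log(0) when a bin is empty in one of the samples.
    return sum((qi - pi) * math.log((qi + eps) / (pi + eps))
               for pi, qi in zip(p, q))
```

A monitoring job could compute this value per feature on a schedule and raise an alert when it exceeds a threshold chosen for the use case, for example `population_stability_index(train_values, prod_values) > 0.2`. The threshold here is illustrative. Identical samples yield a PSI of zero, and larger values indicate a larger shift.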

Security and compliance

Another important aspect of ML solutions architecture is the security and compliance consideration in a sensitive or enterprise setting:

Authentication and authorization: The ML platform needs to provide authentication and authorization mechanisms to manage the access to the platform and different resources and services.
Network security: The ML platform needs to be configured with network security controls to prevent unauthorized access.
Data encryption: For security-sensitive organizations, data encryption is another important aspect of the design consideration for the ML platform.
Audit and compliance: Audit and compliance staff need information that helps them understand, when required, how the predictive models make decisions, the lineage of a model from data to model artifacts, and any bias exhibited in the data and model. The ML platform will need to provide model explainability, bias detection, and model traceability across the various datastore and service components, among other capabilities.
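One small piece of the model traceability requirement above, lineage from data to model artifacts, can be sketched as a record that stores content digests of the training data and the model artifact. This is a simplified illustration under stated assumptions: the function names and record fields are hypothetical, the inputs are passed as in-memory bytes for brevity, and the digest only detects accidental or unrecorded changes. It is not a substitute for access-controlled, signed audit logs.

```python
import hashlib
import json
from typing import Any, Dict


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def build_lineage_record(model_name: str, training_data: bytes,
                         model_artifact: bytes,
                         params: Dict[str, Any]) -> Dict[str, Any]:
    """Link a model artifact to the exact data and parameters that produced it."""
    record: Dict[str, Any] = {
        "model_name": model_name,
        "data_sha256": sha256_hex(training_data),
        "artifact_sha256": sha256_hex(model_artifact),
        "params": params,  # must be JSON-serializable
    }
    # A digest over the canonical JSON form lets a reviewer check that the
    # record has not changed since it was written.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["record_sha256"] = sha256_hex(canonical)
    return record


def verify_lineage_record(record: Dict[str, Any]) -> bool:
    """Recompute the record digest and compare it with the stored one."""
    body = {k: v for k, v in record.items() if k != "record_sha256"}
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    return record.get("record_sha256") == sha256_hex(canonical)
```

An auditor who is given the training data file and the model artifact can recompute both digests and compare them with the stored record to confirm that the deployed model was trained on the data it claims.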
