Summary and Where to Go Next | Machine Learning Infrastructure and Best Practices for Software Engineers


Machine Learning Infrastructure and Best Practices for Software Engineers

By: Miroslaw Staron

Overview of this book

Although creating a machine learning pipeline, or developing a working prototype of a software system from that pipeline, is straightforward nowadays, the journey toward a professional software system is still long. This book presents best practices and recipes that help software engineers transform prototype pipelines into complete software products. The book begins by introducing the main concepts of professional software systems that have machine learning at their core. As you progress, you’ll explore the differences between traditional, non-ML software and machine learning software. The initial best practices will guide you in determining the type of software you need for your product. Subsequently, you will delve into algorithms, covering their selection, development, and testing, before exploring the infrastructure for machine learning systems through best practices for identifying the right data source and ensuring its quality. Toward the end, you’ll address the most challenging aspect of large-scale machine learning systems – ethics. By exploring and defining best practices for assessing ethical risks and strategies for mitigation, you will conclude the book where it all began – large-scale machine learning software.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1: Machine Learning Landscape in Software Engineering

Machine Learning Compared to Traditional Software

Machine learning is not traditional software

Probability and software – how well they go together

Testing and evaluation – the same but different

Summary

References

Elements of a Machine Learning System

Elements of a production machine learning system

Data and algorithms

Data collection

Configuration and monitoring

Infrastructure and resource management

How this all comes together – machine learning pipelines

References

Data in Software Systems – Text, Images, Code, and Their Annotations

Raw data and features – what are the differences?

Every data has its purpose – annotations and tasks

Annotating text for intent recognition

Where different types of data can be used together – an outlook on multi-modal data models

References

Data Acquisition, Data Quality, and Noise

Sources of data and what we can do with them

Extracting data from software engineering tools – Gerrit and Jira

Extracting data from product databases – GitHub and Git

Data quality

Noise

Summary

References

Quantifying and Improving Data Properties

Feature engineering – the basics

Clean data

Noise in data management

Attribute noise

Splitting data

How ML models handle noise

References

Part 2: Data Acquisition and Management

Processing Data in Machine Learning Systems

Numerical data

Other types of data – images

Text data

Toward feature engineering

References

Feature Engineering for Numerical and Image Data

Feature engineering

Feature engineering for numerical data

Feature engineering for image data

Summary

References

Feature Engineering for Natural Language Data

Natural language data in software engineering and the rise of GitHub Copilot

What a tokenizer is and what it does

Bag-of-words and simple tokenizers

WordPiece tokenizer

BPE

The SentencePiece tokenizer

Word embeddings

FastText

From feature extraction to models

References

Part 3: Design and Development of ML Systems

Types of Machine Learning Systems – Feature-Based and Raw Data-Based (Deep Learning)

Why do we need different types of models?

Classical machine learning models

Convolutional neural networks and image processing

BERT and GPT models

Using language models in software systems

Summary

References

Training and Evaluating Classical Machine Learning Systems and Neural Networks

Training and testing processes

Training classical machine learning models

Understanding the training process

Random forest and opaque models

Training deep learning models

Misleading results – data leaking

Summary

References

Training and Evaluation of Advanced ML Algorithms – GPT and Autoencoders

From classical ML to GenAI

The theory behind advanced models – AEs and transformers

Training and evaluation of a RoBERTa model

Training and evaluation of an AE

Developing safety cages to prevent models from breaking the entire system

Summary

References

Designing Machine Learning Pipelines (MLOps) and Their Testing

What ML pipelines are

ML pipelines – how to use ML in the system in practice

Raw data-based pipelines

Feature-based pipelines

Testing of ML pipelines

Monitoring ML systems at runtime

Summary

References

Designing and Implementing Large-Scale, Robust ML Software

ML is not alone

The UI of an ML model

Data storage

Deploying an ML model for numerical data

Deploying a generative ML model for images

Deploying a code completion model as an extension

Summary

References

Part 4: Ethical Aspects of Data Management and ML System Development

Ethics in Data Acquisition and Management

Ethics in computer science and software engineering

Data is all around us, but can we really use it?

Ethics behind data from open source systems

Ethics behind data collected from humans

Contracts and legal obligations

References

Ethics in Machine Learning Systems

Bias and ML – is it possible to have an objective AI?

Measuring and monitoring for bias

Developing mechanisms to prevent ML bias from spreading throughout the system

Summary

References

Integrating ML Systems in Ecosystems

Ecosystems

Creating web services over ML models using Flask

Deploying ML models using Docker

Combining web services into ecosystems

Summary

References

Summary and Where to Go Next

To know where we’re going, we need to know where we’ve been

Best practices

Current developments

My view on the future

Final remarks

References

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a free PDF copy of this book
