
Platform and Model Design for Responsible AI

By: Amita Kapoor, Sharmistha Chatterjee

Overview of this book

AI algorithms are ubiquitous, used for tasks ranging from recruiting to deciding who will get a loan. With such widespread use of AI in decision-making, it's necessary to build explainable, responsible, transparent, and trustworthy AI-enabled systems. With Platform and Model Design for Responsible AI, you'll be able to make existing black-box models transparent, identify and eliminate bias in your models, deal with uncertainty arising from both data and model limitations, and deliver a responsible AI solution.

You'll start by designing ethical models for traditional ML and deep learning models, and deploying them in a sustainable production setup. After that, you'll learn how to set up data pipelines, validate datasets, and set up component microservices in a secure and private way in any cloud-agnostic framework. You'll then build a fair and private ML model with proper constraints, tune its hyperparameters, and evaluate the model metrics.

By the end of this book, you'll know the best practices for complying with data privacy and ethics laws, along with the techniques needed for data anonymization. You'll be able to develop explainable models, store them in feature stores, and handle uncertainty in model predictions.
Table of Contents (21 chapters)

Part 1: Risk Assessment Machine Learning Frameworks in a Global Landscape
Part 2: Building Blocks and Patterns for a Next-Generation AI Ecosystem
Part 3: Design Patterns for Model Optimization and Life Cycle Management
Part 4: Implementing an Organization Strategy, Best Practices, and Use Cases

What this book covers

Chapter 1, Risks and Attacks on ML Models, presents a detailed overview of key terms related to the different types of attacks possible on ML models, building a basic understanding of how attackers design ML attacks. In this chapter, you will become familiar with the attacks, both direct and indirect, that compromise the privacy of a system. In this context, the chapter highlights the losses organizations incur when sensitive information leaks, and how individuals remain vulnerable to having confidential information fall into the hands of adversaries.
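
To make the idea of an indirect privacy attack concrete, here is a minimal, hypothetical sketch (not from the book) of a confidence-thresholding membership-inference attack: overfit models tend to be more confident on memorized training records, so an attacker can guess membership from output confidence alone. The `victim_confidence` function and `TRAIN` set are invented stand-ins for a real deployed model and its private training data.

```python
# Toy membership-inference attack: the attacker only sees the victim
# model's output confidence, never its training data.
def victim_confidence(x, training_set):
    """Hypothetical overfit model: very confident on memorized points."""
    return 0.99 if x in training_set else 0.6

TRAIN = {"rec-1", "rec-2", "rec-3"}  # private training records

def infer_membership(x, threshold=0.9):
    """Guess 'was x in the training set?' by thresholding confidence."""
    return victim_confidence(x, TRAIN) > threshold

# The attacker probes two records; one was memorized, one was not.
guesses = {x: infer_membership(x) for x in ["rec-1", "rec-9"]}
```

Real attacks train shadow models to pick the threshold, but the leak mechanism is exactly this confidence gap.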

Chapter 2, The Emergence of Risk-Averse Methodologies and Frameworks, presents a detailed overview of risk assessment frameworks, tools, and methodologies that can be applied directly to evaluate model risk. In this chapter, you will become familiar with the tools included in data platforms and the model design techniques that help reduce risk at scale. The primary objective of this chapter is to create awareness of data anonymization and validation techniques, and to introduce the different terms and measures related to privacy.
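
As a taste of the anonymization techniques the chapter covers, here is a minimal sketch (not from the book) of salted-hash pseudonymization using only Python's standard library. The salt value and field names are hypothetical; real systems would manage the key in a secrets store.

```python
import hashlib
import hmac

# Secret salt kept separate from the data store; rotating it breaks
# linkability between old and new pseudonyms.
SALT = b"replace-with-a-secret-key"

def pseudonymize(value):
    """Replace a direct identifier with a keyed hash (pseudonym).

    Records stay linkable (same input -> same token) while the raw
    identifier never enters the analytics pipeline.
    """
    return hmac.new(SALT, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

record = {"email": "alice@example.com", "age": 34}
anonymized = {**record, "email": pseudonymize(record["email"])}
```

Note that pseudonymization alone is not full anonymization; quasi-identifiers such as age can still re-identify individuals, which is why the chapter also discusses validation and formal privacy measures.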

Chapter 3, Regulations and Policies Surrounding Trustworthy AI, introduces the different laws being passed across nations to protect customers' sensitive information and prevent its loss. You will learn about the formation of ethics expert groups, government initiatives, and the policies being drafted to ensure the ethics and compliance of all AI solutions.

Chapter 4, Privacy Management in Big Data and Model Design Pipelines, presents a detailed overview of the different components of a big data system, which serves as the foundation on which we can effectively deploy AI models. This chapter shows how compliance-related issues can be handled at the component level in a microservice-based architecture so that there is no information leakage. In this chapter, you will become familiar with the security principles needed in individual microservices, as well as the security measures that need to be incorporated in the cloud when deploying ML models at scale.

Chapter 5, ML Pipeline, Model Evaluation, and Handling Uncertainty, introduces the AI/ML workflow. We start by introducing the various components of an ML pipeline, then briefly explore the important ML algorithms for classification, regression, clustering, generation, and reinforcement learning, along with issues related to their reliability and trustworthiness. Further, we discuss the various types of uncertainty, their causes, and the techniques to quantify uncertainty.
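
One common way to quantify predictive uncertainty, in the spirit of the techniques this chapter discusses, is ensemble disagreement: train several models and treat the spread of their predictions as an uncertainty estimate. The sketch below (not from the book) hard-codes three hypothetical linear models in place of a real trained ensemble.

```python
import statistics

# A toy "deep ensemble": in practice these would be models trained on
# resampled data or from different random initializations.
ensemble = [
    lambda x: 2.0 * x + 0.1,
    lambda x: 1.9 * x + 0.3,
    lambda x: 2.1 * x - 0.2,
]

def predict_with_uncertainty(x):
    """Return the mean prediction and the standard deviation across
    ensemble members as an uncertainty estimate."""
    preds = [model(x) for model in ensemble]
    return statistics.mean(preds), statistics.stdev(preds)

mean, std = predict_with_uncertainty(3.0)
```

A large standard deviation flags inputs where the models disagree, typically far from the training distribution, which is where epistemic uncertainty dominates.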

Chapter 6, Hyperparameter Tuning, MLOps, and AutoML, continues from the previous chapter and explains the need for continuous training in an ML pipeline. Building an ML model is an iterative process, and the sheer number of models, each with a large number of hyperparameters, complicates things for beginners. This chapter provides a glimpse into the current AutoML options for your ML workflow and expands on the situations where no-code/low-code solutions are useful. It explores the solutions provided by major cloud providers in terms of ease of use, features, and model explainability. Additionally, the chapter covers orchestration tools, such as Kubeflow and Vertex AI, to manage the continuous training and deployment of your ML models.
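
At its core, hyperparameter tuning is a search over a parameter space guided by a validation metric; AutoML systems automate and refine exactly this loop. The sketch below (not from the book) shows plain random search with a stand-in objective; `validation_loss` and the search space are hypothetical placeholders for a real train-and-evaluate step.

```python
import random

random.seed(0)

def validation_loss(learning_rate, num_layers):
    """Stand-in for training a model and scoring it on held-out data;
    a real pipeline would fit the model here."""
    return (learning_rate - 0.01) ** 2 + 0.05 * abs(num_layers - 3)

# Each entry samples one hyperparameter; learning rate is drawn
# log-uniformly, a common choice for scale parameters.
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "num_layers": lambda: random.randint(1, 6),
}

best = None
for _ in range(50):  # trial budget
    params = {name: sample() for name, sample in search_space.items()}
    loss = validation_loss(**params)
    if best is None or loss < best[0]:
        best = (loss, params)

best_loss, best_params = best
```

Bayesian optimizers and the cloud AutoML services covered in the chapter replace the blind sampling with a model of the loss surface, but the interface, propose parameters, evaluate, keep the best, is the same.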

Chapter 7, Fairness Notions and Fair Data Generation, presents the problems pertaining to unfair data collection for different types of data, ontologies, vocabularies, and so on, due to the lack of standardization. The primary objective of this chapter is to stress the importance of data quality, as biased datasets can introduce hidden biases into ML models. This chapter focuses on the guiding principles for better data collection, management, and stewardship that need to be practiced globally. You will further see how initial evaluation strategies can help build unbiased datasets, enabling new AI analytics and digital transformation journeys for ML-based predictions.

Chapter 8, Fairness in Model Optimization, presents the different optimization constraints and techniques essential for obtaining fair ML models. The focus of this chapter is to introduce you to the new customized optimizers, unveiled by research, that can be used to build supervised, unsupervised, and semi-supervised fair ML models. In a broader sense, the chapter prepares you with the foundational steps to create and define model constraints that different optimizers can use during the training process. You will also learn how to evaluate such constraint-based models with the proper metrics, and understand the extra training overhead incurred by these optimization techniques, enabling you to design your own algorithms.
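
A fairness constraint needs a measurable quantity to constrain. One of the simplest, demographic parity, compares positive-prediction rates across groups; an optimizer can then be asked to keep the gap below a threshold. The sketch below (not from the book) computes this gap in pure Python over hypothetical loan-approval predictions.

```python
def demographic_parity_difference(y_pred, groups):
    """Absolute gap in positive-prediction rates between two groups.

    A fairness-constrained training run might require this gap to stay
    below some threshold (e.g. 0.1) at every optimization step.
    """
    rates = {}
    for g in set(groups):
        preds = [p for p, grp in zip(y_pred, groups) if grp == g]
        rates[g] = sum(preds) / len(preds)
    a, b = rates.values()
    return abs(a - b)

# Hypothetical predictions (1 = approved) for applicants in groups A and B.
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap = demographic_parity_difference(y_pred, groups)  # 0.75 vs 0.25 -> 0.5
```

Other notions the chapter's constraints target, such as equalized odds, condition these rates on the true label, and different notions can conflict, which is why choosing the constraint is itself a design decision.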

Chapter 9, Model Explainability, introduces the different methods that can be used to unravel the mystery of black-box ML models. We will discuss the need to be able to explain a model's predictions. This chapter covers various algorithms and techniques, such as SHAP and LIME, for adding an explainability component to existing models. We will explore libraries such as DoWhy and CausalNex to see the explainability features available to an end user, and also delve into the explainability features provided by Vertex AI, SageMaker, and H2O.ai.
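
SHAP and LIME both probe a black box by perturbing its inputs and watching the output. A simpler relative of these techniques, permutation importance, captures the same intuition and fits in a few lines; the sketch below (not from the book, and not the SHAP algorithm itself) uses a hypothetical fitted model that secretly depends on only two of its three features.

```python
import random

random.seed(42)

def model(features):
    """Hypothetical fitted model: relies heavily on feature 0,
    lightly on feature 1, and ignores feature 2."""
    return 3.0 * features[0] + 0.5 * features[1]

X = [[random.random() for _ in range(3)] for _ in range(200)]
y = [model(row) for row in X]

def mse(y_true, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_hat)) / len(y_true)

def permutation_importance(col):
    """Error increase when one feature column is shuffled: the bigger
    the increase, the more the model depends on that feature."""
    shuffled = [row[:] for row in X]
    values = [row[col] for row in shuffled]
    random.shuffle(values)
    for row, v in zip(shuffled, values):
        row[col] = v
    return mse(y, [model(r) for r in shuffled]) - mse(y, [model(r) for r in X])

importances = [permutation_importance(c) for c in range(3)]
```

The ranking correctly recovers that feature 0 matters most and feature 2 not at all; SHAP refines this idea into per-prediction attributions with game-theoretic guarantees.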

Chapter 10, Ethics and Model Governance, emphasizes the ethical governance processes that need to be established for models in production, enabling quick identification of all risks related to a model's development and deployment. This chapter also covers best practices for monitoring all models, including those in an inventory. You will gain insights into the practical nuances of the risks that emerge in different phases of the model life cycle and how these risks can be mitigated while models reside in the inventory. You will also learn about the different risk classification procedures and how they can help minimize the business losses resulting from low-performing models. Further, you will get detailed insights into how to establish proper governance for data aggregation, iterative rounds of model training, and the hyperparameter tuning process.

Chapter 11, The Ethics of Model Adaptability, focuses on establishing ethical governance processes for models in production, with the aim of quickly detecting any signs of model failure or bias in output predictions. By reading this chapter, you will gain a deeper understanding of the practical details of monitoring model performance and contextual model predictions: reviewing the data constantly and benchmarking against the past in order to draft actionable short-term and long-term plans. Further, you will gain a detailed understanding of the conditions that lead to model retraining and the importance of having a well-calibrated model. This chapter also highlights the trade-offs between fairness and model calibration.
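
Calibration means that when a model reports 80% confidence, it should be right about 80% of the time. A standard way to measure the gap, in line with the calibration discussion here, is the Expected Calibration Error (ECE); the sketch below (not from the book) computes it in pure Python over a hypothetical set of predictions.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Bin predictions by confidence, then average |accuracy - confidence|
    across bins, weighted by bin size. 0.0 means perfectly calibrated
    on this sample."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / len(confidences) * abs(accuracy - avg_conf)
    return ece

# Hypothetical model outputs: predicted confidence and whether it was right.
confidences = [0.9, 0.9, 0.8, 0.6, 0.6, 0.3]
correct =     [1,   1,   0,   1,   0,   0]
ece = expected_calibration_error(confidences, correct)
```

A rising ECE on production traffic is one concrete trigger for the retraining or recalibration decisions this chapter discusses.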

Chapter 12, Building Sustainable Enterprise-Grade AI Platforms, focuses on how organizational goals, initiatives, and support from leadership can enable us to build sustainable, ethical AI platforms. The goal of this chapter is to stress the importance of organizations contextualizing ethical AI principles to reflect the local values, human rights, social norms, and behaviors of the community in which their solutions operate. In this context, the chapter highlights the environmental impact of large-scale AI solutions and the right procedures to incorporate for model training and deployment using federated learning. The chapter further delves into important concepts that emphasize the need to stay socially responsible while designing software, models, and platforms.

Chapter 13, Sustainable Model Life Cycle Management, Feature Stores, and Model Calibration, explores the best practices to follow during the model development life cycle, which can lead to the creation of sustainable feature stores. In this chapter, we highlight the importance of implementing privacy so that the reuse of stores and collaboration among teams are maximized without compromising security and privacy. The chapter further provides a deep dive into the different model calibration techniques that are essential for building scalable, sustainable ML platforms. You will also learn how to design adaptable feature stores and how best to incorporate monitoring and governance in federated learning.

Chapter 14, Industry-Wide Use Cases, presents a detailed overview of different use cases across various industries. The primary aim of this chapter is to show readers from different industry domains how ethics and compliance can be integrated into their systems, in order to build fair and equitable AI systems and win the confidence and trust of end users. You will also get a chance to apply the algorithms and tools studied in previous chapters to different business problems. Further, you will gain an understanding of how ethical design patterns can be reused across industry domains.