Intelligent Document Processing with AWS AI/ML

By: Sonali Sahu

Overview of this book

With the volume of data growing exponentially in this digital era, it has become paramount for professionals to process data quickly and cost-effectively to extract value from it. The data that organizations receive usually arrives as raw documents, and being able to process these documents is critical to meeting growing business needs. This book is a comprehensive guide to help you get to grips with AI/ML fundamentals and their application to document processing use cases. You’ll begin by understanding the challenges of legacy document processing and discover how to build end-to-end document processing pipelines with AWS AI services. As you advance, you’ll get hands-on experience with popular Python libraries for processing and extracting insights from documents. Starting with the basics, the book takes you through real industry use cases for document processing, such as delivering value-based care in the healthcare industry and accelerating loan application processing in the financial industry. Throughout the chapters, you’ll find out how to apply your skill set to solve practical problems. By the end of this book, you’ll have mastered the fundamentals of document processing with machine learning through practical implementation.
Table of Contents
Part 1: Accurate Extraction of Documents and Categorization
Part 2: Enrichment of Data and Post-Processing of Data
Part 3: Intelligent Document Processing in Industry Use Cases

What this book covers

Chapter 1, Intelligent Document Processing with AWS AI and ML, will explain how AWS aims to make ML accessible to everyone and, to that end, has defined a three-layer AWS ML stack in which AWS AI services can be used simply by calling an API. First, the reader will learn about the AWS AI/ML stack. Then, we will define document processing, its challenges, and how AWS can help. We will also discuss common intelligent document processing (IDP) use cases across industries. Finally, we will walk the reader through the stages of the IDP pipeline.
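To make the idea of a staged pipeline concrete, here is a minimal sketch that chains the stages this book covers (capture, classification, extraction, enrichment, and review) as plain Python functions. The stage functions, the keyword rule, and the dictionary-based document record are hypothetical placeholders, not the book's implementation; in a real pipeline each stage would call a service such as Amazon S3, Amazon Textract, or Amazon Comprehend.

```python
from typing import Callable, Dict, List

# A document record flowing through the pipeline; the keys are hypothetical.
Document = Dict[str, object]
Stage = Callable[[Document], Document]


def capture(doc: Document) -> Document:
    # Placeholder: a real stage might store the raw file in Amazon S3.
    return {**doc, "stored": True}


def classify(doc: Document) -> Document:
    # Placeholder keyword rule standing in for a trained document classifier.
    text = str(doc.get("text", "")).lower()
    return {**doc, "doc_type": "invoice" if "invoice" in text else "other"}


def extract(doc: Document) -> Document:
    # Placeholder: a real stage might call Amazon Textract here.
    return {**doc, "fields": {}}


def enrich(doc: Document) -> Document:
    # Placeholder: a real stage might add entities from Amazon Comprehend.
    return {**doc, "entities": []}


def review(doc: Document) -> Document:
    # Placeholder: flag documents that a human should look at.
    return {**doc, "needs_review": doc.get("doc_type") == "other"}


PIPELINE: List[Stage] = [capture, classify, extract, enrich, review]


def run_pipeline(doc: Document, stages: List[Stage] = PIPELINE) -> Document:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline({"id": "doc-1", "text": "INVOICE #123 Total: $40.00"})
print(result["doc_type"], result["needs_review"])
```

Keeping each stage as a function with the same signature makes it easy to swap a placeholder for a real service call without touching the rest of the pipeline.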

Chapter 2, Document Capture and Categorization, will detail how to collect data in a scalable, highly available data store. We will also cover some of the security features of the data capture stage. Then, we will turn to the accurate classification of documents. Readers will learn about the document splitter and how to use it through a code sample. Finally, readers will learn to train their own custom classifiers to accurately classify their document types.
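The chapter's own splitter code is not reproduced here. As a hedged sketch of the general idea, the code below splits a multi-page packet into separate documents by grouping consecutive pages that received the same class label. The `classify_page` function is a hypothetical keyword-based stand-in for a trained custom classifier, such as one built with Amazon Comprehend, and the labels are made up for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def classify_page(page_text: str) -> str:
    # Hypothetical stand-in for a trained custom classifier.
    text = page_text.lower()
    if "pay stub" in text:
        return "PAYSTUB"
    if "bank statement" in text:
        return "BANK_STATEMENT"
    return "OTHER"


def split_packet(pages: List[str]) -> List[Tuple[str, List[int]]]:
    """Group consecutive pages with the same label into (label, page_numbers) documents."""
    labelled = [(classify_page(text), number) for number, text in enumerate(pages, start=1)]
    return [
        (label, [number for _, number in group])
        for label, group in groupby(labelled, key=lambda item: item[0])
    ]


packet = [
    "Pay Stub - Employer: Example Corp",
    "Pay Stub (continued)",
    "Bank Statement - Account ending 1234",
]
print(split_packet(packet))
```

Grouping only consecutive pages assumes each document's pages are contiguous in the packet; interleaved documents would need a different strategy.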

Chapter 3, Accurate Document Extraction with Amazon Textract, will dive into key use cases for extracting data accurately from structured, semi-structured, and unstructured documents. Readers will learn about specialized documents, such as invoices, receipts, driver’s licenses, and passports, and how to use the AWS AI service Amazon Textract to extract data from them accurately.
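When Amazon Textract's `AnalyzeDocument` API is called with the `FORMS` feature type, key-value pairs come back as `KEY_VALUE_SET` blocks linked to their `WORD` blocks through `Relationships`. The sketch below parses a response of that shape into a plain dictionary. The response is a small hand-written sample, not real Textract output, and a production parser would also need to handle selection elements, confidence scores, and multi-page results.

```python
from typing import Dict, List


def block_text(block: dict, blocks_by_id: Dict[str, dict]) -> str:
    """Join the WORD children of a block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def parse_forms(response: dict) -> Dict[str, str]:
    """Map each KEY block's text to the text of its linked VALUE block(s)."""
    blocks: List[dict] = response["Blocks"]
    blocks_by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            key = block_text(block, blocks_by_id)
            value = ""
            for rel in block.get("Relationships", []):
                if rel["Type"] == "VALUE":
                    value = " ".join(block_text(blocks_by_id[i], blocks_by_id) for i in rel["Ids"])
            pairs[key] = value
    return pairs


# Hand-written sample in the shape of an AnalyzeDocument FORMS response.
sample_response = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Number:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "INV-001"},
    ]
}

print(parse_forms(sample_response))
```

With boto3, a response of this shape would come from a Textract client's `analyze_document` call with `FeatureTypes=["FORMS"]`; that call needs AWS credentials and is not shown here.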

Chapter 4, Accurate Extraction with Amazon Comprehend, will explain document extraction with Amazon Comprehend, covering its entity and custom entity extraction features. Readers will learn how to train their own custom model with Amazon Comprehend. Finally, readers will learn how to extract key phrases for accurate document tagging and categorization.
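As a rough illustration of tagging a document from Comprehend output, the sketch below turns results shaped like the `DetectEntities` and `DetectKeyPhrases` responses into a dictionary of tags, keeping only results above a confidence threshold. The sample responses are hand-written, and the 0.8 threshold is an arbitrary assumption, not a recommended value.

```python
from typing import Dict, List


def build_tags(entities_response: dict, key_phrases_response: dict,
               min_score: float = 0.8) -> Dict[str, List[str]]:
    """Turn Comprehend-style entity and key-phrase results into document tags.

    min_score is an assumed confidence cut-off on Comprehend's 0-1 scale.
    """
    tags: Dict[str, List[str]] = {}
    for entity in entities_response["Entities"]:
        if entity["Score"] >= min_score:
            tags.setdefault(entity["Type"], []).append(entity["Text"])
    tags["KEY_PHRASES"] = [
        phrase["Text"]
        for phrase in key_phrases_response["KeyPhrases"]
        if phrase["Score"] >= min_score
    ]
    return tags


# Hand-written samples shaped like DetectEntities and DetectKeyPhrases responses.
entities = {"Entities": [
    {"Type": "PERSON", "Text": "Jane Doe", "Score": 0.99, "BeginOffset": 0, "EndOffset": 8},
    {"Type": "DATE", "Text": "May 1", "Score": 0.55, "BeginOffset": 20, "EndOffset": 25},
]}
key_phrases = {"KeyPhrases": [
    {"Text": "the loan application", "Score": 0.97, "BeginOffset": 30, "EndOffset": 50},
]}

tags = build_tags(entities, key_phrases)
print(tags)  # the low-confidence DATE entity is dropped
```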

Chapter 5, Document Enrichment in Intelligent Document Processing, will explore the document enrichment stage of IDP. Readers will learn about document enrichment and the redaction of sensitive information with personally identifiable information (PII) detection in Amazon Comprehend. They will also learn how to extract health insights with Amazon Comprehend Medical and how to augment document processing with those insights and with ontology linking.
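Amazon Comprehend's `DetectPiiEntities` API returns each detected entity with a `Type`, a `Score`, and character offsets (`BeginOffset`, `EndOffset`) into the input text. A minimal redaction sketch, assuming a response of that shape, replaces each span with its entity type. The text and response below are hand-written samples, and the 0.5 threshold is an assumption.

```python
def redact(text: str, pii_response: dict, min_score: float = 0.5) -> str:
    """Replace each detected PII span with its entity type, e.g. [NAME]."""
    # Work from the end of the string so earlier offsets stay valid after each replacement.
    entities = sorted(pii_response["Entities"], key=lambda e: e["BeginOffset"], reverse=True)
    for entity in entities:
        if entity["Score"] >= min_score:
            start, end = entity["BeginOffset"], entity["EndOffset"]
            text = text[:start] + f"[{entity['Type']}]" + text[end:]
    return text


text = "Patient John Smith, SSN 123-45-6789, called today."
# Hand-written sample shaped like a DetectPiiEntities response.
pii = {"Entities": [
    {"Type": "NAME", "Score": 0.99, "BeginOffset": 8, "EndOffset": 18},
    {"Type": "SSN", "Score": 0.98, "BeginOffset": 24, "EndOffset": 35},
]}

redacted = redact(text, pii)
print(redacted)  # Patient [NAME], SSN [SSN], called today.
```

This sketch assumes the detected spans do not overlap; overlapping spans would need to be merged before replacement.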

Chapter 6, Review and Verification of Intelligent Document Processing, will elaborate on the post-processing stage, covering completeness checks and access control. Readers will learn how to check a document for completeness during post-processing. They will also learn about PII detection in Amazon Comprehend and protected health information (PHI) detection in Amazon Comprehend Medical, the APIs for redacting sensitive data, and how to set policies for appropriate access control. Finally, readers will learn about accuracy checks with human review.
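A completeness check and a human-review decision can be sketched together: list required fields that are missing, list extracted fields whose confidence is below a threshold, and route the document to a reviewer if either list is non-empty. The required-field table, field names, and 90.0 threshold are hypothetical. Confidence is assumed to be on a 0-100 scale, as Textract reports it. A managed service such as Amazon Augmented AI (Amazon A2I) could host the review step, which is not shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical required fields per document type.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "claim_form": ["patient_name", "date_of_service", "diagnosis_code"],
}


def check_document(doc_type: str, fields: Dict[str, Tuple[str, float]],
                   min_confidence: float = 90.0) -> Dict[str, List[str]]:
    """Return missing and low-confidence fields for one extracted document.

    `fields` maps a field name to (value, confidence), confidence on a 0-100 scale.
    """
    missing = [name for name in REQUIRED_FIELDS.get(doc_type, [])
               if not fields.get(name, ("", 0.0))[0]]
    low_confidence = [name for name, (value, confidence) in fields.items()
                      if value and confidence < min_confidence]
    return {"missing": missing, "low_confidence": low_confidence}


def needs_human_review(result: Dict[str, List[str]]) -> bool:
    # Route to a human reviewer if anything is missing or uncertain.
    return bool(result["missing"] or result["low_confidence"])


extracted = {
    "patient_name": ("Jane Doe", 99.1),
    "date_of_service": ("2023-03-14", 72.4),
}
result = check_document("claim_form", extracted)
print(result, needs_human_review(result))
```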

Chapter 7, Accurate Extraction and Health Insights with Amazon HealthLake, will start with a brief introduction to healthcare interoperability with Fast Healthcare Interoperability Resources (FHIR) and explain why documents need to be stored in a healthcare data store, which Amazon HealthLake provides. Readers will learn about the features of Amazon HealthLake and how to extend IDP to process and store documents in this health data store.
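Amazon HealthLake stores data as FHIR R4 resources. One plausible way to store a processed document is as a `DocumentReference` resource, whose attachment carries the content as base64. The sketch below builds a minimal resource of that kind. The patient ID and text are hypothetical, and actually sending the resource to a HealthLake data store's FHIR endpoint is not shown.

```python
import base64
import json


def build_document_reference(patient_id: str, text: str, title: str) -> dict:
    """Wrap extracted document text in a minimal FHIR R4 DocumentReference resource."""
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "subject": {"reference": f"Patient/{patient_id}"},
        "content": [
            {
                "attachment": {
                    "contentType": "text/plain",
                    "title": title,
                    # FHIR attachments carry inline data as base64.
                    "data": base64.b64encode(text.encode("utf-8")).decode("ascii"),
                }
            }
        ],
    }


# "example-123" is a hypothetical patient ID.
resource = build_document_reference(
    "example-123", "Discharge summary: patient stable.", "Discharge summary"
)
print(json.dumps(resource, indent=2))
```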

Chapter 8, IDP Healthcare Industry Use Cases, will explore healthcare prior authorization and healthcare claims processing as IDP use cases. Readers will learn about the prior authorization process and how to build an IDP pipeline for prior authorization to accelerate the pre-certification process. Finally, the reader will learn about the claims adjudication process and build an end-to-end IDP pipeline for it.

Chapter 9, Intelligent Document Processing – Insurance Industry, will look into two use cases in the insurance industry – processing benefit registration and claims adjudication – as IDP solutions. Readers will learn how to use the various stages in the IDP pipeline to build and automate these use cases. Finally, we will extract data accurately from multiple document types and layouts to verify the claims form.

Chapter 10, Intelligent Document Processing – Mortgage Processing, will analyze lending document processing as an IDP solution. Readers will learn how to process mortgage and lending documents with the IDP pipeline. Finally, we will extract data accurately from multiple document types and layouts to verify mortgage documents.
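Verifying documents often comes down to cross-checking values extracted from different document types. As a hedged illustration, not the book's implementation, the sketch below checks that a loan application agrees with a pay stub. The field names, the sample values, and the 5% income tolerance are all hypothetical.

```python
from typing import Dict, List


def normalize_name(name: str) -> str:
    # Case- and whitespace-insensitive comparison.
    return " ".join(name.lower().split())


def verify_application(application: Dict[str, str], pay_stub: Dict[str, str],
                       tolerance: float = 0.05) -> List[str]:
    """Return discrepancies between a loan application and a pay stub.

    Field names are hypothetical outputs of earlier extraction steps.
    """
    issues = []
    if normalize_name(application["applicant_name"]) != normalize_name(pay_stub["employee_name"]):
        issues.append("name mismatch")
    stated = float(application["monthly_income"])
    observed = float(pay_stub["gross_monthly_pay"])
    if abs(stated - observed) > tolerance * observed:
        issues.append(f"income differs by more than {tolerance:.0%}")
    return issues


application = {"applicant_name": "Jane  Doe", "monthly_income": "6500"}
pay_stub = {"employee_name": "JANE DOE", "gross_monthly_pay": "6000"}
issues = verify_application(application, pay_stub)
print(issues)
```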