Intelligent Document Processing with AWS AI/ML

By: Sonali Sahu

Overview of this book

With the volume of data growing exponentially in this digital era, it has become paramount for professionals to process this data quickly and cost-effectively to get value out of it. The data that organizations receive is usually in raw document format, and being able to process these documents is critical to meeting growing business needs. This book is a comprehensive guide to help you get to grips with AI/ML fundamentals and their application in document processing use cases. You’ll begin by understanding the challenges faced in legacy document processing and discover how you can build end-to-end document processing pipelines with AWS AI services. As you advance, you’ll get hands-on experience with popular Python libraries to process and extract insights from documents. The book starts with the basics and takes you through real industry use cases for document processing to deliver value-based care in the healthcare industry and accelerate loan application processing in the financial industry. Throughout the chapters, you’ll find out how to apply your skill set to solve practical problems. By the end of this AWS book, you’ll have mastered the fundamentals of document processing with machine learning through practical implementation.
Table of Contents (16 chapters)
Part 1: Accurate Extraction of Documents and Categorization
Part 2: Enrichment of Data and Post-Processing of Data
Part 3: Intelligent Document Processing in Industry Use Cases

Understanding common document processing use cases across industries

We started with a simple claims processing use case in the healthcare industry, but document processing challenges occur across multiple use cases and industries. For example, according to 2017 estimates, a single patient generates nearly 80 megabytes of imaging and Electronic Medical Record (EMR) data each year, and RBC Capital Markets projects that the compound annual growth rate of healthcare data will reach 36% by 2025. Every time a patient visits a physician, an immense amount of data is generated. Likewise, customers tell us that they hold petabytes of data in their archives, most of it unstructured, which is kept on disk or tape for legal or regulatory reasons and never processed further. For example, some healthcare providers in the US store medical history records for at least 7 years, as regulations require. If we could analyze a patient’s historical data, we could build predictive models for chronic diseases. This data is a gold mine, but because there is no efficient, cost-effective mechanism for document processing, it sits unused. Most of it is currently stored as archived data and retired once the 7-year period is over. Can we use this data to derive insights for better healthcare outcomes?

Similarly, the financial industry needs document processing, for example, when processing mortgage documents. Anyone who has bought a new home or refinanced their home will know how many documents, and how many different document types, mortgage processing involves. McKinsey’s report emphasizes that mortgage providers should get things right the first time to avoid processing delays. To verify these documents in a timely manner, we need to empower loan officers with the right tools, automation, and insights. Given the immense volume and variety of document formats and the need to derive insights from them, this calls for automation with the right indexing, categorization, and extraction, plus human review where needed to detect anomalies, so that mortgage documents are right the first time and processed on time.

Document processing is not limited to the healthcare and financial industries. Organizations across verticals, with use cases such as legal documents and contracts, insurance, ID handling, and enrollments, want to automate document processing with advanced AI and ML technologies. Intelligent Document Processing uses AI-powered automation and ML to classify, extract, transform, and enrich our documents for consumption. Before discussing advanced technologies and solutions, it is always good to start with the basics. So, let’s first set the foundation of AI and ML.