Intelligent Document Processing with AWS AI/ML

By: Sonali Sahu

Overview of this book

With the volume of data growing exponentially in this digital era, it has become paramount for professionals to process this data in an accelerated and cost-effective manner to get value out of it. Data that organizations receive is usually in raw document format, and being able to process these documents is critical to meeting growing business needs. This book is a comprehensive guide to helping you get to grips with AI/ML fundamentals and their application in document processing use cases. You’ll begin by understanding the challenges faced in legacy document processing and discover how you can build end-to-end document processing pipelines with AWS AI services. As you advance, you'll get hands-on experience with popular Python libraries to process and extract insights from documents. This book starts with the basics, taking you through real industry use cases for document processing to deliver value-based care in the healthcare industry and accelerate loan application processing in the financial industry. Throughout the chapters, you'll find out how to apply your skillset to solve practical problems. By the end of this AWS book, you’ll have mastered the fundamentals of document processing with machine learning through practical implementation.
Table of Contents (16 chapters)
Part 1: Accurate Extraction of Documents and Categorization
Part 2: Enrichment of Data and Post-Processing of Data
Part 3: Intelligent Document Processing in Industry Use Cases

Understanding the AWS ML and AI stack

Just five decades ago, ML was still the stuff of science fiction. Today, it is an integral part of our everyday lives: it helps us drive our cars, personalizes our shopping experiences, and powers voice-enabled technologies such as Alexa. The early days of AI and ML involved simple calculators and board games such as chess, but by the 21st century, the field had evolved to tasks such as helping to diagnose cancer. ML began as theory in research labs and has since moved into real-life applications across industries, which marks a major shift in the adoption of AI and ML.

Figure 1.1 – AI and ML

What is AI? AI is a broad branch of computer science concerned with building smart machines, and ML is a subset of AI, as shown in Figure 1.1. The goal of ML is to let a machine learn from data without being explicitly programmed for each task. We want the machine to learn from its own experience and provide results: you gather data, and the model learns and corrects itself based on this data. One of the landmark achievements in the history of AI and ML is Alan Turing's 1950 paper and the Turing Test that it introduced. This established a fundamental goal and vision for AI by focusing on one main question: can machines learn like humans? Two years later, Arthur Samuel, another pioneer in computer science and computer gaming, wrote one of the first computer learning programs, which played checkers. It learned from the moves that allowed it to win and used them to improve its own play. More recently, in 2015, AWS launched its own ML platform to make ML models and infrastructure more accessible.

Now, we see AI and ML in everyday use. If you have used an e-commerce, online media, or entertainment platform, you are probably familiar with personalized recommendations, conversational chatbots, and virtual assistants powered by AI services. These personalized recommendations and experiences drive user engagement. Similarly, helpdesk calls at contact centers can be automated with AI, reducing both costs and the burden on human agents. Moreover, AI can be used in automated document processing to accurately extract and analyze content and instantly derive insights from it, as in loan processing or claims processing.

With ML and AI so widely present, industries are busy building newer models that learn better and faster, give accurate predictions, and accelerate business value. But the main question is: can we share the experience and knowledge gained from building models? Can builders reuse an already trained model for their own business, without spending time and effort training another one, so that they can focus on their business needs?

The answer is yes, and for that reason, AWS has divided its ML stack into three broad layers. Let's discuss each layer of the AI/ML stack and its core goal in solving user requirements, starting with the bottom layer shown in the following figure:

Figure 1.2 – The ML framework and infrastructure at the bottom of the AWS stack

At the bottom of the AWS AI/ML stack, we see services and features targeted at expert ML practitioners who are comfortable working with ML frameworks and algorithms and deploying their own ML infrastructure. Some of the AWS ML frameworks and infrastructure are shown in Figure 1.2. AWS lets users pick their framework of choice, supporting ML frameworks such as PyTorch, Apache MXNet, and TensorFlow so that they run optimally on the AWS platform. The bottom layer also includes CPU and GPU instances. Decades ago, obtaining GPU resources to accelerate an ML workload was a distant dream for most ML builders; you might have had to reach out to a supercomputing center to get hold of GPU resources. Today, you can access GPUs at your fingertips with AWS. AWS gives users the option to select instances with the memory, vCPU count, architecture, and other characteristics they need. AWS also added Trainium, its second ML chip, which is optimized for deep learning training.

In addition, to democratize ML infrastructure, AWS offers Inferentia, a chip that drives high-performance deep learning inference in the cloud at a fraction of the cost.

Figure 1.3 – ML services in the middle of the AWS stack

The middle layer of the AI/ML stack is targeted at ML builders who want to build, train, and deploy their own ML models. Some of the AWS ML services are shown in Figure 1.3. This layer makes ML more accessible and expansive. Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality ML models quickly by bringing together a broad set of capabilities purpose-built for ML. Amazon SageMaker JumpStart helps you get started quickly with prebuilt solutions, such as automatically extracting, processing, and analyzing documents for accelerated and accurate business outcomes. SageMaker offers integrated Jupyter notebooks for authoring your model with pre-built, optimized algorithms, while still giving ML users the option to bring their own algorithms and frameworks. It also offers a managed, scalable, and secure training and deployment platform for your ML process. To learn more about Amazon SageMaker, you can refer to the book The Machine Learning Solutions Architect Handbook, written by David Ping and published by Packt.

You can find this book here: https://www.packtpub.com/product/the-machine-learning-solutions-architect-handbook/9781801072168.

Figure 1.4 – AI services at the top of the AWS stack

AWS designed the top layer to put ML in the hands of every developer. These are the AI services. AWS drew on its experience with Amazon.com and its ML services to offer highly accurate, API-driven AI services. You do not need to be an ML expert to call the pre-trained models through APIs. Instead, you can use AI services to enhance your customer experience, improve productivity, and get to market faster with ready-made ML models. For vision, there are Amazon Rekognition and AWS Panorama. For speech, there are services such as Amazon Polly, Amazon Transcribe, and Amazon Transcribe Call Analytics, and for chatbots, there is Amazon Lex. You can leverage these speech and bot services for use cases such as call center modernization. Drawing on Amazon.com's experience with recommendation systems, AWS offers Amazon Personalize.

To help customers with industry-specific use cases, AWS AI services are also grouped by industry: industrial services such as Amazon Monitron and the Amazon Lookout services, and healthcare services such as Amazon Comprehend Medical, HealthLake, and Amazon Transcribe Medical. Figure 1.4 shows how AI services align with specific industry use cases. In this book, however, we will dive deep into IDP use cases in particular, using text and document services such as Amazon Textract and Amazon Comprehend.
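
To make the idea of calling a pre-trained model through an API more concrete, here is a minimal Python sketch using Amazon Comprehend's DetectEntities operation through boto3. It is an illustration only, not code from this book: the helper names detect_entities and confident_entities, the 0.9 threshold, and the sample response are all assumptions made for this example, and the live call additionally assumes that boto3 is installed and AWS credentials and a region are configured.

```python
def detect_entities(text, language_code="en", client=None):
    """Send text to Amazon Comprehend and return the raw DetectEntities response."""
    if client is None:
        # Imported lazily so the offline helper below runs without boto3 installed.
        import boto3
        client = boto3.client("comprehend")
    return client.detect_entities(Text=text, LanguageCode=language_code)


def confident_entities(response, min_score=0.9):
    """Return (text, type) pairs whose confidence score meets the threshold.

    min_score is an arbitrary illustrative cutoff, not a recommended value.
    """
    return [
        (entity["Text"], entity["Type"])
        for entity in response.get("Entities", [])
        if entity.get("Score", 0.0) >= min_score
    ]


# Hand-written sample in the shape of a DetectEntities response (not real output).
sample_response = {
    "Entities": [
        {"Text": "Jane Doe", "Type": "PERSON", "Score": 0.99},
        {"Text": "Seattle", "Type": "LOCATION", "Score": 0.97},
        {"Text": "soon", "Type": "DATE", "Score": 0.41},
    ]
}

print(confident_entities(sample_response))
```

With credentials in place, confident_entities(detect_entities("some text")) would run the same filter on a live response. No model training or hosting is involved on the caller's side, which is the point of the top layer.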

One of the main benefits of AWS AI services is that the models are fully managed: AWS takes care of the undifferentiated heavy lifting of building, maintaining, patching, and upgrading the servers and hardware required for the models to run. You can customize and interact with the AI models and perform predictions via API calls or directly from the AWS console. Because the AI service APIs are serverless, your solution scales automatically as document processing demand grows or shrinks, giving you low latency and timely delivery for your business use case:

Figure 1.5 – Accessing AWS AI services with an API call
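
The API-call pattern described above can be sketched in a few lines of Python. The example below uses Amazon Textract's DetectDocumentText operation through boto3 and then collects the detected lines of text. It is a sketch under stated assumptions rather than this book's implementation: the helper names detect_text and extract_lines and the sample response are hypothetical, and the live call assumes boto3 is installed and AWS credentials and a region are configured.

```python
def detect_text(document_bytes, client=None):
    """Send the raw bytes of an image to Amazon Textract and return the response."""
    if client is None:
        # Imported lazily so the offline helper below runs without boto3 installed.
        import boto3
        client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": document_bytes})


def extract_lines(response):
    """Return the text of every LINE block, in the order Textract returned them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


# Hand-written sample in the shape of a DetectDocumentText response (not real output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Loan Application", "Confidence": 99.1},
        {"BlockType": "WORD", "Text": "Loan", "Confidence": 99.3},
        {"BlockType": "WORD", "Text": "Application", "Confidence": 98.9},
        {"BlockType": "LINE", "Text": "Applicant: Jane Doe", "Confidence": 97.5},
    ]
}

for line in extract_lines(sample_response):
    print(line)
```

For a real document, you would read the file in binary mode and pass its bytes to detect_text. The caller never provisions or scales any servers; the service handles that behind the API.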

With AWS AI and ML offerings, multiple technologies are available to implement the same use case, and there are trade-offs between using API-driven AI services and ML services. We will dive deeper into the comparison of AI and ML models for IDP in Chapter 3, Accurate Document Extraction with Amazon Textract, under the Introduction to Textract section.

Alright, it's time to get started with an overview of IDP. Now that we understand how AWS cloud infrastructure and services help us accelerate our AI and ML workloads, let's dive into the IDP pipeline and its applications across industries.