Understanding document extraction – the IDP extraction stage with Amazon Comprehend
In the preceding example, Amazon Comprehend required plain text as its input.
How, then, can we process documents and extract insights from them with Amazon Comprehend? For this solution, we will use Amazon Textract to extract the text from each document and then pass that text to Amazon Comprehend for entity extraction.
Figure 4.9 shows an architecture that can serve as the extraction stage of the IDP pipeline:
Figure 4.9 – Document extraction stage with Amazon Comprehend
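The flow described above can be sketched in a few lines of Python. This is a minimal sketch, not the book's own listing: it assumes the document is a single-page image or PDF already stored in Amazon S3, and the helper names (lines_from_textract, extract_entities, run_example) and the bucket and key arguments are hypothetical. It uses the synchronous detect_document_text call from Amazon Textract and the detect_entities call from Amazon Comprehend.

```python
from typing import Any, Dict, List, Tuple


def lines_from_textract(response: Dict[str, Any]) -> str:
    """Join the text of every LINE block in a Textract response."""
    return "\n".join(
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    )


def extract_entities(
    textract: Any,
    comprehend: Any,
    bucket: str,
    key: str,
    language_code: str = "en",
) -> List[Tuple[str, str, float]]:
    """Run Textract on an S3 object, then pass the text to Comprehend.

    `textract` and `comprehend` are boto3 clients (or test doubles that
    expose the same two methods); `bucket` and `key` are placeholders.
    """
    # Step 1: extract the raw text of the document with Amazon Textract.
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    text = lines_from_textract(response)
    if not text:
        return []

    # Step 2: detect pre-trained entities in that text with Amazon Comprehend.
    result = comprehend.detect_entities(Text=text, LanguageCode=language_code)
    return [(e["Type"], e["Text"], e["Score"]) for e in result["Entities"]]


def run_example(bucket: str, key: str) -> List[Tuple[str, str, float]]:
    """Hypothetical entry point: build real clients and run the pipeline."""
    import boto3  # imported here so the helpers above work without AWS access

    textract = boto3.client("textract")
    comprehend = boto3.client("comprehend")
    return extract_entities(textract, comprehend, bucket, key)
```

Calling run_example with your own bucket name and object key returns a list of (type, text, score) tuples. Passing the clients in as arguments keeps the two steps easy to test without AWS credentials. Long documents may exceed the size limit of a single Amazon Comprehend request, in which case the text would need to be split before the second step.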
We have walked through the key features of Amazon Comprehend on the AWS Management Console. We can also call the Amazon Comprehend APIs to automate extraction programmatically. Let's walk through some sample code that extracts pre-trained entities from a document:
- Get the boto3 client for Amazon Textract and Amazon Comprehend:
s3 = boto3.client('s3')
textract = boto3.client('...