Automating mortgage processing data capture and data categorization with IDP
The first stage of the IDP pipeline is data capture. During this stage, all documents (such as URLA-1003, Form W-2, pay stubs, bank statements, credit card statements, mortgage notes, Form 1099, ID documents such as a passport and driver’s license, and any other supporting documents) are collected and aggregated in a central, secure data store on Amazon S3. You can then define access controls on the S3 data so that only the appropriate users and services can read it.
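As a rough illustration of the capture step, the sketch below stores one document in S3 under a per-application prefix and asks S3 to encrypt it at rest. The key layout (`applications/<id>/raw/`), the function names, and the choice of `aws:kms` encryption are assumptions made for this example, not something the pipeline requires. The S3 client is passed in, so in practice you would create it with `boto3.client("s3")`.

```python
def build_object_key(application_id: str, filename: str) -> str:
    """Group all documents for one loan application under a common prefix.

    The prefix layout is a hypothetical convention chosen for this sketch.
    """
    return f"applications/{application_id}/raw/{filename}"


def capture_document(s3_client, bucket: str, application_id: str,
                     filename: str, body: bytes) -> str:
    """Upload one document to the central bucket and return its object key."""
    key = build_object_key(application_id, filename)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        # Assumption: server-side encryption with the account's KMS key.
        ServerSideEncryption="aws:kms",
    )
    return key
```

A call might look like `capture_document(boto3.client("s3"), "my-mortgage-docs", "app-001", "w2.pdf", data)`, where the bucket name and application ID are placeholders. Keeping every document for an application under one prefix makes it easier to scope access policies to a single application later.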
At times, we know the document type and can proceed directly to extraction. More often, however, we have no reliable way of identifying an incoming document; in that scenario, we need to classify the document before any further extraction. We can use Amazon Textract to extract the raw text from any type of document, and then use that text to create labeled sample data for training an Amazon Comprehend classifier. Amazon Comprehend classification can help accurately categorize documents for mortgage application...
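The flow described above can be sketched in three small steps: pull the raw text out of a document with Textract, write labeled samples in the two-column CSV layout (label, then text, no header) that Comprehend custom classification accepts for multi-class training, and classify new text against a deployed endpoint. The function names and the sample labels are hypothetical. The clients are passed in and would normally come from `boto3.client("textract")` and `boto3.client("comprehend")`. The sketch also assumes a single-page document, because the synchronous `detect_document_text` call is meant for single-page input.

```python
import csv
import io


def extract_text(textract_client, bucket: str, key: str) -> str:
    """Return the text of a single-page document stored in S3."""
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Textract returns PAGE, LINE, and WORD blocks; LINE blocks are enough here.
    lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
    return " ".join(lines)


def training_rows_to_csv(rows) -> str:
    """Serialize (label, text) pairs as a headerless two-column CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, text in rows:
        writer.writerow([label, text])
    return buffer.getvalue()


def classify_text(comprehend_client, endpoint_arn: str, text: str):
    """Return the highest-scoring (class name, score) for the given text."""
    response = comprehend_client.classify_document(
        Text=text, EndpointArn=endpoint_arn
    )
    best = max(response["Classes"], key=lambda c: c["Score"])
    return best["Name"], best["Score"]
```

In this sketch the CSV produced by `training_rows_to_csv` would be uploaded to S3 and used as training input for the classifier. `classify_text` only applies once a trained classifier has been deployed to an endpoint, and its ARN is something you supply.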