Peter is an IT manager at MBI Inc, a big corporation. His company has been around long enough that most of its contractual finance documents, standard operating procedures, and supply chain documents are paper-based. He has been given the huge responsibility of making the company go paperless.
This means that he is responsible for eliminating the hassle and cost of managing paper archives. With the imaging knowledge we have gathered so far in this chapter (and we will learn more), let's see if we can help Peter.
If you analyze the problem carefully, Peter needs to achieve two important tasks:
Scan the papers and store them in an electronic format as images
Generate text files from these documents so that they can be easily indexed
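To make the two tasks concrete, here is a minimal sketch of how they might fit together, not a definitive implementation. The function names (ocr_with_tesseract, digitize), the folder layout, and the choice of PNG scans are assumptions made for illustration. The default OCR step relies on pytesseract.image_to_string and assumes that Pillow, pytesseract, and the Tesseract program are already installed.

```python
from pathlib import Path


def ocr_with_tesseract(image_path):
    """Extract text from one scanned image.

    Assumes Pillow, pytesseract, and the Tesseract program are installed.
    """
    # Imported lazily so the rest of the sketch loads without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def digitize(scan_dir, text_dir, ocr=ocr_with_tesseract):
    """Write one .txt file per scanned PNG and return the written paths."""
    scan_dir, text_dir = Path(scan_dir), Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)

    written = []
    # Hypothetical layout: every scan is a PNG directly inside scan_dir.
    for image_path in sorted(scan_dir.glob("*.png")):
        text_path = text_dir / (image_path.stem + ".txt")
        text_path.write_text(ocr(image_path), encoding="utf-8")
        written.append(text_path)
    return written
```

Passing the OCR step in as a parameter keeps the file handling separate from the text extraction, so the loop can be tried out with a stand-in function before the OCR tools are in place.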
For this exercise, let's start by installing the required modules. We will need the following modules:
scikit-image (http://scikit-image.org/)
pyimagesearch (http://www.pyimagesearch.com/)
tesseract and pytesseract...
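Before going further, it can help to confirm which of these pieces are already present. The following is a small sketch using only the standard library. It assumes that scikit-image is imported under the name skimage and that Tesseract is available as a tesseract program on the PATH. The check_requirements helper is a hypothetical name, and pyimagesearch is left out because how it is installed is not stated here.

```python
import importlib.util
import shutil


def check_requirements():
    """Report which pieces of the OCR toolchain are available."""
    return {
        # scikit-image is imported under the name `skimage`.
        "scikit-image": importlib.util.find_spec("skimage") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # Tesseract itself is a separate program, looked up on PATH.
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, available in check_requirements().items():
        print(f"{name}: {'found' if available else 'missing'}")
```

Note that pytesseract is only a Python wrapper, so it needs the Tesseract program to be installed separately.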