Intelligent Document Capture with Ephesoft, Second Edition

In general, document capture refers to the process of scanning paper documents using scanners or cameras and transforming them into electronic files such as PDF or TIFF. Document capture software then analyzes these files using character or pattern recognition, that is, optical character recognition (OCR), intelligent character recognition (ICR), and optical mark recognition (OMR), and converts them into meaningful data, also called metadata. The goal of a document capture system is to classify incoming documents into categories and extract metadata by automating work that humans would otherwise do. With this automation, organizations can classify incoming documents and route them, together with their metadata, to repositories and workflow systems more efficiently. As a result, document capture systems help reduce errors, speed up document handling, and let organizations scale their business using software rather than labor.

Over the years, the document capture industry has evolved in many ways. Paper still makes up a large share of the documents that organizations receive. However, as organizations started to exchange documents electronically, document capture systems had to evolve to process documents that arrive as e-mail attachments, over FTP, or through APIs. Smartphone adoption also influenced the document capture world. As consumers began producing more electronic documents with their phones, for example when depositing checks or filing expense reports, capture using mobile devices became essential.

Capture software has also become more intelligent over the years. Early systems automated the capture process by using barcodes to classify documents and by extracting metadata from predefined areas of structured documents, called forms. Later, technologies such as Ephesoft began classifying documents using techniques such as document layout analysis or matching words and phrases, and extracting metadata without anyone having to define where it is located on the document. We call these Intelligent Document Capture systems. They help organizations automate capture not only for structured documents but also for unstructured documents, such as e-mails or documents that do not have any known format. Intelligent Document Capture systems are also easier to deploy and maintain than older systems, which leads to faster ROI and adoption.
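
To make the idea of classifying by words and phrases and extracting metadata without a fixed location more concrete, here is a minimal Java sketch. It is not Ephesoft's implementation or API; the class name CaptureSketch, the keyword lists, and the invoice-number pattern are all hypothetical and chosen only for illustration. It assumes OCR has already produced plain text for the document.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not Ephesoft code.
class CaptureSketch {

    // Hypothetical phrases per document type. A real system would learn
    // these from sample documents rather than hard-code them.
    static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("Invoice", List.of("invoice", "amount due", "bill to"));
        KEYWORDS.put("Application", List.of("applicant", "date of birth", "signature"));
    }

    // Hypothetical pattern: a label such as "Invoice No" or "Invoice #",
    // followed by the value, wherever it appears in the text.
    static final Pattern INVOICE_NUMBER = Pattern.compile(
            "invoice\\s*(?:number|no\\.?|#)\\s*:?\\s*([A-Z0-9-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the document type whose phrases occur most often in the text,
    // or "Unknown" when no phrase matches.
    static String classify(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        String best = "Unknown";
        int bestScore = 0;
        for (Map.Entry<String, List<String>> entry : KEYWORDS.entrySet()) {
            int score = 0;
            for (String phrase : entry.getValue()) {
                if (lower.contains(phrase)) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Finds the invoice number by its label instead of by page coordinates.
    static Optional<String> extractInvoiceNumber(String text) {
        Matcher matcher = INVOICE_NUMBER.matcher(text);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String ocrText = "ACME Corp\nBill To: Jane Doe\n"
                + "Invoice No: INV-1042\nAmount Due: $310.00";
        System.out.println(classify(ocrText));
        System.out.println(extractInvoiceNumber(ocrText).orElse("(not found)"));
    }
}
```

For the sample text in main, the sketch classifies the document as Invoice and extracts INV-1042. Because the value is located by its label rather than by a position on the page, the same rule still applies when the layout changes, which is the property that sets these systems apart from zone-based form processing.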

The latest trend in document capture systems is the use of web-based APIs. Originally, document capture systems were used only to ingest documents into repositories and workflow systems, but the popularity of APIs and the cloud has opened up a new use case. The newest versions of document capture systems allow organizations to add document capture functionality to other business applications: APIs hosted in a private or public cloud provide the capture services, so the applications do not have to implement them. This makes organizations more efficient at maintaining, upgrading, and integrating multiple business applications.

Where document capture will be 5 to 10 years from now is anyone's guess. However, looking at emerging technologies and today's advancements, we can make some predictions. We expect machine learning algorithms to do what capture administrators do today, so that any user can simply show the next-generation document capture system what they want from a document. The system should then know what to do the next time without any human intervention. In other words, if it takes one administrator to configure the system today and one IT professional to set up the servers, in the future neither will be needed. Users will simply request a capture service from a browser or mobile device; the service will learn by observing the user and will perform the same task from then on. Projects that used to take months to implement are now measured in weeks. In the future, they will be measured in hours, if not minutes or seconds.

What this book covers

Chapter 1, A Quick Tour of Ephesoft, takes you on a walkthrough of Ephesoft's user interface. We will look at the administrative and the operator functionality of Ephesoft.

Chapter 2, Creating a Batch Class, explains how to set up Ephesoft. We will see how to create a new batch class and configure it for classification and extraction.

Chapter 3, Core Ephesoft Features, expands on the features introduced in Chapter 2, Creating a Batch Class. We will learn more about classification and indexing techniques. We will also learn about web scanning.

Chapter 4, Ephesoft's Advanced Features, explains advanced Ephesoft features. This includes how to set up Microsoft Active Directory, scripting, and utilizing web services.

Chapter 5, Tips, covers productivity-enhancing tips.

Appendix, References, includes some reference material.

What you need for this book

You need Ephesoft Enterprise 4.0 or later running on a Windows computer.

Who this book is for

This book is intended for information technology professionals interested in installing and configuring Ephesoft for their organization, but it is a valuable resource for anyone interested in learning about document capture in general.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "You can use generic variables, which are EphesoftBatchID and EphesoftDOCID."

A block of code is set as follows:

import com.ephesoft.dcma.da.id.BatchInstanceID;
public interface SamplePluginService {
  void sampleMethod(BatchInstanceID batchInstanceID,
      final String pluginWorkflow) throws Exception;
}

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Administrators can use the Up and Down buttons to reorder the plugins or the Remove button to remove plugins from the module."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/8582EN_ColoredImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
