Intelligent Document Capture with Ephesoft, Second Edition

Common regular expressions


The regular expressions used in Ephesoft are Java regular expressions. The reference documentation for the java.util.regex.Pattern class can be found on the Oracle website.

The following are some of the more commonly used patterns:

  • Date: [0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}

    This pattern looks for 1 or 2 digits, [0-9]{1,2}, followed by a /, then another 1 or 2 digits, [0-9]{1,2}, followed by a /, and finally 2 to 4 digits, [0-9]{2,4}. Note that {2,4} also accepts a 3-digit year. Examples of matching strings are 1/31/12 and 03/17/1974.

  • Currency: [0-9]{1,3}?,?[0-9]{1,3}\.[0-9]{2}

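    This pattern looks for 1 to 3 digits, [0-9]{1,3}?, followed by an optional comma, ,?, then 1 to 3 digits, [0-9]{1,3}, followed by a literal ".", \., and 2 digits, [0-9]{2}. A ? after a single character, as in ,?, means 0 or 1 instance of that character, so the comma is optional. A ? directly after a quantifier, as in {1,3}?, does not make the digits optional; it makes the quantifier reluctant, so it matches as few digits as possible. Examples of matching strings are 20.00, 50000.00, and 600,000.00.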

  • Name with Letters Only: [a-zA-Z]{2,25}

    This pattern looks for a run of 2 to 25 upper- or lowercase letters, A to Z only. Digits, spaces, hyphens, apostrophes, and accented letters are not matched.

    Note

    The following characters have a special meaning in a regular expression and need to be escaped with a "\" to be matched literally: [ \ ^ $ . | ? * + ( ) { }
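
The patterns and the escaping rule above can be tried outside Ephesoft with the standard java.util.regex classes. The following is a minimal sketch, not Ephesoft code: the class name CommonPatterns and the helpers matchesWhole and firstMatch are hypothetical names introduced for illustration. Whether Ephesoft applies a pattern to a whole value or searches within the text is not covered here, so both styles are shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for experimenting with the patterns; not part of Ephesoft.
class CommonPatterns {
    // In Java source, a regex backslash is written as "\\" inside a string literal.
    static final Pattern DATE = Pattern.compile("[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}");
    static final Pattern CURRENCY = Pattern.compile("[0-9]{1,3}?,?[0-9]{1,3}\\.[0-9]{2}");
    static final Pattern NAME = Pattern.compile("[a-zA-Z]{2,25}");

    // True only when the entire input matches the pattern.
    static boolean matchesWhole(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    // Returns the first substring that matches the pattern, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Date: the examples from the text, plus a 3-digit year accepted by {2,4}.
        System.out.println(matchesWhole(DATE, "1/31/12"));      // true
        System.out.println(matchesWhole(DATE, "03/17/1974"));   // true
        System.out.println(matchesWhole(DATE, "1/31/123"));     // true

        // Currency: the comma is optional, but at least 2 digits must precede the ".".
        System.out.println(matchesWhole(CURRENCY, "20.00"));      // true
        System.out.println(matchesWhole(CURRENCY, "50000.00"));   // true
        System.out.println(matchesWhole(CURRENCY, "600,000.00")); // true
        System.out.println(matchesWhole(CURRENCY, "5.00"));       // false

        // Name: ASCII letters only, so an apostrophe breaks a whole-string match.
        System.out.println(matchesWhole(NAME, "Smith"));    // true
        System.out.println(matchesWhole(NAME, "O'Brien"));  // false

        // Searching inside a longer line of text.
        System.out.println(firstMatch(DATE, "Invoice date: 03/17/1974, due on receipt"));
        System.out.println(firstMatch(CURRENCY, "Total due: 600,000.00 USD"));

        // Escaping: an unescaped + is a quantifier, so "1+1=2" does not match itself.
        System.out.println(Pattern.compile("1+1=2").matcher("1+1=2").matches());    // false
        System.out.println(Pattern.compile("1\\+1=2").matcher("1+1=2").matches());  // true
        // Pattern.quote escapes a whole literal string in one step.
        System.out.println(Pattern.compile(Pattern.quote("1+1=2")).matcher("1+1=2").matches()); // true
    }
}
```

The sketch separates matches(), which tests the whole string, from find(), which locates a match anywhere in the text. A pattern that looks correct under one can behave differently under the other, so it is worth testing a new pattern both ways before relying on it.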