The Role of OCR in Intelligent Document Processing

Uncovering OCR

Is OCR the hard part? Is text parsing the same as intelligent document processing? To answer these questions, let’s start with a cognitive experiment.

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

Unfortunately, the claim made in that paragraph is not entirely accurate. There is no record of Cambridge University staff doing this research. In addition, there are ample examples of words that become unreadable even when their first and last letters are preserved. However, the paragraph does provide a good illustration of the difference between OCR and cognitive capabilities. Simply stated, while OCR certainly has machine learning at its core, its job is simply to transcribe the text in an image into a machine-readable format. If you were to run OCR on an image of the paragraph above, you would get essentially the same scrambled text back: the characters are transcribed, but nothing is corrected or interpreted.

How do we extract good data from the scrambled paragraph above? First, there isn’t a cognitive solution that can take text that is almost entirely misspelled and correct it instantaneously the way the human brain can. However, in the domain of Intelligent Capture, most of the effort is placed on what to do with the information contained within the document. You might say, “Hey, we can run a spell-checker to correct the words.” Nice idea, and this is typically Step 1. And yet, this won’t solve every problem, as many words simply aren’t contained in standard vocabularies.
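
To make the spell-checking step concrete, here is a minimal sketch of a dictionary lookup tuned to the scrambled paragraph. It is an illustration only, not how any particular product works: the `signature`, `build_index`, and `correct` helpers and the four-word vocabulary are assumptions made for this example. A token is matched to a vocabulary word when both share the same first letter, last letter, and set of interior letters.

```python
def signature(word):
    """Key that ignores the order of a word's interior letters."""
    w = word.lower()
    if len(w) <= 3:
        return w  # nothing to scramble between the first and last letter
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]


def build_index(vocabulary):
    """Map each signature to the vocabulary words that share it."""
    index = {}
    for word in vocabulary:
        index.setdefault(signature(word), []).append(word.lower())
    return index


def correct(token, index):
    """Return the unique vocabulary match, or the token unchanged."""
    candidates = index.get(signature(token), [])
    return candidates[0] if len(candidates) == 1 else token


# Hypothetical vocabulary; a real system would load a much larger word list.
vocabulary = ["according", "research", "cambridge", "university"]
index = build_index(vocabulary)

corrected = [correct(t, index) for t in ["Aoccdrnig", "Cmabrigde", "Uinervtisy"]]
# "rscheearch" contains extra letters, so no vocabulary word matches it.
unresolved = correct("rscheearch", index)
```

Tokens with no unique match, such as "rscheearch" here, are returned unchanged. That is the limitation described above: a dictionary can only repair words it already contains and can match.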

Intelligent Document Processing: Interpretation Methods

Interpretation methods with fancy names like n-gram models can estimate the probability of the next word or words in a sequence. These techniques are especially useful for complex multi-word data. Using this and other techniques, intelligent document processing handles the specialized words and phrases often found in a structured form field or an unstructured document, where specialized vocabularies or general dictionaries fall short.
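
As a rough illustration of the n-gram idea, the sketch below builds a bigram (two-word) model from a handful of phrases and uses it to choose between two candidate readings of a word. The training phrases, the function names, and the "due" versus "dye" candidates are all assumptions invented for this example; production systems train on far larger, domain-specific text.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, prev, candidate):
    """Estimate P(candidate | prev) from the raw bigram counts."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return following[candidate] / total if total else 0.0


def pick_candidate(counts, prev, candidates):
    """Choose the candidate most likely to follow the previous word."""
    return max(candidates, key=lambda c: next_word_probability(counts, prev, c))


# Hypothetical training phrases of the kind found on invoices.
corpus = [
    "net amount due",
    "total amount due",
    "amount due on receipt",
    "amount paid in full",
]
counts = train_bigrams(corpus)

# Two candidate readings for the word that follows "amount".
best = pick_candidate(counts, "amount", ["dye", "due"])
```

Because "due" follows "amount" in three of the four training phrases and "dye" never does, the model prefers "due" even though both are valid dictionary words.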

Intelligent document processing is further distinguished from OCR software in that it attempts to reduce or obviate the need for OCR, using it only when necessary. Just as a human would, these systems learn where the needed information is located and what clues help to find it. Instead of reading an entire document, intelligent document processing focuses directly on that information.

Focusing on What Matters Most

Instead of performing OCR on, or parsing, the entire document text, modern intelligent document processing systems skip over irrelevant information and focus on specific sections. This avoids unnecessary OCR or full-text parsing that can slow down the entire process. If documents are born-digital, OCR can be skipped altogether, and the document can be interpreted immediately to extract the needed data.
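
The selective approach described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Region` structure, its field names, and the `read_fields` function are hypothetical, and the OCR engine is passed in as a plain callable rather than tied to any real library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass
class Region:
    """A hypothetical section of a page that may hold a wanted field."""
    label: str                           # e.g. "invoice_total" or "footer"
    embedded_text: Optional[str] = None  # text layer of a born-digital file
    image: bytes = b""                   # pixel data of a scanned region


def read_fields(
    regions: Iterable[Region],
    wanted: Set[str],
    ocr: Callable[[bytes], str],
) -> Dict[str, str]:
    """Read only the wanted regions, calling OCR only where no text exists."""
    fields = {}
    for region in regions:
        if region.label not in wanted:
            continue  # skip irrelevant sections entirely
        if region.embedded_text is not None:
            fields[region.label] = region.embedded_text  # no OCR needed
        else:
            fields[region.label] = ocr(region.image)
    return fields


# Stand-in OCR engine that records which images it was asked to read.
ocr_calls = []

def fake_ocr(image: bytes) -> str:
    ocr_calls.append(image)
    return "1,250.00"


regions = [
    Region("invoice_number", embedded_text="INV-001"),
    Region("invoice_total", image=b"scanned pixels"),
    Region("footer", image=b"more scanned pixels"),
]
fields = read_fields(regions, {"invoice_number", "invoice_total"}, fake_ocr)
```

In this example the OCR stand-in is called once, for the scanned total. The born-digital invoice number is read directly from its text layer, and the footer is never processed at all.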

By now, it should be clear that while OCR is an important step within intelligent document processing for scanned documents, it delivers only text, not interpretation. If the documents are born-digital, OCR is not required at all. To reach the point where document-based information can be used within an automated transaction, several levels of capability are required to go from text to real structured data.