The Essence of Intelligent Document Processing

Solving Problems With Machine Learning

A hammer in a drawer is just a combination of materials. However, when held correctly and used with force against the head of a nail, it becomes extraordinarily useful. It’s the same with machine learning. No vendor, except for purveyors of machine learning toolkits, offers machine learning without first applying it to a specific problem.

By now you’re probably realizing that there is a theme to all of this “cognitive stuff” — the use of machine learning applied to specific tasks. The reality is that machine learning itself is a tool just like any other type of tool. Without the proper application, it is meaningless.

Intelligent Capture and Common Applications

One of the most popular applications for intelligent capture is automating the identification and sorting of documents. Many processes involve a large number of different documents, sometimes several hundred, that can be submitted without any organization. These include claims adjudication, loan origination and commercial logistics.

If organizations are not manually processing these documents (and most are), they are undoubtedly using a rules-based process that attempts to identify incoming documents based upon specific, identified attributes. For instance, with mortgage documentation, a rules-based approach attempts to mimic a manual process. But instead of looking at the overall document, including the graphical orientation, it might use specific keywords to distinguish a document establishing proof of income from a document providing information on assets.

Even though a person might easily distinguish between a W-2 and a bank statement, the rules-based approach relies upon the presence (or absence) of specific words or other textual data.

Rules-based automation might look for instances of “W-2” or “Total Income” to identify a W-2, while the presence of words like “account balance” along with “account number” and “statement” might establish that a document is a bank statement.
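A minimal sketch of this kind of keyword rule follows. The rule table, the function name classify_by_rules and the min_hits threshold are assumptions made for illustration; only the keywords come from the example above.

```python
# Hypothetical keyword rules; the phrases mirror the examples in the text.
RULES = {
    "w2": ["w-2", "total income"],
    "bank_statement": ["account balance", "account number", "statement"],
}


def classify_by_rules(text, rules=RULES, min_hits=2):
    """Return the document type with the most keyword matches, or None."""
    lowered = text.lower()
    # Count how many of each type's keywords are present in the text.
    hits = {
        doc_type: sum(1 for phrase in phrases if phrase in lowered)
        for doc_type, phrases in rules.items()
    }
    best_type, best_hits = max(hits.items(), key=lambda item: item[1])
    # Too few matches means no rule fired with enough evidence.
    return best_type if best_hits >= min_hits else None


sample = "Statement period: May. Account number 1234. Account balance $500."
print(classify_by_rules(sample))  # bank_statement
```

Every new document type means another hand-written entry in the rule table, which is where the SME effort described next comes from.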

As you might suspect, the power of rules-based classification is directly tied to the amount of time spent by a subject matter expert (SME) reviewing available data, identifying key characteristics of each and then encoding the rules. For some needs, where there are only a few document types, a rules-based approach might make sense because it is typically simpler to implement. In a case where there are a lot of document types, like 30 or more, and where characteristics of each might overlap, a rules-based approach will fall short.

In a case where 50 document types are involved, and where there can be different versions of any particular document type, it is very probable that the rules identified for one type will overlap rules for another. It really isn’t practical (mostly due to the time required, but also because of the ongoing maintenance) to analyze each type and version and then verify that there is no overlap and to test and tune each one.
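To give a rough sense of the verification burden, the sketch below checks every pair of rule sets for shared keywords. The rule sets and the helper find_overlaps are invented for illustration. With 50 document types there are already 1,225 pairs to compare, before counting different versions of each type.

```python
from itertools import combinations

# Hypothetical rule sets, invented for illustration.
RULES = {
    "bank_statement": {"account number", "statement", "account balance"},
    "credit_card_statement": {"account number", "statement", "minimum payment"},
    "w2": {"w-2", "total income"},
}


def find_overlaps(rules):
    """Return {(type_a, type_b): shared_keywords} for every overlapping pair."""
    overlaps = {}
    for (type_a, kw_a), (type_b, kw_b) in combinations(rules.items(), 2):
        shared = kw_a & kw_b
        if shared:
            overlaps[(type_a, type_b)] = shared
    return overlaps


overlaps = find_overlaps(RULES)
print(overlaps)

# With n document types there are n * (n - 1) / 2 pairs to check.
pairs_for_50_types = 50 * 49 // 2
print(pairs_for_50_types)  # 1225
```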

Leveraging the Power of Machine Learning

One of the strongest benefits of machine learning-based solutions, or cognitive systems, as the industry increasingly calls them, is the ability to analyze a very large set of sample data to identify and record key attributes (often called “features”) of each document type. These are compared against the attributes of other document types to arrive at the most reliable set of features with which to apply automation.

Machine learning systems can detect even the slightest variances that might go unnoticed by SMEs. They can record a larger number and frequency of these key features and use the most reliable inferences to produce high quality results. This ability reduces the costs, complexity and risk associated with manual analysis and configuration of rules, including upkeep. Cognitive classification turns potentially several hundred hours of effort into a “compute-time exercise”, resulting in better, more reliable performance at a much lower level of effort.
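As a toy illustration of learning features from samples instead of hand-writing rules, the sketch below trains a small naive Bayes text classifier using only the Python standard library. This is a stand-in technique chosen for brevity, not a description of how any commercial product works, and the training texts are invented.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count word frequencies per document type from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, model):
    """Pick the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Tiny invented training set; a real project would use many samples per type.
samples = [
    ("wages tips federal income tax withheld employer", "w2"),
    ("employer identification number wages social security", "w2"),
    ("beginning balance deposits withdrawals ending balance", "bank_statement"),
    ("account number statement period ending balance", "bank_statement"),
]
model = train(samples)
print(classify("ending balance and deposits for this account", model))
```

Adding a new document type here means adding labeled samples and retraining, which is the “compute-time exercise” described above, instead of writing and tuning new rules by hand.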

Should You Buy or Build Intelligent Document Processing?

What it takes to build an intelligent document processing solution from scratch is an important discussion because, even after reviewing all of the various moving parts, some IT groups prefer to develop their own solutions. They look to develop tailored solutions that address their organization’s needs rather than implementing an off-the-shelf product.

With the availability of cloud-based capabilities that provide elements like OCR, classification and some level of data extraction — all offered as discrete capabilities — many organizations are seduced into believing the process of designing their own solution will be simple. There are many aspects to the build/buy decision process that won’t be covered here because they are too generic. Let’s focus instead on the hidden elements of creating your own solution.

Hidden Elements to Custom Solutions

There are two primary hidden costs associated with developing a custom capture solution: staff skills and OCR performance. The staff skills issue might seem like a traditional development skills acquisition problem.

When it comes to creating software where the primary objective is high levels of comprehensive data accuracy, knowledge of software development is a necessary prerequisite, but is only a small factor. It may be easy to develop software that uses third-party capabilities such as Google Document Understanding or Amazon’s Textract to perform certain operations.

Data Science and Machine Learning

Most decisions to go with a customized solution are based on the need to handle specific problems where no ready-made solutions exist.

In this case, most offerings on the market force a trade-off between out-of-the-box capabilities and a custom project to deliver specific capabilities. There is no in-between. This results in development projects, which start out seemingly small, turning into large custom projects that often cost more than commercial software alternatives. Bringing these complex projects to fruition requires expertise in data science and an in-depth understanding of machine learning algorithms, including when to choose one technique over another. Commercial alternatives offer flexibility with configuring the systems to meet very specific needs without the same significant investment in data science and machine learning skills.

The Challenge of Reliable OCR

The peculiarities of OCR toolkits and their cloud-based brethren are unknown to most people, even those with solid technical backgrounds. OCR is largely designed and used to convert image-based text into machine readable form. In order to perform that function, OCR software has been tuned at the character and word level to achieve high levels of reliability. The problem arises when an organization needs to find specific data within documents and output it in a structured format.

There is a lot to consider. Start with the ability to reliably locate data. Many programmers might assume that it is simply a matter of applying regular expressions to the text. If you need a date, simply look for a format of XX/XX/XXXX. However, what if there are many different date formats? Going down these obvious routes neglects a lot of key contextual data that significantly aids this task, such as the spatial proximity of targeted data to other data, the fonts of needed data and many other typically visual aspects.
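The sketch below shows the regular expression route and where it runs out. The three patterns and the sample text are assumptions chosen for illustration, and real documents contain many more date layouts.

```python
import re

# Three common layouts; real documents contain many more variants.
DATE_PATTERNS = [
    r"\b\d{1,2}/\d{1,2}/\d{4}\b",  # 03/15/2021
    r"\b\d{4}-\d{2}-\d{2}\b",      # 2021-03-15
    # Month name, day, year, e.g. March 15, 2021
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2}, \d{4}\b",
]
DATE_RE = re.compile("|".join(DATE_PATTERNS))


def find_dates(text):
    """Return every substring that matches one of the date patterns."""
    return DATE_RE.findall(text)


text = "Invoice date: 03/15/2021. Payment due: April 14, 2021. Shipped 2021-03-20."
print(find_dates(text))
# ['03/15/2021', 'April 14, 2021', '2021-03-20']
```

The patterns find all three dates, but nothing in the match says which one is the invoice date. Answering that requires the contextual cues mentioned above, such as which label sits next to the value on the page.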

And then there are the issues with the data output, especially with data called “confidence scores.” Confidence scores for OCR are different from those in intelligent document processing solutions. OCR provides confidence scores at the character and word level while intelligent document processing solutions provide confidence scores at the data field level. Analyzing scores at the field level is essential to successful intelligent document processing projects. There are even intelligent document processing solutions that cannot overcome the OCR confidence score problem when it comes to data field level outputs. This results in the need to manually verify every single data output.
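To make the gap between word-level and field-level scores concrete, here is one naive way a custom build might roll word confidences up into a field score and decide what goes to manual review. The minimum-based aggregation, the 0.90 threshold and the sample values are all assumptions, and field-level scores in a real solution would likely draw on more than OCR output alone.

```python
def field_confidence(word_confidences):
    """Aggregate word-level scores into one field-level score.

    Uses the minimum, on the assumption that a field is only as
    trustworthy as its least certain word. Other choices (mean,
    product) are equally plausible.
    """
    if not word_confidences:
        return 0.0
    return min(word_confidences)


def needs_review(fields, threshold=0.90):
    """Return the names of fields whose confidence falls below threshold."""
    return [
        name
        for name, scores in fields.items()
        if field_confidence(scores) < threshold
    ]


# Invented OCR output: each field maps to its per-word confidences.
fields = {
    "invoice_number": [0.99],
    "vendor_name": [0.98, 0.97, 0.62],
    "total_amount": [0.95, 0.96],
}
print(needs_review(fields))  # ['vendor_name']
```

With a usable field-level score, only vendor_name is sent for manual verification. Without one, every field would need to be checked.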

Where It Makes Sense to Build a Solution or Purchase One

There are many use cases where it makes sense to build a solution rather than purchase one. While many different toolkits, SDKs and web services focused on OCR, classification and handwriting recognition are available to developers, the reality is that an intelligent document processing solution is more than the sum of its parts. A lot goes into creating a solution that converts document-based information into structured data in a reliable, accurate manner.

Most of these services perform better when they can be applied as out-of-the-box capabilities that do not require significant data science skills. This means that where organizations need solutions to their own specific problems, an off-the-shelf intelligent document processing software solution is almost always the best option.

Summary and Final Thoughts

In summary, successful intelligent document processing requires a combination of multiple technologies such as OCR, ICR, Machine Learning, Natural Language Processing, and more. When looking to fully automate document-based tasks, most organizations need all of these features, configured properly, to achieve significant cost savings. This eBook covered most of the technologies embedded in a modern intelligent document processing system, but there are new advancements on the horizon. Vendors are constantly innovating their products with the latest artificial intelligence algorithms, so we encourage interested organizations to monitor these vendors closely in the years to come.