Closely related to document classification is document separation. Traditionally, scanned documents were separated by blank pages, barcodes, or some other identifier that the system could use to tell one document from the next. These identifiers are typically inserted manually during a step called batch preparation.
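As a rough illustration of separator-based splitting, the sketch below groups a scanned batch into documents wherever a blank page appears. It assumes each page has already been reduced to a text string (for example by OCR) and that a blank separator page yields an empty or whitespace-only string; the function name `split_on_blank_pages` and the sample batch are hypothetical, not part of any particular capture product.

```python
from typing import List


def split_on_blank_pages(pages: List[str]) -> List[List[str]]:
    """Group page texts into documents, treating blank pages as separators."""
    documents: List[List[str]] = []
    current: List[str] = []
    for text in pages:
        if text.strip() == "":
            # Blank page: close the current document and drop the separator.
            if current:
                documents.append(current)
                current = []
        else:
            current.append(text)
    # Flush the last document, which has no trailing separator.
    if current:
        documents.append(current)
    return documents


# Hypothetical batch: three documents divided by blank separator pages.
batch = ["Invoice 1001", "Invoice 1001 (cont.)", "", "Claim form", "", "", "Letter"]
print(split_on_blank_pages(batch))
# [['Invoice 1001', 'Invoice 1001 (cont.)'], ['Claim form'], ['Letter']]
```

A barcode-based variant would follow the same shape, with the blank-page test replaced by a check for the separator barcode on each page.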
Increasingly, documents arrive at an organization already digitized, whether previously scanned or born-digital. Documents that exist as individual files (e.g., a Word or PDF file) need no separation.
In many cases, however, multiple documents are stored as a single file. For instance, a patient claim often includes the claim form and supplemental documentation. Another example is a mortgage loan file, which can contain from 50 to 500 or more documents within a single PDF. In these cases, manually inserting document separators is impractical, so another approach is needed. A rules-based method is often favored because it is simple to understand and implement.
However, as with any rules-based system, there is a tradeoff between comprehensiveness and cost, and most organizations opt to minimize cost. The resulting rule set typically covers only the most common cases, which leads to many separation errors.
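To make the rules-based approach and its tradeoff concrete, here is a minimal sketch, assuming each page of the combined file is available as plain text. A page starts a new document when its text begins with one of a handful of known titles. The rule list, the names `FIRST_PAGE_RULES`, `is_first_page`, and `separate`, and the sample pages are all illustrative assumptions; a production rule set would be far larger and would likely use more than title matching.

```python
import re
from typing import List

# Hypothetical first-page rules: each pattern matches a title expected at the
# very start of the first page of one document type.
FIRST_PAGE_RULES = [
    re.compile(r"^\s*uniform residential loan application", re.IGNORECASE),
    re.compile(r"^\s*promissory note", re.IGNORECASE),
    re.compile(r"^\s*health insurance claim form", re.IGNORECASE),
]


def is_first_page(text: str) -> bool:
    """Return True if any rule says this page starts a new document."""
    return any(rule.search(text) for rule in FIRST_PAGE_RULES)


def separate(pages: List[str]) -> List[List[int]]:
    """Return documents as lists of 0-based page indices."""
    documents: List[List[int]] = []
    for index, text in enumerate(pages):
        # The first page of the file always opens a document, rule or not.
        if is_first_page(text) or not documents:
            documents.append([index])
        else:
            documents[-1].append(index)
    return documents


pages = [
    "Uniform Residential Loan Application\nBorrower details",
    "Section 2: employment information",
    "PROMISSORY NOTE\nLoan terms",
    "Appraisal Report\nProperty description",  # no rule covers this title
    "Appraisal Report, page 2",
]
print(separate(pages))
# [[0, 1], [2, 3, 4]]
```

Because no rule covers the appraisal report, its pages are silently merged into the promissory note. Every additional document type needs another rule, which is where the cost of comprehensiveness comes from.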