Rules-based automation might look for instances of “W-2” or “Total Income” to identify a W-2 document, while the presence of phrases like “account balance” along with “account number” and “statement” might establish that a document is a bank statement.
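As a rough illustration, the two rules described above could be encoded along these lines. This is a minimal sketch, not a production classifier: the `RULES` table, the `any`/`all` structure, and the function names are all hypothetical, and it assumes the document text has already been extracted (for example by OCR).

```python
# Hypothetical rule table. For each document type:
#   "any": at least one of these phrases must appear (if the list is non-empty)
#   "all": every one of these phrases must appear
RULES = {
    "w2": {"any": ["w-2", "total income"], "all": []},
    "bank_statement": {
        "any": [],
        "all": ["account balance", "account number", "statement"],
    },
}


def matches(text, rule):
    """Return True if the text satisfies both the 'any' and 'all' conditions."""
    lowered = text.lower()
    any_ok = not rule["any"] or any(phrase in lowered for phrase in rule["any"])
    all_ok = all(phrase in lowered for phrase in rule["all"])
    return any_ok and all_ok


def classify(text):
    """Return every document type whose rule matches the text."""
    return [doc_type for doc_type, rule in RULES.items() if matches(text, rule)]
```

Note that `classify` returns a list rather than a single label. A document can satisfy more than one rule, or none at all, and the caller has to decide what to do in either case.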
As you might suspect, the power of rules-based classification is directly tied to the amount of time a subject matter expert (SME) spends reviewing the available data, identifying the key characteristics of each document type, and then encoding the rules. For some needs, where there are only a few document types, a rules-based approach might make sense because it is typically simpler to implement. Where there are many document types, say 30 or more, and where the characteristics of each might overlap, a rules-based approach will fall short.
When 50 document types are involved, and any given type can come in several versions, it is very likely that the rules identified for one type will overlap with the rules for another. Analyzing each type and version, verifying that nothing overlaps, and then testing and tuning every rule isn’t practical, mostly because of the time required but also because of the ongoing maintenance.
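To give a sense of the scale of that verification work, here is a small sketch that flags rule sets sharing phrases and counts how many pairs of document types would need to be compared. The rule sets, including the `credit_card_statement` type, are invented for illustration, and a shared phrase is only a hint of possible overlap, not proof that two rules will misfire on the same document.

```python
from itertools import combinations
from math import comb

# Hypothetical keyword rules: a document matches a type if it contains every phrase.
RULES = {
    "bank_statement": {"statement", "account number", "account balance"},
    "credit_card_statement": {"statement", "account number", "minimum payment"},
    "w2": {"w-2"},
}


def shared_phrases(rules):
    """Return {(type_a, type_b): phrases that appear in both rule sets}."""
    overlaps = {}
    for a, b in combinations(sorted(rules), 2):
        common = rules[a] & rules[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps


def pairs_to_review(n_types):
    """Number of distinct pairs of document types to check for overlap."""
    return comb(n_types, 2)
```

With these definitions, `pairs_to_review(50)` is 1225, so checking 50 types against one another means reviewing more than a thousand pairs, and the count grows further if each version of a type needs its own rule.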