The future of OCR Technology
Richard Develyn, CloudTrade CTO, explains why OCR, although it may capture some of the data needed, cannot provide the understanding required to know what that data means or what to do with it
OCR technology has been around for years and often comes up when companies look at data capture solutions. It is often the ‘go-to’ for those looking to automate document capture.
When it comes to the future of data capture and enabling automation, we need to look at data perception and understanding…
Richard is often asked to explain the difference between the service that we provide at CloudTrade and those services which are sold under the banner of “Optical Character Recognition” (OCR).
There is almost a straight answer to this, which is that OCR deals with what we might call “human perception” whereas CloudTrade is more about “human understanding”.
Richard says “almost” because the waters get muddied on a couple of counts, which we will come to later. First, it helps to define exactly what is meant by “perception” and “understanding”.
Data perception and data understanding?
Perception is all about recognition in its most basic form. It’s the bit in our brains which translates swirly lines and dots and circles into meaningful letters in the English language. It’s also the bit that has to struggle with differentiating between “i” and “j” or “b” and “h” so that we don’t end up wishing people “bappy hjrthdays” or catching “fjshes” on a “fjshjng book”.
Understanding, however, is all about meaning. It’s the bit that comes in after perception has done its job (assuming that it gets it right!) and figures out, say, that the word “fishing” in “fishing for compliments” has nothing to do with the word “fishing” when you’re fishing in the sea.
The first way in which the difference between perception and understanding gets muddied is that both the providers of OCR-based solutions and we ourselves at CloudTrade offer services which are based on a combination of the two.
You can’t have one without the other
You can’t, after all, have understanding without perception (unless you’re some sort of yogi floating on a mat over the Himalayas), or perception without understanding (imagine trying to find your way around the Tokyo underground system when you don’t speak Japanese).
CloudTrade and OCR-based solutions alike need to use both of these elements, because providing this service means not only extracting the right numbers and letters from the documents that are sent to CloudTrade, but also understanding them well enough to work out that, for example, “quantity 1” in an order line next to “car mats” probably refers to a pack of 4, whereas the same phrase next to “Lamborghini Veneno Roadster” is unlikely to refer to a pack of 4 of them.
Traditionally, OCR-based solutions have focussed on the perception side of the problem because that is where they have invested the bulk of their R&D, leaving the understanding part to be provided mostly by humans.
The value is in the understanding
CloudTrade, on the other hand, has invested all of its R&D efforts in understanding, succeeding in bypassing the perception part completely by focusing on “data” documents such as “data” PDFs (where, for example, the letter “s” is unambiguously stored as the letter “s” rather than as a set of drawing instructions resulting in something which could look like the letter “s” to the human eye).
Data PDFs do not need OCR and can therefore be thought of as producing a “perception” result which is 100% accurate. This 100% accurate perception is the key enabler for the process of understanding: it allows a highly sophisticated natural language analysis to take place, with no fear that the logical steps within it will be broken by some stray spanner in the works which changes the word “battery” to “hattery” or omits a very important decimal point in the phrase “don’t exceed the recommended dose of 1.234 ml every 24 hours”.
Providing the fuel for automation
Sophisticated systems of understanding remove the need for human operators and allow services to operate in a fully automated manner. At the time of writing, CloudTrade is processing ten million documents a year in this fashion. As soon as errors in perception are introduced, such as by using OCR, failures start to occur in the grammatical rules which underpin the process of understanding, and more and more human intervention is needed, resulting in less and less automation.
OCR solutions, by contrast, can operate in this field because they embrace the human element of document processing. Their advantage is that they are not limited to processing data PDFs. Their disadvantage is that they cannot fully automate.
To assume is to…
The second way in which the difference between perception and understanding has been muddied lies in the technology behind OCR, which has now made inroads into the world of understanding.
From a seminal paper by Douglas Hofstadter on OCR and AI called “On Seeing A’s and Seeing As” we quote:
“A tacit assumption is thus that the components of sentences–individual words, or the concepts lying beneath them–are not deeply problematical aspects of intelligence, but rather that the mystery of thought is how these small, elemental, “trivial” items work together in large, complex (and perforce nontrivial) structures.” Douglas Hofstadter
This assumption is certainly true with data PDFs, and that “mystery of thought” is clearly where CloudTrade has put all of its R&D efforts. The need for OCR might disappear completely if all interactions become electronic and “data”-based documents become the norm. If it does not, however, the most promising future for OCR is likely to come out of a hybridisation of perception and understanding.
Variety is the spice of life? Not for data.
As Richard mentioned earlier, OCR makes mistakes such as reading “fjsh” for “fish”. What it actually does, however, is identify lists of variations rather than hard and fast answers, and then present those variations with their individual certainty values to a user for arbitration (e.g. it could be “fjsh” (60%) or perhaps it’s “fish” (50%)). OCR vendors can then use dictionaries to automatically strip out nonsense words like “fjsh” and so narrow down the possibilities to arrive at the right answer. This doesn’t work, however, when the OCR mistakes still result in words that are present in the dictionary, or when the word being considered is not necessarily an English word at all (like a part number in a catalogue).
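To make the dictionary step concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's actual implementation: the Candidate type, the filter_by_dictionary function, the tiny word list and the second and third sets of candidate readings are all assumptions invented for the example. It shows the one case where the dictionary helps and the two cases described above where it does not.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One possible reading of a word, as an OCR engine might report it (hypothetical type)."""
    text: str
    confidence: float  # the engine's own certainty value, between 0 and 1


def filter_by_dictionary(candidates, dictionary):
    """Drop readings that are not dictionary words and rank the rest by confidence."""
    known = [c for c in candidates if c.text.lower() in dictionary]
    return sorted(known, key=lambda c: c.confidence, reverse=True)


# A toy word list standing in for a real dictionary (assumption).
DICTIONARY = {"fish", "book", "hook", "battery"}

# Case 1: the nonsense word is stripped out and the right answer survives.
resolved = filter_by_dictionary(
    [Candidate("fjsh", 0.60), Candidate("fish", 0.50)], DICTIONARY
)

# Case 2: both readings are real words, so the dictionary cannot arbitrate
# and the higher-confidence (possibly wrong) reading still comes first.
ambiguous = filter_by_dictionary(
    [Candidate("book", 0.55), Candidate("hook", 0.45)], DICTIONARY
)

# Case 3: a made-up part number is not in any dictionary, so every reading is discarded.
part_number = filter_by_dictionary(
    [Candidate("AB-1O4", 0.60), Candidate("AB-104", 0.50)], DICTIONARY
)

print([c.text for c in resolved])     # only the dictionary word remains
print([c.text for c in ambiguous])    # both readings remain
print([c.text for c in part_number])  # nothing remains
```

In the first case the lookup does its job; in the second and third it either cannot choose or throws away the correct reading along with the wrong one.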
A far more sophisticated solution would be to feed all these variations in perception straight into the “understanding” engine and then allow the latter to crunch through all of the grammatical options.
This is something that CloudTrade has experimented with, since it can connect to OCR as the “perception” part of its solution. In doing so CloudTrade has, indeed, found that with a bit of patience and tailoring it can deliver an OCR-based service which is just about acceptable and automatic for header-level capture, but which is too painful and slow to be feasible on complex documents or on scanned images that are less than “near-perfect”.
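The idea of passing every OCR variation to the “understanding” step can be sketched as follows. This is a hedged illustration and not CloudTrade's engine: the field names, the candidate readings and the consistent_readings function are hypothetical, and a simple arithmetic check (quantity times unit price must equal the line total) stands in for the grammatical rules a real system would apply.

```python
from decimal import Decimal
from itertools import product

# Hypothetical OCR output for one invoice line: every field carries several
# candidate readings, each with the engine's certainty value.
line = {
    "quantity": [("1", 0.55), ("7", 0.45)],
    "unit_price": [("12.50", 0.60), ("1250", 0.40)],
    "total": [("87.50", 0.70), ("81.50", 0.30)],
}


def consistent_readings(fields):
    """Try every combination of readings and keep those that satisfy the rule.

    The rule used here (quantity * unit_price == total) is a stand-in for
    whatever rules the understanding engine would really apply.
    """
    names = list(fields)
    results = []
    for combo in product(*(fields[name] for name in names)):
        texts = {name: text for name, (text, _) in zip(names, combo)}
        values = {name: Decimal(text) for name, text in texts.items()}
        if values["quantity"] * values["unit_price"] == values["total"]:
            score = 1.0
            for _, confidence in combo:
                score *= confidence  # combined certainty of this combination
            results.append((score, texts))
    # Highest combined certainty first.
    return sorted(results, key=lambda item: item[0], reverse=True)


results = consistent_readings(line)
for score, texts in results:
    print(texts)
```

With these made-up numbers the only combination that survives uses “7” for the quantity, even though OCR on its own rated “1” as more likely. The number of combinations to test is the product of the number of readings per field, which gives a feel for why this approach becomes slow on complex documents or poor scans.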
Dictionary lookups have been a standard feature with OCR vendors for some time. Advances in Machine Learning may well improve matters further in the future. Richard doubts very much that any improvements will happen with things like invoices and purchase orders, where a lot of the key information doesn’t have very much context to draw upon to allow significant automatic corrections to be made, but there could be mileage in using this technology with historical documents written in proper flowing prose.
OCR may well have an interesting future when it comes to scanning documents that were written in the past, but it’s more than likely to now be a past technology when it comes to documents that are to be written in the future.
Richard Develyn, CloudTrade CTO
Whitepaper/Guide: CloudTrade V OCR
CloudTrade is a data capture and automation service for business documents. Using our unique and patented rules-based technology, we extract your valuable data from incoming documents, such as invoices and orders, and process these with 100% accuracy straight into your business applications.
As the market leader in document automation, CloudTrade makes 100% accurate data processing accessible for all senders and receivers of application-generated documents. More than 650 customers around the world, alongside our 50+ partners, trust CloudTrade with $20bn worth of documents as their application-generated document automation tool. Founded in 2010 to offer a fresh approach to electronic document processing, CloudTrade’s unique, patented technology enables companies to evolve past their reliance on paper or labour-intensive manual processing and transact digitally with their trading partners, irrespective of size or technical maturity. CloudTrade’s core product suite focuses on e-invoicing, including the newly launched Universal Capture to automate all volumes and document types, Invoice Fraud Protection, as well as Intelligent Ordering Processing. Visit our website for more information – www.cloud-trade.com
About Embridge Consulting
Embridge Consulting was founded in 2009 as an independent consultancy advising businesses who were looking to embark on a change programme where efficiencies and cost savings can be delivered using modern technology. Embridge Consulting partnered with CloudTrade earlier this year to deliver greater value around document and invoice automation. Embridge will shortly be announcing their Cloud Invoice solution powered by CloudTrade – www.embridgeconsulting.com
Sarah Pritchards Marketing Manager email@example.com
Tracey Adams Marketing Manager Tracey.firstname.lastname@example.org