PALMIER-OCR
     
Principle

PALMIER-OCR is a system for evaluating and combining OCR engines in order to improve on their individual performance.  Evaluation is done by comparing each OCR result with a reference document (ground truth) selected and supplied by hand for each document class.  The comparison is carried out by alignment, which consists of first linearizing the document to remove the structural elements, which introduce errors, and then synchronizing the characters.  We use the Myers technique, which is an optimization of the dynamic programming algorithm.
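
As an illustration of the alignment step, here is a minimal sketch in Python. It uses the plain edit-distance dynamic programming algorithm, not the Myers optimization, and it assumes that linearization amounts to collapsing whitespace. The names linearize, align and the gap marker "~" are hypothetical and do not come from PALMIER-OCR.

```python
import re


def linearize(text):
    """Collapse line breaks, tabs and runs of spaces into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def align(truth, ocr, gap="~"):
    """Align two strings with edit-distance dynamic programming.

    Returns a list of (truth_char, ocr_char) pairs. The gap marker stands
    for a character missing on one side (insertion or deletion).
    """
    n, m = len(truth), len(ocr)
    # cost[i][j] is the edit distance between truth[:i] and ocr[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (truth[i - 1] != ocr[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (truth[i - 1] != ocr[j - 1])):
            pairs.append((truth[i - 1], ocr[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((truth[i - 1], gap))  # character missed by the OCR
            i -= 1
        else:
            pairs.append((gap, ocr[j - 1]))  # spurious OCR character
            j -= 1
    pairs.reverse()
    return pairs


pairs = align(linearize("PALMIER\n  OCR"), linearize("PAIMIER OCR"))
errors = [(t, o) for t, o in pairs if t != o]
```

The quadratic table is acceptable for short texts; the Myers technique mentioned above reduces the work when the two strings differ in only a few places.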
 

  
OCR Evaluation
Recognition rate calculation

The table below shows the recognition precision of several OCR engines applied to one document class.  The results are given by character class.
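
A per-class recognition rate can be sketched in Python as follows. This assumes that the input is a list of aligned (truth character, OCR character) pairs with "~" marking a gap, and that the classes are digits, uppercase letters, lowercase letters, spaces and everything else. The class names and the functions char_class and recognition_rates are hypothetical; the actual classes used in the table are not stated here.

```python
from collections import defaultdict


def char_class(ch):
    """Assign a character to a coarse class (an assumed classification)."""
    if ch.isdigit():
        return "digit"
    if ch.isupper():
        return "uppercase"
    if ch.islower():
        return "lowercase"
    if ch.isspace():
        return "space"
    return "other"


def recognition_rates(pairs, gap="~"):
    """Return the fraction of correctly recognized characters per class.

    The class is taken from the ground-truth character, so spurious OCR
    characters (truth side is a gap) are skipped.
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth_ch, ocr_ch in pairs:
        if truth_ch == gap:
            continue
        cls = char_class(truth_ch)
        total[cls] += 1
        correct[cls] += (truth_ch == ocr_ch)
    return {cls: correct[cls] / total[cls] for cls in total}


rates = recognition_rates(list(zip("LA1a", "IA1a")))
```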
 

 
OCR Combination
Error localization

For each error, we locate its context in the document in order to define adaptive corrections (heuristics).  The example below shows the context of the letter "L", recognized as an "I" by FineReader and often as an "L" by TextBridge.
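
Locating the context of a given confusion can be sketched in Python as below. The sketch assumes aligned (truth character, OCR character) pairs and defines the context as a fixed number of ground-truth characters on each side of the error. The function error_contexts and its width parameter are hypothetical.

```python
def error_contexts(pairs, truth_ch, ocr_ch, width=3):
    """Return (left, right) ground-truth contexts for one kind of confusion.

    pairs is a list of aligned (truth_char, ocr_char) tuples; the function
    reports every position where truth_ch was read as ocr_ch.
    """
    truth = "".join(t for t, _ in pairs)
    contexts = []
    for k, (t, o) in enumerate(pairs):
        if t == truth_ch and o == ocr_ch:
            left = truth[max(0, k - width):k]
            right = truth[k + 1:k + 1 + width]
            contexts.append((left, right))
    return contexts


contexts = error_contexts(list(zip("PALMIER", "PAIMIER")), "L", "I", width=2)
```

Grouping the returned contexts by frequency is one way to see which neighbouring characters tend to trigger a confusion before writing a heuristic for it.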
 


Combination

We selected two OCR engines for the combination: FineReader and TextBridge. The combination takes FineReader as the reference and corrects its answers with those of TextBridge when a FineReader answer is doubtful. We represent OCR performance by a (rejection rate, confusion rate) curve.  This curve shows the individual performance of each OCR engine and the improvement made by the combination.
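
The combination rule can be sketched in Python under strong assumptions: the two outputs are already aligned character by character, and the reference engine supplies a confidence value between 0 and 1 for each character, with "doubtful" meaning a confidence below a threshold. How doubt is actually detected is not specified here, so the confidence list, the threshold and the function combine are all hypothetical.

```python
def combine(primary, secondary, confidence, threshold=0.8):
    """Keep the primary OCR answer unless it is doubtful.

    primary, secondary: aligned strings of equal length.
    confidence: one assumed confidence value per primary character.
    A character whose confidence is below threshold is replaced by the
    secondary OCR answer at the same position.
    """
    if not (len(primary) == len(secondary) == len(confidence)):
        raise ValueError("inputs must be aligned and of equal length")
    return "".join(
        p if c >= threshold else s
        for p, s, c in zip(primary, secondary, confidence)
    )


merged = combine("PAIMIER", "PALMIER", [0.9, 0.9, 0.4, 0.9, 0.9, 0.9, 0.9])
```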
 

Heuristics

This curve shows the individual performance of each OCR engine and the improvement made by the correction heuristics for the numeral "1".  With a rejection rate of a little more than 8%, one can reach a confusion rate of 1.3 per 10,000.
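
A point on a (rejection rate, confusion rate) curve can be computed as in the following Python sketch. It assumes per-character confidence values, that a character is rejected when its confidence falls below a threshold, and that both rates are taken over all characters; these definitions and the function names are hypothetical, and the data are toy values, not the measurements reported above.

```python
def rejection_confusion(truth, ocr, confidence, threshold):
    """Return (rejection rate, confusion rate) for one threshold.

    truth, ocr: aligned strings of equal length.
    Rejected: confidence below threshold.
    Confused: accepted but different from the ground truth.
    """
    n = len(truth)
    rejected = sum(c < threshold for c in confidence)
    confused = sum(
        c >= threshold and t != o
        for t, o, c in zip(truth, ocr, confidence)
    )
    return rejected / n, confused / n


def curve(truth, ocr, confidence, thresholds):
    """Sweep the threshold to trace the trade-off between the two rates."""
    return [rejection_confusion(truth, ocr, confidence, th) for th in thresholds]


points = curve("1111", "1l1I", [0.9, 0.3, 0.8, 0.6], [0.5, 0.7])
```

Raising the threshold rejects more characters and leaves fewer confusions among the accepted ones, which is the trade-off the curve displays. For scale, a rate of 1.3 per 10,000 corresponds to 0.013%.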