It is a system for evaluating and combining OCR engines in order to improve on their individual performance. Evaluation compares each OCR result with a reference document (ground truth) selected and prepared by hand for each document class. The comparison is performed by alignment, which consists of first linearizing the document to remove its structural elements, then introducing the errors and synchronizing the characters. We use the Myers technique, which is an optimization of the dynamic programming algorithm.
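To make the alignment step concrete, here is a minimal sketch of the plain dynamic programming alignment that the Myers technique optimizes. It is not the Myers algorithm itself and not the system's actual code; the function name `align`, the unit edit costs, and the use of `None` as a gap marker are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # (truth char, OCR char); None marks a gap


def align(truth: str, ocr: str) -> List[Pair]:
    """Align two linearized strings with the classic edit-distance DP.

    Assumption: substitution, insertion and deletion all cost 1.
    """
    n, m = len(truth), len(ocr)
    # cost[i][j] = edit distance between truth[:i] and ocr[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (truth[i - 1] != ocr[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the synchronized pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (truth[i - 1] != ocr[j - 1])):
            pairs.append((truth[i - 1], ocr[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((truth[i - 1], None))  # character missed by the OCR
            i -= 1
        else:
            pairs.append((None, ocr[j - 1]))  # spurious character added by the OCR
            j -= 1
    pairs.reverse()
    return pairs
```

Each returned pair synchronizes one ground-truth character with one OCR character, so a substitution such as "L" read as "I" appears as the pair `("L", "I")`.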
|
|
Recognition rate calculation
The table below shows the recognition precision of several OCR engines applied
to a document class. The results are given by character class.
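A per-class recognition rate could be computed from synchronized (ground truth, OCR) character pairs roughly as follows. This is only a sketch: the class names returned by `char_class`, the pair format with `None` for a gap, and the choice to ignore spurious OCR characters are assumptions, not the system's actual definitions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def char_class(c: str) -> str:
    """Assumed character classes; the original classes are not specified."""
    if c.isdigit():
        return "digit"
    if c.isupper():
        return "uppercase"
    if c.islower():
        return "lowercase"
    if c.isspace():
        return "space"
    return "other"


def recognition_rates(
    pairs: Iterable[Tuple[Optional[str], Optional[str]]]
) -> Dict[str, float]:
    """Fraction of ground-truth characters read correctly, per class."""
    total: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for truth_char, ocr_char in pairs:
        if truth_char is None:
            # Spurious OCR character: it has no ground-truth class, so skip it.
            continue
        cls = char_class(truth_char)
        total[cls] += 1
        correct[cls] += int(truth_char == ocr_char)
    return {cls: correct[cls] / total[cls] for cls in total}
```

For example, `recognition_rates([("L", "I"), ("i", "i")])` reports a rate of 0.0 for uppercase letters and 1.0 for lowercase letters.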
|
|
|
Error localization
For each error, we locate its context in the document
in order to define adaptive corrections (heuristics). The example
below shows the context of the letter "L", recognized as an "I" by FineReader
and often
|
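As an illustration of error localization, the sketch below collects the ground-truth characters surrounding each occurrence of a given confusion. The function name `confusion_contexts`, the pair format, and the fixed window `width` are hypothetical; the system's actual notion of context is not described here.

```python
from typing import List, Optional, Sequence, Tuple


def confusion_contexts(
    pairs: Sequence[Tuple[Optional[str], Optional[str]]],
    truth_char: str,
    ocr_char: str,
    width: int = 2,
) -> List[Tuple[str, str]]:
    """Return (left, right) ground-truth context for each truth_char read as ocr_char."""
    truth_seq = [t for t, _ in pairs]  # may contain None where the OCR added a character
    contexts: List[Tuple[str, str]] = []
    for k, (t, o) in enumerate(pairs):
        if t == truth_char and o == ocr_char:
            left = "".join(c for c in truth_seq[max(0, k - width):k] if c is not None)
            right = "".join(c for c in truth_seq[k + 1:k + 1 + width] if c is not None)
            contexts.append((left, right))
    return contexts
```

Counting how often each context occurs would then indicate where a correction heuristic is likely to be useful.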
|
Combination
We selected two OCR engines for the combination: FineReader
and TextBridge. The combination takes FineReader as the reference
and corrects its answers with those of TextBridge when an answer is doubtful.
We represent OCR performance by a curve of (rejection rate, confusion
rate). This curve shows the individual performance of each OCR engine and
the improvement made by the combination.
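The combination rule and one point of the (rejection rate, confusion rate) curve can be sketched as follows. Several things here are assumptions for illustration: each engine is taken to output a per-character confidence, "doubtful" is taken to mean a confidence below a hypothetical `threshold`, a character is rejected when both engines are doubtful, and both rates are normalized by the number of ground-truth characters.

```python
from typing import List, Optional, Sequence, Tuple

Answer = Tuple[str, float]  # (character, confidence in [0, 1]); confidence is assumed


def combine(
    reference: Sequence[Answer],
    secondary: Sequence[Answer],
    threshold: float = 0.8,
) -> List[Optional[str]]:
    """Keep the reference answer unless it is doubtful; None means rejection."""
    combined: List[Optional[str]] = []
    for (ref_char, ref_conf), (sec_char, sec_conf) in zip(reference, secondary):
        if ref_conf >= threshold:
            combined.append(ref_char)   # reference answer is trusted
        elif sec_conf >= threshold:
            combined.append(sec_char)   # doubtful reference corrected by the second engine
        else:
            combined.append(None)       # both doubtful: reject the character
    return combined


def rejection_and_confusion(
    answers: Sequence[Optional[str]], truth: str
) -> Tuple[float, float]:
    """Return (rejection rate, confusion rate) over the ground-truth characters."""
    n = len(truth)
    rejected = sum(a is None for a in answers)
    confused = sum(a is not None and a != t for a, t in zip(answers, truth))
    return rejected / n, confused / n
```

Sweeping `threshold` from 0 to 1 and plotting the resulting pairs would trace one such curve; a lower confusion rate is obtained at the price of a higher rejection rate.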
|
|
Heuristics
This curve shows the individual performance of
each OCR engine and the improvement made by the correction heuristics for the numeral
"1". It can be seen that, with a little more than 8% rejection,
one can