This is a document digitization system built by integrating OCR engines. The aim is to combine several OCR engines in order to reduce both the error rate and the amount of manual correction required. The system has two parts: an Adaptor and an evaluation/combination module. Adaptation means being able to drive each OCR engine from the platform, that is, to set its parameters and to call its individual modules on request.
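The text does not say how the evaluation/combination module scores or merges results. The following is a minimal sketch, assuming character error rate (Levenshtein distance divided by reference length) as the evaluation measure and a simple character-level majority vote as the combination rule. Both are illustrative choices, not the system's documented method. The vote also assumes the engine outputs are already aligned to the same length; real outputs would first need an alignment step. The function names are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance normalized by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(hypothesis, reference) / len(reference)


def combine_by_vote(outputs: list[str]) -> str:
    """Character-level majority vote over several OCR outputs.

    Assumption: outputs are pre-aligned (same length, position i refers to
    the same glyph in every output). Ties go to the engine listed first.
    """
    if not outputs:
        return ""
    length = len(outputs[0])
    if any(len(o) != length for o in outputs):
        raise ValueError("outputs must be pre-aligned to the same length")
    combined = []
    for column in zip(*outputs):
        counts = Counter(column)
        best = max(counts.values())
        # first character (in engine order) that reaches the top count wins
        combined.append(next(c for c in column if counts[c] == best))
    return "".join(combined)


if __name__ == "__main__":
    engines = ["hel1o world", "hello wor1d", "hellc world"]
    merged = combine_by_vote(engines)
    for text in engines + [merged]:
        print(repr(text), char_error_rate(text, "hello world"))
```

Each engine makes a different single error here, so the vote recovers the reference text. This shows why combining engines can lower the error rate even when no single engine is error-free.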
The figure below shows how adaptation is achieved through an object class. The processing steps are defined as method classes, following the design patterns approach. Each OCR result is normalized into XML, so that comparisons can be made across all the results.
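As an illustration only, the adaptation layer could look like the sketch below. It assumes an Adapter-style abstract class that every engine wrapper implements, plus a shared XML layout of `<page>` and `<zone>` elements. The class names, the `configure`/`recognize`/`to_xml` methods and the XML element names are all hypothetical, since the original figure is not available. `FakeEngineAdapter` returns canned zones so that the sketch runs without a real OCR engine.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Zone:
    """One segmented region: bounding box in pixels plus recognized text."""
    x: int
    y: int
    width: int
    height: int
    text: str


class OcrAdapter(ABC):
    """Hypothetical common interface the platform uses to drive any engine."""

    @abstractmethod
    def configure(self, **params) -> None:
        """Forward engine-specific parameters chosen on the platform."""

    @abstractmethod
    def recognize(self, image_path: str) -> list[Zone]:
        """Run segmentation and recognition, returning engine-neutral zones."""

    def to_xml(self, image_path: str) -> ET.Element:
        """Normalize the engine's result into the shared XML layout."""
        page = ET.Element("page", {"source": image_path,
                                   "engine": type(self).__name__})
        for zone in self.recognize(image_path):
            el = ET.SubElement(page, "zone", {
                "x": str(zone.x), "y": str(zone.y),
                "width": str(zone.width), "height": str(zone.height),
            })
            el.text = zone.text
        return page


class FakeEngineAdapter(OcrAdapter):
    """Stand-in for a real engine wrapper; returns fixed zones."""

    def __init__(self) -> None:
        self.params: dict = {}

    def configure(self, **params) -> None:
        self.params.update(params)

    def recognize(self, image_path: str) -> list[Zone]:
        return [Zone(10, 20, 300, 40, "Title"),
                Zone(10, 80, 300, 200, "Body text")]


if __name__ == "__main__":
    adapter = FakeEngineAdapter()
    adapter.configure(language="fra")
    print(ET.tostring(adapter.to_xml("scan.png"), encoding="unicode"))
```

Because every wrapper emits the same XML shape, the evaluation/combination module can compare results without knowing which engine produced them.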
Segmentation validation
We used the SVG (Scalable Vector Graphics) DTD to validate the segmentation result. The XML elements produced by the OCR engine are transformed into SVG elements. In the example below, the top part shows the XML elements and the corresponding SVG elements. The bottom part shows the original document on the left and its SVG transformation on the right.
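A minimal sketch of the XML-to-SVG step follows. It assumes the hypothetical `<zone x y width height>` layout used for illustration above, not the system's actual schema. Each zone becomes an SVG `<rect>` outline with a `<text>` label, so the segmentation can be inspected visually. DTD validation is not part of Python's standard library. A library such as lxml (its `etree.DTD` class) could validate the serialized output against the SVG DTD, but that step is omitted here.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def zones_to_svg(page: ET.Element, width: int, height: int) -> ET.Element:
    """Map each <zone> of a normalized page to an SVG <rect> plus <text> label."""
    ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": str(width), "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    for zone in page.iter("zone"):
        x, y = zone.get("x"), zone.get("y")
        # outline of the segmented region
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": x, "y": y,
            "width": zone.get("width"), "height": zone.get("height"),
            "fill": "none", "stroke": "red",
        })
        # recognized text drawn just inside the top-left corner of the region
        label = ET.SubElement(svg, f"{{{SVG_NS}}}text", {
            "x": x, "y": str(int(y) + 12), "font-size": "10",
        })
        label.text = zone.text or ""
    return svg


if __name__ == "__main__":
    page = ET.fromstring(
        '<page><zone x="10" y="20" width="300" height="40">Title</zone></page>')
    print(ET.tostring(zones_to_svg(page, 600, 800), encoding="unicode"))
```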
Interface
The figure below shows the digitization interface, with the segmented document on the left and its SVG transformation on the right.