PALMIER
 
Principle

PALMIER is a document digitization system built on OCR integration. The aim is to combine several OCR engines so as to lower the error rate and reduce the amount of manual correction. The system comprises an adapter and an evaluation/combination component. The adapter lets the platform drive each OCR engine: it sets the engine's parameters and calls its individual modules on request.
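
The combination step is not detailed here. As a rough illustration only, the sketch below assumes the simplest possible scheme: a word-level majority vote over outputs that have already been aligned token by token. The function name `combine_by_vote` and the vote itself are assumptions, not the method used by the system.

```python
from collections import Counter


def combine_by_vote(outputs):
    """Combine word-aligned OCR outputs by majority vote (illustrative only).

    outputs: one token list per engine, all of the same length
    (alignment is assumed to have been done beforehand).
    Returns the voted tokens and the positions where engines disagreed.
    """
    if not outputs:
        return [], []
    length = len(outputs[0])
    if any(len(tokens) != length for tokens in outputs):
        raise ValueError("outputs must be aligned to the same length")

    voted, disputed = [], []
    for position, candidates in enumerate(zip(*outputs)):
        counts = Counter(candidates)
        # Ties are resolved in favour of the engine listed first.
        best, _ = counts.most_common(1)[0]
        voted.append(best)
        if len(counts) > 1:
            # A disagreement is a natural candidate for manual review.
            disputed.append(position)
    return voted, disputed
```

The list of disputed positions is what would make such a scheme useful for reducing manual correction: an operator would only need to look at the words on which the engines disagree.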
Functioning

The figure below shows how the adaptation is implemented through an object class. The processes are defined by the methods of the class, following the design patterns approach. The OCR results are normalized in XML, which makes it possible to run comparative operations across the results of all engines.
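
A minimal sketch of this adapter idea is given below, under stated assumptions: the class names (`OcrAdapter`, `FakeEngineAdapter`), the method names, and the XML layout (`document` and `zone` elements with position attributes) are all hypothetical and chosen only for illustration. The stand-in engine returns fixed values rather than calling a real OCR product.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class OcrAdapter(ABC):
    """Hypothetical common interface the platform sees for every engine."""

    def __init__(self, name):
        self.name = name
        self.params = {}

    def configure(self, **params):
        # Store engine parameters set from the platform.
        self.params.update(params)

    @abstractmethod
    def recognize(self, image_path):
        """Return a list of (text, x, y, width, height) zones."""

    def to_xml(self, image_path):
        # Normalize the engine output into one shared XML layout
        # (element and attribute names are assumptions).
        root = ET.Element("document", source=image_path, engine=self.name)
        for text, x, y, w, h in self.recognize(image_path):
            zone = ET.SubElement(
                root, "zone", x=str(x), y=str(y), width=str(w), height=str(h)
            )
            zone.text = text
        return root


class FakeEngineAdapter(OcrAdapter):
    """Stand-in engine returning fixed zones, for illustration only."""

    def recognize(self, image_path):
        return [("PALMIER", 10, 10, 120, 20), ("Principle", 10, 40, 90, 16)]


if __name__ == "__main__":
    adapter = FakeEngineAdapter("fake")
    adapter.configure(language="fr")
    print(ET.tostring(adapter.to_xml("page.png"), encoding="unicode"))
```

Because every adapter emits the same XML layout, the comparison code never needs to know which engine produced a given result.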
Experiments
Segmentation validation

We used an SVG (Scalable Vector Graphics) DTD to validate the segmentation result. The XML elements produced by the OCR are transformed into SVG elements. The example below shows the XML and SVG elements at the top and, at the bottom, the initial document on the left and its SVG transformation on the right.
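
Independently of that illustration, the XML-to-SVG step can be sketched in a few lines of code. The input layout (`zone` elements carrying `x`, `y`, `width` and `height` attributes) and the function name `zones_to_svg` are assumptions; the actual schema is not given here. Validation against the SVG DTD would require a validating XML parser and is not shown.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def zones_to_svg(xml_text, width, height):
    """Turn <zone x y width height> elements into SVG <rect> outlines."""
    source = ET.fromstring(xml_text)
    svg = ET.Element("svg", xmlns=SVG_NS, width=str(width), height=str(height))
    for zone in source.iter("zone"):
        # Copy the bounding box; a missing attribute raises KeyError.
        box = {key: zone.attrib[key] for key in ("x", "y", "width", "height")}
        ET.SubElement(svg, "rect", fill="none", stroke="red", **box)
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<document>'
        '<zone x="10" y="10" width="120" height="20">PALMIER</zone>'
        '<zone x="10" y="40" width="90" height="16">Principle</zone>'
        '</document>'
    )
    print(zones_to_svg(sample, 200, 100))
```

Drawing each zone as an outlined rectangle makes segmentation errors visible at a glance when the SVG is displayed next to the original page.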
 

Interface

The figure below shows the digitization interface, with the segmented document on the left and its SVG transformation on the right.