Presentation

PALMIER

Principle

PALMIER is a document digitization system based on the integration of OCR engines. Its purpose is to combine several OCR engines in order to reduce both the error rate and the amount of manual correction required. The system comprises an adaptor and an evaluation/combination module. Adaptation means being able to drive each OCR engine from the platform: setting its parameters and calling its various modules on request.
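The adaptor idea can be sketched as a common interface that each engine wrapper implements, in the spirit of the Adapter design pattern. This is a minimal illustration under stated assumptions, not PALMIER's actual code: the class names `OCRAdapter` and `DummyEngineAdapter`, the `configure`/`call` methods, the module names and the `language` parameter are all hypothetical.

```python
from abc import ABC, abstractmethod


class OCRAdapter(ABC):
    """Hypothetical common interface every wrapped OCR engine exposes to the platform."""

    def __init__(self) -> None:
        self.params: dict[str, object] = {}

    def configure(self, **params: object) -> None:
        # Parameterize the engine from the platform side.
        self.params.update(params)

    @abstractmethod
    def modules(self) -> list[str]:
        """Names of the engine modules the platform may call."""

    @abstractmethod
    def call(self, module: str, image_path: str) -> str:
        """Run one engine module on an image and return its raw output."""


class DummyEngineAdapter(OCRAdapter):
    """Stand-in for a real engine: it only reports what it was asked to do."""

    def modules(self) -> list[str]:
        return ["segment", "recognize"]

    def call(self, module: str, image_path: str) -> str:
        if module not in self.modules():
            raise ValueError(f"unknown module: {module}")
        lang = self.params.get("language", "fr")
        return f"{module}({image_path}, language={lang})"


engine = DummyEngineAdapter()
engine.configure(language="en")
print(engine.call("recognize", "page1.tif"))  # recognize(page1.tif, language=en)
```

With this design the platform only ever talks to `OCRAdapter`, so adding a new engine means writing one more subclass rather than changing the platform.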

How it works

The figure below shows how adaptation is achieved through an object class. The processing steps are defined as method classes, as described in Design Patterns. Each OCR result is normalized to XML, so that comparisons can be made across all the results.
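Once every engine's output is in the same XML form, results can be compared element by element. The sketch below assumes a hypothetical schema (a `<page>` containing `<word>` elements with bounding-box attributes) and uses word-level majority voting as one possible combination rule; the source specifies neither PALMIER's schema nor its combination method. It also assumes the engines return the same words in the same order, whereas real outputs would first need to be aligned.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def normalize(engine: str, words: list[tuple[str, int, int, int, int]]) -> str:
    """Serialize one engine's words (text, x, y, w, h) into a common XML form."""
    page = ET.Element("page", engine=engine)
    for text, x, y, w, h in words:
        word = ET.SubElement(page, "word", x=str(x), y=str(y), w=str(w), h=str(h))
        word.text = text
    return ET.tostring(page, encoding="unicode")


def vote(xml_results: list[str]) -> list[str]:
    """Majority vote per word position; assumes all engines list the same words in order."""
    pages = [ET.fromstring(xml) for xml in xml_results]
    columns = zip(*(page.iter("word") for page in pages))
    return [Counter(w.text for w in col).most_common(1)[0][0] for col in columns]


a = normalize("A", [("Hello", 0, 0, 50, 10), ("world", 60, 0, 50, 10)])
b = normalize("B", [("Hel1o", 0, 0, 50, 10), ("world", 60, 0, 50, 10)])
c = normalize("C", [("Hello", 0, 0, 50, 10), ("wor1d", 60, 0, 50, 10)])
print(vote([a, b, c]))  # ['Hello', 'world']
```

Each engine makes one error, but on different words, so the vote recovers the correct text; this is the kind of gain that combining several OCR engines aims for.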

Experiments

Segmentation validation

We used an SVG (Scalable Vector Graphics) DTD to validate the segmentation result. The XML elements produced by the OCR are transformed into SVG elements. The example below shows the XML and SVG elements at the top; at the bottom, it shows the initial document on the left and its SVG transformation on the right.
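A minimal sketch of the XML-to-SVG step, assuming a hypothetical OCR schema in which each segmented zone is a `<zone>` element with `x`, `y`, `w` and `h` attributes inside a `<page>` carrying `width` and `height`. Each zone becomes an SVG `<rect>`. Python's standard `xml.etree.ElementTree` does not validate against a DTD, so the validation itself is not shown; a DTD-aware parser such as lxml would be needed for that step.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize the SVG namespace as the default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def ocr_to_svg(ocr_xml: str) -> str:
    """Turn each OCR zone (hypothetical <zone> element) into an SVG <rect>."""
    page = ET.fromstring(ocr_xml)
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        width=page.get("width", "0"),
        height=page.get("height", "0"),
    )
    for zone in page.iter("zone"):
        ET.SubElement(
            svg,
            f"{{{SVG_NS}}}rect",
            x=zone.get("x", "0"),
            y=zone.get("y", "0"),
            width=zone.get("w", "0"),
            height=zone.get("h", "0"),
            fill="none",
            stroke="black",
        )
    return ET.tostring(svg, encoding="unicode")


print(ocr_to_svg('<page width="100" height="200"><zone x="10" y="20" w="30" h="40"/></page>'))
```

Because the output is plain SVG, the segmentation can be checked by eye in any SVG viewer as well as validated against the DTD.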

Interface

The figure below shows the digitization interface: the segmented document appears on the left and its SVG transformation on the right.