VPC
   
Principle

This system segments mail order forms into informative areas. The required areas are outlined in red in the figure: two address zones, an article zone and a total amount zone.

Because the page layout changes often, the segmentation method must work adaptively, without relying on a rigid model of the layout.
 

  
Functioning

The method is based on locating anchor points (certain fixed words) in each area. Each area is then delimited starting from these points, according to their position within the area.

The anchor point search is done by discrete relaxation, using a constraint graph defined for each area. The constraints express the properties of the words and the topographic relations between them.
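
As an illustration only, discrete relaxation over such a constraint graph can be sketched as follows. This is not the system's actual implementation: the Box type, the anchor names "name" and "address", the width ranges and the directly_above relation are all hypothetical, chosen to show how unary constraints (word properties) and binary constraints (topographic relations) prune the candidate words until nothing more can be removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of a candidate word image (hypothetical representation)."""
    x: int
    y: int
    w: int
    h: int


def directly_above(a, b):
    """Hypothetical topographic relation: a lies just above b, left-aligned."""
    gap = b.y - (a.y + a.h)
    return 0 <= gap <= 30 and abs(a.x - b.x) <= 5


def discrete_relaxation(candidates, unary, binary):
    """Filter candidate boxes for each anchor until the domains are stable.

    unary:  {anchor: predicate(box)}             properties of a word
    binary: {(anchor_a, anchor_b): rel(a, b)}    relations between two words
    """
    # Initial domains: every candidate that satisfies the anchor's own properties.
    domains = {name: {b for b in candidates if ok(b)} for name, ok in unary.items()}
    changed = True
    while changed:
        changed = False
        for (a, b), rel in binary.items():
            # Keep a box only if at least one box of the other anchor supports it.
            keep_a = {p for p in domains[a]
                      if any(rel(p, q) for q in domains[b] if q != p)}
            keep_b = {q for q in domains[b]
                      if any(rel(p, q) for p in keep_a if p != q)}
            if keep_a != domains[a] or keep_b != domains[b]:
                domains[a], domains[b] = keep_a, keep_b
                changed = True
    return domains


# Assumed example data: four candidate word boxes on a page.
candidates = [Box(10, 10, 40, 12), Box(10, 30, 60, 12),
              Box(200, 10, 40, 12), Box(10, 300, 150, 12)]
unary = {
    "name": lambda b: 30 <= b.w <= 50,      # assumed width range of the word
    "address": lambda b: 50 <= b.w <= 70,   # assumed width range of the word
}
binary = {("name", "address"): directly_above}

domains = discrete_relaxation(candidates, unary, binary)
for anchor, boxes in domains.items():
    print(anchor, sorted((b.x, b.y) for b in boxes))
```

In this sketch a candidate is discarded as soon as no candidate of a related anchor is compatible with it, and the loop repeats until no domain changes. The surviving candidates are consistent with the constraints pair by pair, which is the usual outcome of discrete relaxation; choosing one final word per anchor would be a further step.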
 

Experiments
Word Localization

After extracting the connected components, we group neighbouring components into words, using criteria based on the spacing between characters and between words.
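
A minimal sketch of this kind of grouping, under simplifying assumptions, is given below. It assumes the components are already available as bounding boxes (x, y, w, h) belonging to a single text line, and it merges two neighbours when the horizontal gap between them is below a threshold. The function name group_into_words, the gap_ratio parameter and the use of the median component height as a scale are illustrative choices, not the criteria actually used by the system.

```python
from statistics import median


def group_into_words(boxes, gap_ratio=0.5):
    """Merge component boxes (x, y, w, h) of one text line into word boxes.

    Two neighbouring components are merged when the horizontal gap between
    them is at most gap_ratio times the median component height (an assumed
    criterion for separating character spacing from word spacing).
    """
    if not boxes:
        return []
    max_gap = gap_ratio * median(h for _, _, _, h in boxes)
    words = []
    for x, y, w, h in sorted(boxes):  # left to right
        if words:
            wx, wy, ww, wh = words[-1]
            if x - (wx + ww) <= max_gap:
                # Small gap: extend the current word box to cover this component.
                x1 = max(wx + ww, x + w)
                y0 = min(wy, y)
                y1 = max(wy + wh, y + h)
                words[-1] = (wx, y0, x1 - wx, y1 - y0)
                continue
        # Large gap (or first component): start a new word.
        words.append((x, y, w, h))
    return words


# Assumed example: five character-sized components with one wide gap.
components = [(0, 0, 8, 10), (10, 0, 8, 10), (20, 0, 8, 10),
              (40, 0, 8, 10), (50, 0, 8, 10)]
print(group_into_words(components))
```

With a larger gap_ratio the same routine would merge words into longer runs, which is why the threshold has to be tuned to the spacing observed in the documents.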
 

Example

The figure below shows, in blue, the anchor points of the address zone. No character recognition is performed; related components are simply grouped together. The corresponding words are selected in the image by discrete relaxation over the candidate word images.
 


The figure below shows the extension of the context, first around the anchor points (red), then on the side opposite the anchor points (green). The line limits are given by a vertical extension of the context (blue).

Results

The tables below show the success percentages for all the zones, on two separate databases.