|
|
This project concerns a system that segments mail-order forms into informative areas. The required areas correspond to the zones outlined in red in the figure: two address zones, an article zone and a total amount zone. Because the page layout changes often, the segmentation method must
work adaptively, without relying on a rigid model of the layout.
|
|
|
The method first locates anchor points (certain fixed words) in each area, then delimits each area from these points according to their position within it. The anchor points are found by discrete relaxation on a
graph of constraints defined for each area. The constraints
express the properties of the words and the topographic relations between
them.
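
The anchor search described above can be illustrated with a minimal sketch of discrete relaxation. It is not the system's actual implementation: the box format `(x, y, width, height)`, the labels `top` and `bottom`, and every threshold are assumptions chosen for the example. Each label starts with all word candidates; candidates that violate a word property (unary constraint) or have no compatible partner under a topographic relation (binary constraint) are discarded until nothing changes.

```python
def discrete_relaxation(candidates, unary, binary):
    """Filter word candidates for each anchor label.

    candidates: dict label -> list of boxes (x, y, w, h)   (hypothetical format)
    unary:      dict label -> predicate(box)
    binary:     dict (label_a, label_b) -> predicate(box_a, box_b)
    """
    # Step 1: keep only candidates that satisfy the word properties.
    domains = {
        label: [b for b in boxes if unary.get(label, lambda _: True)(b)]
        for label, boxes in candidates.items()
    }
    # Step 2: repeatedly drop candidates with no compatible partner.
    changed = True
    while changed:
        changed = False
        for (a, b), rel in binary.items():
            kept_a = [p for p in domains[a] if any(rel(p, q) for q in domains[b])]
            kept_b = [q for q in domains[b] if any(rel(p, q) for p in kept_a)]
            if len(kept_a) != len(domains[a]) or len(kept_b) != len(domains[b]):
                domains[a], domains[b] = kept_a, kept_b
                changed = True
    return domains


# Illustrative data: four word boxes; thresholds are assumptions.
words = [(10, 10, 50, 12), (12, 30, 60, 12), (300, 10, 55, 12), (200, 200, 20, 12)]
wide_enough = lambda box: box[2] >= 40
# "bottom" lies at most 30 px below "top" and is roughly left-aligned with it.
below_aligned = lambda p, q: 0 < q[1] - p[1] <= 30 and abs(q[0] - p[0]) <= 5

result = discrete_relaxation(
    {"top": words, "bottom": words},
    {"top": wide_enough, "bottom": wide_enough},
    {("top", "bottom"): below_aligned},
)
print(result)
```

With these example constraints, a single candidate survives for each label. In practice a label may keep several candidates or none, and a further selection step would be needed.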
|
|
|
Word Localization
After extracting the connected components, we
group neighbouring components using criteria based on the spacing
between characters and between words.
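
A minimal sketch of this grouping step follows, under stated assumptions: the components are given as bounding boxes `(x, y, width, height)` from a single text line, and a single hypothetical threshold `max_gap` separates inter-character gaps from inter-word gaps. The criteria actually used by the system are not specified here.

```python
def group_into_words(boxes, max_gap):
    """Merge component boxes (x, y, w, h) of one text line into word boxes.

    Two neighbouring components are merged when the horizontal gap between
    them is at most max_gap (an assumed threshold) and they overlap vertically.
    """
    words = []
    for x, y, w, h in sorted(boxes):  # left-to-right order
        if words:
            wx, wy, ww, wh = words[-1]
            gap = x - (wx + ww)
            overlaps = y < wy + wh and wy < y + h
            if gap <= max_gap and overlaps:
                # Extend the current word box to cover the new component.
                right = max(wx + ww, x + w)
                top = min(wy, y)
                bottom = max(wy + wh, y + h)
                words[-1] = (wx, top, right - wx, bottom - top)
                continue
        words.append((x, y, w, h))
    return words


# Five character-sized components: gaps of 2 px inside words, 11 px between words.
components = [(0, 0, 5, 10), (7, 0, 5, 10), (14, 0, 5, 10), (30, 0, 5, 10), (37, 0, 5, 10)]
word_boxes = group_into_words(components, max_gap=3)
print(word_boxes)
```

A fixed threshold is the simplest choice; a threshold derived from the gap distribution on each line would adapt better to varying fonts.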
|
|
Example
The figure below shows in blue the anchor points
of the address zone. No character recognition is performed;
related components are simply grouped. The corresponding
words are selected in the image by discrete relaxation on the candidate
word images.
|
|
The figure below shows how the context is extended, first around the anchor points (red) and then on the side opposite the anchor points (green). The line limits are obtained by extending the context vertically (blue).
|
Results
The tables below show the success rates
for all the zones on two separate databases.
|
|