GRAPHEIN
 
Principle

GRAPHEIN is a retroconversion system for macro-structural documents.  Its architecture is multi-agent.  It is based on a generic model of structures and produces a specific structure for the sample being processed.

 
Model

The model describes the topographic relations between document elements: sequences, aggregates and mosaics for the macrostructure, and zones and terms for the microstructure.  Each structure is accompanied by a direction of progression:  LC for reading direction, HB for top to bottom, GD for left to right.  The physical aspect carries more weight in the macrostructure than in the microstructure; at the micro level, one very quickly gets close to the words.
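
As an illustration of the constructors described above, the following minimal sketch represents a macrostructure as a tree whose inner nodes carry a constructor (sequence, aggregate or mosaic) and a direction of progression (LC, HB or GD).  The class names, the field names and the sample page are assumptions made for this illustration; they are not taken from GRAPHEIN, and the choice of constructor for each node is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Direction(Enum):
    LC = "reading direction"
    HB = "top to bottom"
    GD = "left to right"


class Constructor(Enum):
    SEQUENCE = "sequence"
    AGGREGATE = "aggregate"
    MOSAIC = "mosaic"


@dataclass
class Node:
    """One element of the physical structure (hypothetical representation)."""
    name: str
    constructor: Optional[Constructor] = None  # None for a leaf
    direction: Optional[Direction] = None      # None for a leaf
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return leaf names in the order the children are listed."""
        if not self.children:
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.leaves())
        return names


# Hypothetical page: a title above a two-column body.
body = Node("body", Constructor.AGGREGATE, Direction.GD,
            [Node("column_1"), Node("column_2")])
page = Node("page", Constructor.SEQUENCE, Direction.HB,
            [Node("title"), body])

print(page.leaves())
```

Keeping the direction on the node rather than on its children means that one traversal is enough to list the elements in their order of progression.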

 

The logical aspect is also represented, in two ways:  1) by associating a logical label with a physical object when the two correspond, 2) by qualifiers of the optc type, which indicate, through a function, the type of association:  overlapping, reference, illustration, etc.  The generic aspect is expressed by qualifiers:  Rep (repetitive), Opt (optional) and Cho (choice).  A further qualifier, Sep, indicates the physical type of separator between areas; it supplements the representation of objects by their borders.

 

The model is described formally by an attributed grammar.  It contains generic constructors and qualifiers, attributes, and weights expressing the certainty of the object descriptions.
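
To make the idea of a grammar rule with qualifiers and a weight more concrete, here is a minimal sketch of one rule and of a check that a list of observed labels can be derived from it.  Everything in it is an assumption made for illustration: the `Term` and `Rule` types, the rule itself, the reading of the repetitive qualifier (written `Rep` here) as "one or more occurrences", and the weight value.  The Cho and Sep qualifiers are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Term:
    symbol: str
    qualifier: Optional[str] = None  # None (mandatory), "Opt" or "Rep"


@dataclass
class Rule:
    head: str
    constructor: str   # e.g. "sequence"
    direction: str     # e.g. "HB"
    terms: List[Term]
    weight: float      # certainty attached to the description, in [0, 1]


def derives(rule: Rule, labels: List[str]) -> bool:
    """Return True if `labels` matches the rule's terms, in order."""
    terms = rule.terms

    def rec(i: int, j: int) -> bool:
        if i == len(terms):
            return j == len(labels)
        term = terms[i]
        here = j < len(labels) and labels[j] == term.symbol
        if term.qualifier == "Opt":
            # Either skip the term or consume one label.
            return rec(i + 1, j) or (here and rec(i + 1, j + 1))
        if term.qualifier == "Rep":
            # Consume one label, then either move on or repeat the term.
            return here and (rec(i + 1, j + 1) or rec(i, j + 1))
        return here and rec(i + 1, j + 1)

    return rec(0, 0)


# Hypothetical rule: a first page is a title, an optional author line,
# then one or more paragraphs, read from top to bottom.
first_page = Rule(
    head="first_page",
    constructor="sequence",
    direction="HB",
    terms=[Term("title"), Term("author", "Opt"), Term("paragraph", "Rep")],
    weight=0.9,
)

print(derives(first_page, ["title", "paragraph", "paragraph"]))
print(derives(first_page, ["title"]))
```

The weight is only stored here; a fuller sketch would combine it with the certainty of the observed objects when ranking competing hypotheses.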
 

 
Retroconversion

The strategy is based on managing the hypotheses provided by the model.  Depending on the number and reliability of these hypotheses, the strategy is top-down (a single, reliable hypothesis) or bottom-up (otherwise).
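
The choice of strategy described above can be sketched as a small decision function.  The `Hypothesis` type, the certainty threshold of 0.8 and the strategy labels "top-down" and "bottom-up" are assumptions made for this illustration; the source does not say how reliability is measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    label: str
    certainty: float  # assumed to lie in [0, 1]


def choose_strategy(hypotheses: List[Hypothesis],
                    threshold: float = 0.8) -> str:
    """Pick top-down only for a single, sufficiently certain hypothesis."""
    if len(hypotheses) == 1 and hypotheses[0].certainty >= threshold:
        return "top-down"
    return "bottom-up"


single_sure = [Hypothesis("title", 0.95)]
several = [Hypothesis("title", 0.95), Hypothesis("caption", 0.90)]

print(choose_strategy(single_sure))
print(choose_strategy(several))
```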

 

The system is of the multi-agent type and is based on the ATOME engine.  The blackboard (BB) contains the current solution (the state of the specific structure).  The selector receives a summary of the BB and triggers the corresponding tasks (meta-knowledge sources).  These tasks in turn select the specialists (specialized knowledge sources), which act directly on the BB.  Among these specialists are processing functions, but also the hypothesis management function of the model.
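
The control cycle described above (blackboard, summary, selector, tasks, specialists) can be sketched in a few lines.  This is a generic blackboard loop written for illustration, not the ATOME engine's interface: the function names, the two specialists, the content of the summary and the selection rules are all assumptions.

```python
from typing import Callable, Dict, List, Optional

Blackboard = Dict[str, object]
Specialist = Callable[[Blackboard], None]


def segment_blocks(bb: Blackboard) -> None:
    """Hypothetical specialist: writes physical blocks to the blackboard."""
    bb["blocks"] = ["block_1", "block_2"]


def label_blocks(bb: Blackboard) -> None:
    """Hypothetical specialist: attaches a label to every block."""
    bb["labels"] = {block: "paragraph" for block in bb["blocks"]}


class Task:
    """A task selects and runs the specialists it knows about."""

    def __init__(self, name: str, specialists: List[Specialist]) -> None:
        self.name = name
        self.specialists = specialists

    def run(self, bb: Blackboard) -> None:
        for specialist in self.specialists:
            specialist(bb)  # specialists act directly on the blackboard


def summarize(bb: Blackboard) -> Dict[str, bool]:
    """Reduce the blackboard to the facts the selector needs."""
    return {"has_blocks": "blocks" in bb, "has_labels": "labels" in bb}


def select(summary: Dict[str, bool],
           tasks: Dict[str, Task]) -> Optional[Task]:
    """Selector: choose the next task from the summary alone."""
    if not summary["has_blocks"]:
        return tasks["segmentation"]
    if not summary["has_labels"]:
        return tasks["labelling"]
    return None  # nothing left to do


def run(bb: Blackboard, tasks: Dict[str, Task],
        max_cycles: int = 10) -> List[str]:
    """Repeat summarize, select, run until no task applies."""
    trace: List[str] = []
    for _ in range(max_cycles):
        task = select(summarize(bb), tasks)
        if task is None:
            break
        task.run(bb)
        trace.append(task.name)
    return trace


tasks = {
    "segmentation": Task("segmentation", [segment_blocks]),
    "labelling": Task("labelling", [label_blocks]),
}
blackboard: Blackboard = {"image": "page_1"}
trace = run(blackboard, tasks)
print(trace)
```

The selector only ever sees the summary, never the blackboard itself, which mirrors the separation between control and specialized knowledge described in the text.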
 

 
Results

The following figure shows the segmentation result for the first page of a TSI-type article.  On the left is the state of the calls; on the right, the result.  Each object is outlined and accompanied by a label.  The system is able to identify titles, captions, paragraphs, and continuations of paragraphs across columns.
 

The following figure shows that the system is also able to identify figures and their captions, as well as page numbers.