AC4
 
Principle

AC4 is a system for form segmentation and cell classification. Cells are assigned to one of 8 classes, defined by the nature of the content (numerical or alphabetical), the orientation (vertical or horizontal), the style (upper case or lower case), and so on.
Functioning
Cell Extraction

We use the Hough transform to estimate the line directions; the rows and columns are then found by following the lines within the image. Only the contour points vote in the Hough transform. As the figure below shows, the system detects curved lines as well as straight ones.
 


The figure below shows the projection space for horizontal lines, which has the shape of butterfly wings.
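
The voting step described above can be illustrated with a minimal sketch of the standard straight-line Hough transform in which only contour pixels vote. This is not the AC4 implementation: the function names `contour_points` and `hough_votes`, the 4-neighbour contour rule, the one-degree angle sampling and the toy image are all assumptions made for illustration, and the sketch does not cover the handling of curved lines.

```python
import math

def contour_points(binary):
    """Return (x, y) of foreground pixels with at least one background 4-neighbour.

    Assumption: the contour is defined by 4-connectivity; pixels outside the
    image count as background.
    """
    h, w = len(binary), len(binary[0])

    def is_fg(x, y):
        return 0 <= x < w and 0 <= y < h and bool(binary[y][x])

    points = []
    for y in range(h):
        for x in range(w):
            surrounded = (is_fg(x - 1, y) and is_fg(x + 1, y)
                          and is_fg(x, y - 1) and is_fg(x, y + 1))
            if is_fg(x, y) and not surrounded:
                points.append((x, y))
    return points

def hough_votes(points, n_theta=180):
    """Each point casts one vote per sampled angle, in the bin of its rounded rho.

    A line is parameterised as rho = x*cos(theta) + y*sin(theta), with theta
    sampled uniformly in [0, pi). Keys of the result are (rho, angle index).
    """
    votes = {}
    for t in range(n_theta):
        theta = math.pi * t / n_theta
        c, s = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = round(x * c + y * s)
            votes[(rho, t)] = votes.get((rho, t), 0) + 1
    return votes

# Toy image (made up): a horizontal bar three pixels thick, rows 5 to 7, in a 20x20 grid.
image = [[1 if 5 <= y <= 7 else 0 for x in range(20)] for y in range(20)]
points = contour_points(image)
votes = hough_votes(points)
```

On this toy image the interior of the bar does not vote, so at 90 degrees the two edges (rho = 5 and rho = 7) collect 20 votes each while the middle row collects only the 2 votes of its end pixels. Restricting the vote to contour points keeps the accumulator peaks sharp for thick strokes.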
Item Classification

Classification is carried out in three stages. The first stage separates the vertical items from the horizontal ones. The second identifies the subclasses within each category. The third settles the numerical items, handling the problem of the zero, which is often cut.

We use a multi-layer network to classify the vertical labels and the vertical labels with capital letters. The same type of network is used for the horizontal labels and the numerals.
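
As a rough illustration of what a multi-layer network computes, the sketch below runs the forward pass of a network with one hidden layer and returns one probability per class. Everything specific here is an assumption, not a description of AC4: the layer sizes, the tanh and softmax functions, the random untrained weights and the made-up feature vector are placeholders, and the names `dense`, `softmax`, `mlp_forward` and `random_layer` are hypothetical.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer: weights[j] holds the input weights of unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_forward(features, layer1, layer2):
    """Hidden layer with tanh, then an output layer with softmax."""
    hidden = [math.tanh(v) for v in dense(features, *layer1)]
    return softmax(dense(hidden, *layer2))

def random_layer(n_in, n_out, rng):
    """Untrained placeholder weights; a real system would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
n_features, n_hidden, n_classes = 16, 8, 3   # hypothetical sizes
layer1 = random_layer(n_features, n_hidden, rng)
layer2 = random_layer(n_hidden, n_classes, rng)

features = [rng.random() for _ in range(n_features)]   # made-up cell features
probabilities = mlp_forward(features, layer1, layer2)
predicted_class = max(range(n_classes), key=probabilities.__getitem__)
```

Under these assumptions, one such network would be trained per group of classes (vertical labels, horizontal labels, numerals), and the predicted class is the output unit with the highest probability.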
 

Experiments
Results

The colors correspond to the identified classes. For example, all the numerical cells appear in red.
 


The table below gives the confusion matrix of the classes.
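
For readers unfamiliar with the term, a confusion matrix counts, for each true class, how often each class was predicted. The sketch below shows one common way to build it; the function name `confusion_matrix`, the three class names and the labels are made up for illustration and are not results from the system.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows index the true class, columns the predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix

# Made-up labels for illustration only; these are not results from the system.
classes = ["numeric", "horizontal", "vertical"]
truth = ["numeric", "numeric", "horizontal", "vertical", "vertical", "horizontal"]
guess = ["numeric", "horizontal", "horizontal", "vertical", "numeric", "horizontal"]

matrix = confusion_matrix(truth, guess, classes)

# Overall accuracy is the sum of the diagonal divided by the number of items.
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(truth)
```

Diagonal entries are correct decisions; an off-diagonal entry in row i, column j counts items of class i that were assigned to class j.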