Presentation

SEGMENTATION

Principle

This system was developed by Tunde Akindele. It is a page segmentation method used in a document analysis system. The method cuts a document page image into polygonal blocks as well as into classical rectangular blocks. The inter-column and inter-paragraph gaps are extracted as horizontal and vertical lines. The points of intersection between horizontal and vertical lines are treated as vertices of polygonal blocks. With the aid of the 4-connected chain code and an intersection table, simple isothetic polygonal blocks are constructed from these points of intersection. The straight line joining two points of intersection that correspond to two neighboring 1 entries in the intersection table is treated as a line segment that forms one side of a polygonal block.

This method is tolerant of skewed documents and robust enough to produce polygonal blocks of any shape and with any number of sides.

Functioning

The horizontal and vertical white gaps in the page image are extracted as white rectangles by extracting and merging white segments in the image. To avoid picking up the white space inside character images, or the space between characters, words and/or lines, white segments whose length or width is below a certain threshold are discarded.
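The thresholding step can be illustrated with a minimal sketch in C. It is not the original implementation: the names `WhiteRun` and `extract_white_runs`, the pixel convention (0 = white, 1 = black) and the one-byte-per-pixel row layout are all assumptions made for illustration. The sketch only extracts horizontal white segments from a single row and drops those shorter than the threshold; merging segments into white rectangles is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* One horizontal run of white pixels in a single image row (hypothetical type). */
typedef struct {
    int start;   /* column of the first white pixel */
    int length;  /* number of consecutive white pixels */
} WhiteRun;

/*
 * Scan one row of a binary image (assumed: 0 = white, nonzero = black) and
 * store every white run of at least min_len pixels in out[].
 * Shorter runs are discarded. Returns the number of runs stored, which is
 * never more than max_out.
 */
size_t extract_white_runs(const unsigned char *row, int width,
                          int min_len, WhiteRun *out, size_t max_out)
{
    assert(row != NULL && out != NULL);
    size_t count = 0;
    int x = 0;
    while (x < width && count < max_out) {
        if (row[x] != 0) {          /* skip black pixels */
            x++;
            continue;
        }
        int start = x;
        while (x < width && row[x] == 0) {
            x++;                    /* extend the current white run */
        }
        if (x - start >= min_len) { /* keep only runs at or above the threshold */
            out[count].start = start;
            out[count].length = x - start;
            count++;
        }
    }
    return count;
}
```

Applying the same scan to image columns would give the vertical white segments; the choice of threshold depends on the character and line spacing of the scanned page.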

Graph construction

Having extracted horizontal and vertical lines from the horizontal and vertical white gaps in the page image, respectively, we proceed to construct polygonal blocks from these lines. To facilitate this construction, we use an intersection table, which we traverse with the aid of 4-connected chain codes and a direction table.
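One possible way to fill an intersection table is sketched below in C. The layout is an assumption, not taken from the original system: one table row per horizontal line, one table column per vertical line, and an entry of 1 where the two lines actually cross. The types `HLine` and `VLine` and the function `build_intersection_table` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical line types: each line is axis-parallel with an extent. */
typedef struct { int y, x0, x1; } HLine;  /* horizontal line at row y, spanning x0..x1 */
typedef struct { int x, y0, y1; } VLine;  /* vertical line at column x, spanning y0..y1 */

/*
 * Fill it[] (n_h rows by n_v columns, row-major) with 1 where horizontal
 * line i and vertical line j cross, and 0 elsewhere.
 */
void build_intersection_table(const HLine *h, int n_h,
                              const VLine *v, int n_v,
                              unsigned char *it)
{
    assert(h != NULL && v != NULL && it != NULL);
    for (int i = 0; i < n_h; i++) {
        for (int j = 0; j < n_v; j++) {
            /* The lines cross if each one lies within the other's extent. */
            int crosses = v[j].x >= h[i].x0 && v[j].x <= h[i].x1 &&
                          h[i].y >= v[j].y0 && h[i].y <= v[j].y1;
            it[i * n_v + j] = (unsigned char)crosses;
        }
    }
}
```

With the lines sorted by position, neighboring 1 entries in a table row or column then correspond to consecutive points of intersection along the same line.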

Polygonal block extraction

Each vertex of a simple isothetic polygon is directly connected to the vertex on its right or left and also to the vertex above or below it. Therefore, to move from one vertex to another, one needs to move right, left, up, or down. This has led to the use of the 4-connected chain code. A chain code is an integer from 0 to 3 that indicates the direction in which to move from the current vertex to reach the next one, as shown in figure~\ref{ccode}. Thus a chain code of 0 indicates a movement to the right, 1 a movement up, 2 a movement to the left, and 3 a movement down.
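The chain code convention above maps directly onto a pair of offset arrays. The following C sketch shows one way to encode it; the identifiers (`CC_RIGHT`, `chain_step`, `chain_opposite`, and so on) are illustrative, and it assumes that row indices grow downward, as is usual for image and table coordinates.

```c
#include <assert.h>

/* 4-connected chain codes: 0 = right, 1 = up, 2 = left, 3 = down. */
enum { CC_RIGHT = 0, CC_UP = 1, CC_LEFT = 2, CC_DOWN = 3 };

/* Column and row offsets for each code; rows are assumed to grow downward. */
static const int CC_DCOL[4] = { 1,  0, -1, 0 };
static const int CC_DROW[4] = { 0, -1,  0, 1 };

/* Move one cell from (*row, *col) in the direction given by code. */
void chain_step(int code, int *row, int *col)
{
    assert(code >= 0 && code <= 3);
    *row += CC_DROW[code];
    *col += CC_DCOL[code];
}

/* The opposite direction: right <-> left, up <-> down. */
int chain_opposite(int code)
{
    assert(code >= 0 && code <= 3);
    return (code + 2) % 4;
}
```

Because opposite directions differ by 2 modulo 4, reversing a move or detecting a U-turn needs no lookup table.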

The search for blocks is done by walking through the intersection table $IT$ row by row. The idea is to start the search from each ``1'' entry in $IT$ and then walk through the table with the aid of the entries in the direction table $DT$ until the starting point is reached again, noting the direction of movement at each change of direction. The search for a ``1'' entry in any direction starts at a given point and continues in that direction until a ``1'' entry is found or the edge of the table is reached.
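The directional search described in the last sentence can be sketched in C as follows. This is an illustration only: the function name `find_next_one`, the row-major table layout, and the choice to begin scanning at the cell next to the starting point (rather than at the starting point itself) are assumptions. How $DT$ selects the direction at each vertex is not shown.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Starting next to (row, col), move in the direction of chain code `code`
 * (0 = right, 1 = up, 2 = left, 3 = down; rows grow downward) until a
 * 1 entry is found or the edge of the table is reached.
 * it[] is a row-major table with n_rows x n_cols entries.
 * Returns 1 and stores the position in (*out_row, *out_col) if a 1 entry
 * is found; returns 0 if the edge of the table is reached first.
 */
int find_next_one(const unsigned char *it, int n_rows, int n_cols,
                  int row, int col, int code,
                  int *out_row, int *out_col)
{
    static const int dcol[4] = { 1,  0, -1, 0 };
    static const int drow[4] = { 0, -1,  0, 1 };
    assert(it != NULL && out_row != NULL && out_col != NULL);
    assert(code >= 0 && code <= 3);

    int r = row + drow[code];
    int c = col + dcol[code];
    while (r >= 0 && r < n_rows && c >= 0 && c < n_cols) {
        if (it[r * n_cols + c] == 1) {
            *out_row = r;
            *out_col = c;
            return 1;
        }
        r += drow[code];
        c += dcol[code];
    }
    return 0;  /* ran off the table without meeting a 1 entry */
}
```

A full block tracer would call such a function repeatedly, changing direction at each vertex and recording the chain code at every turn, until it returns to the starting entry.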

Experiments

The method has been implemented in C on a Sun SPARCstation IPX under the SunView environment. It has been tested on a series of document pages taken from scientific journals and other magazines. The pages were scanned with a Hewlett-Packard ScanJet IIc scanner at a resolution of 300 dpi, producing binary images of 2550 x 3500 pixels.

It gives satisfactory results on all tested images and takes about 6 seconds per page. The figure below shows an example of a document page with a mosaic-like structure.