Presentation

READ (REcognition of writing and Analysis of Documents), started in 1993, is a LORIA (UMR 7503) team.

Objective

In the READ team, we tackle several problems related to the segmentation, content analysis, and recognition of document images. The challenge is to understand and exploit the information content of documents, and to index them in forms appropriate to the target application. Our research themes include, but are not limited to:

Document structure modeling

Application to invoice analysis: table detection and extraction
Use of graph representation and matching

Document segmentation: line detection, baseline extraction, word separation, printed/handwritten separation

Application to form analysis, table detection and table extraction
Use of rule-based systems, case-based reasoning

Document clustering

Application to document flow separation
Use of incremental and active learning, semi-supervised learning

Document learning

Use of deep learning, data augmentation, intelligent annotation
Application to historical document analysis