DSS
 
This software was designed by KC Santosh and improved by Tapan Bhowmik between January 2014 and August 2014. The goal of the project is to extract the content of tables in document images based on learned patterns. Clients provide a set of key fields within the tables that they consider relevant, in the form of a client pattern. Extraction is then based on a search for similar patterns.
The pattern

The client pattern is represented as a graph by:

  • Assigning fields to nodes
  • Labeling attributes via several features
    • for instance, the regular expression, feature vector of content, size, number of words, number of lines, word separation gap, etc. that each attribute possesses
  • Computing possible relations that exist between attributes
    • spatial relations
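
As an illustration only, the representation above could be sketched as follows in Python. Every name here (FieldNode, label_features, spatial_relation, build_pattern_graph) is hypothetical and not taken from the DSS code. The feature set is a small subset of the one listed, and the four-way spatial relation computed from bounding-box centres is an assumption about what "spatial relations" might look like.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FieldNode:
    """One client key field; bbox is (x, y, width, height) in pixels."""
    name: str
    text: str
    bbox: tuple
    features: dict = field(default_factory=dict)


def label_features(node, regex):
    """Attach a few of the listed features to the node (illustrative subset)."""
    node.features = {
        "regex": regex,
        "matches_regex": re.fullmatch(regex, node.text) is not None,
        "size": (node.bbox[2], node.bbox[3]),
        "num_words": len(node.text.split()),
        "num_lines": len(node.text.splitlines()),
    }
    return node


def spatial_relation(a, b):
    """Coarse relation of b with respect to a, from bounding-box centres."""
    ax, ay = a.bbox[0] + a.bbox[2] / 2, a.bbox[1] + a.bbox[3] / 2
    bx, by = b.bbox[0] + b.bbox[2] / 2, b.bbox[1] + b.bbox[3] / 2
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "right_of" if dx > 0 else "left_of"
    return "below" if dy > 0 else "above"


def build_pattern_graph(nodes):
    """Fields become nodes; every ordered pair gets a spatial-relation edge."""
    edges = {}
    for a in nodes:
        for b in nodes:
            if a is not b:
                edges[(a.name, b.name)] = spatial_relation(a, b)
    return {"nodes": {n.name: n for n in nodes}, "edges": edges}


# Made-up example fields, not real client data.
date = label_features(
    FieldNode("date", "12/03/2014", (400, 50, 80, 12)), r"\d{2}/\d{2}/\d{4}")
total = label_features(
    FieldNode("total", "Total 1,250.00", (400, 300, 120, 12)), r"Total\s+[\d,.]+")
graph = build_pattern_graph([date, total])
```

In this sketch the graph is a plain dictionary, so the relation between two fields can be looked up with graph["edges"][("date", "total")].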

Here is a generated graph

Graph Mining
Graph mining in the presence of a client pattern (major steps)
  • Step-1: Starting from an arbitrary node in the graph, find a similar field in the document using a vertical window search
  • Step-2: Validate the similar field via its feature score. If the feature score is greater than a given threshold, the field is considered similar
  • Step-3: Find the associated fields via the relations
  • Step-4: Compose the graph
  • Step-5: Calculate the graph matching score
  • Step-6: Validate the similar pattern if the graph matching score is greater than a given threshold
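
The six steps above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the DSS implementation. The function names, the dictionary layout, the two-feature score, the reading of "vertical window search" as a vertical strip around the node's x position, and the edge-preservation ratio used as the graph matching score are all hypothetical choices. The threshold values are arbitrary.

```python
import re


def relation(a, b):
    """Coarse spatial relation of field b with respect to field a."""
    dx, dy = b["x"] - a["x"], b["y"] - a["y"]
    if abs(dx) >= abs(dy):
        return "right_of" if dx > 0 else "left_of"
    return "below" if dy > 0 else "above"


def feature_score(node, fld):
    """Toy score in [0, 1]: mean of a regex check and a word-count check."""
    regex_ok = re.fullmatch(node["regex"], fld["text"]) is not None
    words_ok = len(fld["text"].split()) == node["num_words"]
    return (regex_ok + words_ok) / 2


def vertical_window_search(node, fields, half_width):
    """Step 1 (assumed form): fields in a vertical strip around the node's x."""
    strip = [f for f in fields if abs(f["x"] - node["x"]) <= half_width]
    return sorted(strip, key=lambda f: f["y"])


def mine(pattern, fields, start, half_width=30, feat_thr=0.5, graph_thr=0.5):
    nodes, edges = pattern["nodes"], pattern["edges"]
    results = []
    for anchor in vertical_window_search(nodes[start], fields, half_width):
        # Step 2: keep the candidate only if its feature score beats the threshold.
        if feature_score(nodes[start], anchor) <= feat_thr:
            continue
        # Steps 3 and 4: follow relations from the start node and compose the graph.
        matched = {start: anchor}
        for (a, b), rel in edges.items():
            if a != start or b in matched:
                continue
            cands = [f for f in fields
                     if f is not anchor
                     and relation(anchor, f) == rel
                     and feature_score(nodes[b], f) > feat_thr]
            if cands:
                # Prefer the best feature score, then the closest field.
                matched[b] = max(cands, key=lambda f: (
                    feature_score(nodes[b], f),
                    -abs(f["x"] - anchor["x"]) - abs(f["y"] - anchor["y"])))
        # Step 5: fraction of pattern edges preserved in the composed graph.
        kept = sum(1 for (a, b), rel in edges.items()
                   if a in matched and b in matched
                   and relation(matched[a], matched[b]) == rel)
        score = kept / len(edges) if edges else 1.0
        # Step 6: accept the pattern only above the graph threshold.
        if score > graph_thr:
            results.append((score, matched))
    return results


# Made-up pattern and document fields, for illustration only.
pattern = {
    "nodes": {
        "date": {"regex": r"\d{2}/\d{2}/\d{4}", "num_words": 1, "x": 100, "y": 50},
        "amount": {"regex": r"[\d,]+\.\d{2}", "num_words": 1, "x": 300, "y": 50},
    },
    "edges": {("date", "amount"): "right_of", ("amount", "date"): "left_of"},
}
fields = [
    {"text": "Date", "x": 100, "y": 20},
    {"text": "12/03/2014", "x": 100, "y": 50},
    {"text": "1,250.00", "x": 300, "y": 50},
    {"text": "15/03/2014", "x": 102, "y": 80},
    {"text": "89.90", "x": 300, "y": 80},
]
results = mine(pattern, fields, start="date")
```

With this made-up data, the header "Date" is rejected at Step 2 because it fails the regular expression, and each of the two table rows yields one validated pattern.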