|
|
This work describes a system for the retroconversion of bibliographic records. It was developed within the framework of the European project MORE, with the company JOUVE and the Royal Library of Belgium (Albert I) as partners. The catalogues are digitized and then processed by OCR. The task consists of identifying the various fields of each record and representing them in UNIMARC. |
|
|
The following figure shows some of the problems encountered: some come from the OCR, others from the typographic style, and others from the structure of the records. Certain separators may be missing, some fields may run over two lines, the secondary author may be confused with the principal author, and so on. |
|
Because these are Belgian records, they are generally duplicated in two of the country's official languages. The system must take this change of language into account, since it also changes the structure: certain fields have different names, abbreviations differ, and so on. |
|
The model is described formally by an attribute grammar. It contains generic constructors and qualifiers, attributes, weights expressing the certainty attached to the objects, and actions that carry out local strategies on particular fields. |
|
The following figure shows some examples of attributes with their syntax: "-italic" indicates that the field is not in italics, and A and G are weights on a scale from A to Z. |
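To make the description of the model more concrete, here is a minimal Python sketch of a rule carrying a constructor, a qualifier, and weighted attributes. It is not the project's grammar syntax: the class names, the constructor and qualifier values, the rule that A is the most certain weight, and the scoring function are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


def weight_to_score(letter: str) -> float:
    """Map a weight letter A..Z to a score in (0, 1].

    Assumption: A is the most certain weight and Z the least certain.
    """
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError(f"weight must be a letter between A and Z: {letter!r}")
    return (26 - (ord(letter) - ord("A"))) / 26


@dataclass
class Attribute:
    name: str       # e.g. "italic"
    expected: bool  # False encodes a negated attribute such as "-italic"
    weight: str     # letter between A and Z


@dataclass
class FieldRule:
    name: str
    constructor: str  # hypothetical values: "sequence", "choice"
    qualifier: str    # hypothetical values: "required", "optional"
    attributes: list[Attribute] = field(default_factory=list)
    children: list["FieldRule"] = field(default_factory=list)


def attribute_score(rule: FieldRule, observed: dict[str, bool]) -> float:
    """Weighted share of the rule's attributes confirmed by the observation."""
    total = sum(weight_to_score(a.weight) for a in rule.attributes)
    if total == 0:
        return 1.0  # a rule without attributes constrains nothing
    matched = sum(
        weight_to_score(a.weight)
        for a in rule.attributes
        if observed.get(a.name) == a.expected
    )
    return matched / total


# Hypothetical rule: a title that is not in italics (weight A) and is bold (weight G).
title = FieldRule(
    name="title",
    constructor="sequence",
    qualifier="required",
    attributes=[Attribute("italic", False, "A"), Attribute("bold", True, "G")],
)
```

With this sketch, an observed text zone that is neither italic nor bold confirms only the "-italic" attribute, so its score is the weight of A divided by the sum of the weights of A and G.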
|
|
The strategy is based on the management of the hypotheses provided by the model. Depending on the number and the certainty of these hypotheses, the strategy is top-down (a single, certain hypothesis) or bottom-up (otherwise). Local strategies are generally preconditions for the execution of tasks. Post-conditions are local strategies that make it possible to avoid launching the whole system on a particular case. |
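The choice between the two analysis directions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes that the single, certain case is handled top-down and every other case bottom-up, and the `Hypothesis` class, the numeric certainty, and the `SURE_THRESHOLD` value are hypothetical and not taken from the system.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    field_name: str   # field proposed by the model, e.g. "author"
    certainty: float  # assumed to be normalized to [0, 1]


# Assumed cut-off above which a hypothesis counts as certain.
SURE_THRESHOLD = 0.9


def choose_strategy(hypotheses: list[Hypothesis]) -> str:
    """Return "top-down" for a single certain hypothesis, "bottom-up" otherwise."""
    if len(hypotheses) == 1 and hypotheses[0].certainty >= SURE_THRESHOLD:
        return "top-down"
    return "bottom-up"
```

For example, `choose_strategy([Hypothesis("author", 0.95)])` selects the top-down direction, while two competing hypotheses, or a single weak one, fall back to bottom-up.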
|
|
The following table shows the results obtained over one year, month by month. For each month it gives the percentages, separating the doubtful solutions from the correct ones.