by Abdel Belaïd
With progress in OCR technology and electronic publishing, document analysis has become a helpful tool for document input and manipulation. Because a document is represented by scanned binary images, understanding it draws on several research domains, including image analysis, pattern recognition and artificial intelligence. The READ group at CRIN, Nancy, studies this theme and aims to open new orientations in the field. Our efforts follow three directions: recognition, comprehension and communication.
Recognition converts a document image into basic linguistic and typographic information such as characters, words and font names. We have developed a multifont text recognition system based on font identification and a combination of several classifiers, such as hidden Markov models and k-NN algorithms. Several other projects are under development, concerning the recognition of touching and/or degraded characters and of cursive words using trajectory models. The target application of these projects is the recognition of amounts on bank cheques. In these projects, the idea is to develop a decision combination method that takes advantage of the strengths of the individual classifiers while avoiding their weaknesses.
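As an illustration of decision combination, the sketch below merges the per-class scores of two classifiers with a weighted sum rule. This is one common combination scheme and is only an assumption here: the text does not say which rule the group uses. The function name `combine_decisions`, the score dictionaries and the weights are all hypothetical.

```python
from collections import defaultdict


def combine_decisions(classifier_outputs, weights=None):
    """Weighted sum-rule combination of per-class scores.

    classifier_outputs: list of dicts mapping a class label to a
    non-negative score, one dict per classifier (hypothetical format).
    weights: optional per-classifier reliabilities (assumed values).
    Returns the winning label and the combined score of every label.
    """
    if not classifier_outputs:
        raise ValueError("at least one classifier output is required")
    if weights is None:
        weights = [1.0] * len(classifier_outputs)
    if len(weights) != len(classifier_outputs):
        raise ValueError("one weight per classifier is required")

    total_weight = sum(weights)
    combined = defaultdict(float)
    for scores, weight in zip(classifier_outputs, weights):
        norm = sum(scores.values())
        if norm <= 0:
            continue  # this classifier abstains
        for label, score in scores.items():
            # Normalise each classifier's scores so they sum to 1,
            # then add its weighted vote for this label.
            combined[label] += (weight / total_weight) * (score / norm)

    if not combined:
        raise ValueError("no classifier produced a usable score")
    best = max(combined, key=combined.get)
    return best, dict(combined)


# Hypothetical scores for an ambiguous digit on a cheque.
hmm_scores = {"3": 0.6, "8": 0.4}
knn_scores = {"3": 0.2, "8": 0.8}
label, scores = combine_decisions([hmm_scores, knn_scores])
```

With equal weights the second classifier's stronger preference wins; giving the first classifier a larger weight can reverse the decision, which is how such a scheme lets a reliable classifier compensate for a weaker one.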
Comprehension means discovering the useful knowledge embedded in the document structure. Our research began with structured documents, for which we have developed a system based on multiple knowledge sources, called GRAPHEIN. Effort was concentrated on the definition of a generic model for a class of documents, on specific analysis tasks, and on an opportunistic strategy that focuses the analysis on the principal regions. For this system, we used a blackboard architecture, with hypothesis generation and management based on entropy control.
The strategy has been tested on micro-structures such as those found in library references, forms, etc. We are also working on a system that constructs generic models automatically by learning. Furthermore, we are orienting our research towards dynamic knowledge and cognitive systems based on reasoning by analogy. A new system, called BASCET, is now being tested in our group.
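To give a feel for what entropy-based control of hypotheses could look like, the sketch below computes the Shannon entropy of the competing hypotheses attached to each region and selects the least ambiguous region to analyse next. This is an assumption made for illustration, not a description of GRAPHEIN: the function names, the region names and the selection rule (lowest entropy first) are all hypothetical.

```python
import math


def entropy(scores):
    """Shannon entropy (in bits) of a set of non-negative scores.

    The scores are normalised to a probability distribution first.
    """
    values = list(scores)
    total = sum(values)
    if total <= 0:
        raise ValueError("scores must contain a positive value")
    h = 0.0
    for value in values:
        if value > 0:
            p = value / total
            h -= p * math.log2(p)
    return h


def pick_next_region(hypotheses_by_region):
    """Return the region whose hypotheses are least ambiguous.

    hypotheses_by_region maps a region name to a dict of
    {hypothesis label: score} (hypothetical blackboard content).
    Assumed rule: process the lowest-entropy region first, so that
    confident results can constrain the more uncertain regions.
    """
    return min(
        hypotheses_by_region,
        key=lambda region: entropy(hypotheses_by_region[region].values()),
    )


# Hypothetical hypotheses for two fields of a library reference.
blackboard = {
    "field_1": {"author": 0.9, "title": 0.1},
    "field_2": {"title": 0.5, "publisher": 0.5},
}
next_region = pick_next_region(blackboard)
```

Here `field_1` is chosen because its hypotheses are nearly settled, whereas `field_2` is maximally ambiguous between two labels. A real system might just as well pick the highest-entropy region when the goal is to reduce uncertainty as fast as possible.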
Communication is a means of providing exploitable results to applications and of interacting with a user, who can complete the model specification and orient the system's investigations. We are working on document formats such as ODA, SGML and UNIMARC, and on format exchange procedures. For user communication, we are considering the development of a multi-modal system.