ERCIM News No.46, July 2001
by Andreas Becks and Matthias Jarke
Structuring and condensing corporate document collections is an important aspect of knowledge management. At the GMD Institute for Applied Information Technology (FIT), scientists have developed the corpus analysis tool DocMINER, which provides interactive visual access to text collections. Successful industrial applications include the critiquing of technical document collections such as use case descriptions in software engineering, or user manuals of complex engineering systems.
Since a great deal of corporate knowledge is contained in textual documents, techniques which provide analysts with task-adequate access to text collections play an essential role. Of particular importance is effective support for explorative corpus analysis tasks, where the user is concerned with discovering patterns in the document space and getting an overview of the available documents and their semantic relationships. This is an application area where information visualisation promises to be helpful: document maps visualise the overall similarity structure of a corpus of texts, using a suitable metaphor reminiscent of geographical or astronomical cartography.
In the DocMINER project (conducted at RWTH Aachen and GMD-FIT from 1998 to 2001), we have designed and evaluated a document map system for visually aiding text corpus analysis tasks in knowledge management. DocMINER differs from earlier document mapping efforts in that it is based on a careful analysis of why and how users actually employ technical document collections in their work. The resulting domain-specific task model not only serves as a guide for the technological development but also as a yardstick for evaluation.
DocMINER provides an adaptable framework for generating a graphical corpus overview, using a modular combination of algorithms from classical information retrieval, spatial scaling, and self-organising neural networks. The basic method allows a fine-grained (dis)similarity analysis of specialised document collections. It can be tailored to domain-specific needs, since the module for assessing the similarity of documents is exchangeable. A semantic refinement extension based on fuzzy rules enables the analyst to incorporate a personal bias into the map generation process. The DocMINER system, an interactive, map-centred corpus analysis and text-mining tool, tightly integrates the graphical display with explorative and goal-directed interaction methods. Its interface design was guided by Shneiderman's Visual Information Seeking Mantra: overview first, zoom and filter, then details-on-demand. System features include different zoom, scaling and sub-map functions, the means to define and assign document symbols, an annotation function, automatic map labelling and document group summaries, and a tight coupling with a query-driven retrieval interface.
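To illustrate the first stage of such a pipeline, the following is a minimal sketch of a pairwise dissimilarity computation with an exchangeable similarity module. It is not DocMINER's implementation: the TF-IDF weighting, the cosine measure, and all names (`tfidf_vectors`, `cosine_similarity`, `dissimilarity_matrix`) are assumptions chosen for illustration. The resulting matrix is the kind of input that a spatial scaling step or a self-organising map would then turn into a two-dimensional document map.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

# A sparse document vector: term -> weight (illustrative representation).
Vector = Dict[str, float]


def tfidf_vectors(docs: List[str]) -> List[Vector]:
    """Turn raw texts into sparse TF-IDF vectors using whitespace tokens."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF keeps every weight strictly positive.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dissimilarity_matrix(
    docs: List[str],
    similarity: Callable[[Vector, Vector], float] = cosine_similarity,
) -> List[List[float]]:
    """Symmetric matrix of 1 - similarity; the measure itself is pluggable."""
    vectors = tfidf_vectors(docs)
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Clamp at zero to absorb floating-point rounding.
            d = max(0.0, 1.0 - similarity(vectors[i], vectors[j]))
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Calling `dissimilarity_matrix(["pump valve maintenance manual", "valve maintenance schedule", "login screen use case"])` yields a small dissimilarity for the first two texts, which share vocabulary, and the maximum dissimilarity between the first and the third, which share none. Passing a different function as `similarity` swaps the measure without touching the rest of the code, which is the point of keeping that module exchangeable.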
So far, three application areas have been examined in case studies in science and industry. Firstly, the UML-based large-scale software standardisation effort of a consortium of worldwide chemical industries employed DocMINER to analyse the consistency of collaboratively written use cases stored in GMD-FIT's BSCW internet workspace environment, and to assess the relevance of each scenario to the design of the different architectural components. Secondly, DocMINER was used by a software house supporting the steel industry in quality assurance for technical product documentation, i.e. checking the consistency of the topical structure of user manuals and defining single information sources. Thirdly, in a more scientific environment, DocMINER served as a forum for the discussion of relationships between sub-projects and terminology uses in Germany's Cultural Science Research Centre 'Media and Cultural Communication' in Cologne. Each of these applications provides strong anecdotal evidence that document maps improve the way in which the respective types of work are traditionally done. This was also confirmed by feedback from leading industrial technical documentation fairs.
To investigate how much the document map visualisation itself contributes to these successes, a controlled laboratory study with students and technical documentation experts was conducted. It compared the map interface with the usual title or abstract lists provided by search engines and similar IR tools, while leaving all the other features of DocMINER in place. The results clearly confirmed the task-adequacy of document maps: the computation of the overall similarity structure of the text corpus and its visualisation help to significantly improve the effectiveness of typical task solutions. Furthermore, test subjects preferred the document map system in nearly all cases.
A snapshot of the DocMINER user interface.
Summing up, from the viewpoint of target users the document map approach offers meaningful insights into a collection's structure, and allows one to effectively study relationships between single documents and document groups. It is particularly successful in supporting tasks that require a detailed structural analysis of document-document, document-topic or document-specification relationships.
With its text analysis features, DocMINER complements the range of visually oriented data exploration and information brokering tools developed in GMD-FIT's research department on Information Contextualisation (ICON), including the InfoZoom visual querying and navigation environment for relational databases, and the Broker's Lounge environment for the creation of context-specific brokering systems, both of which now enjoy a fair number of commercial applications. Further work will link these solutions with each other and with techniques for metadata management in data warehousing and information flow management that FIT is developing in co-operation with a number of regional SMEs, e.g. for applications in financial controlling and e-learning.
Andreas Becks, Matthias Jarke GMD
Tel: +49 241 80 215 14