Joint ERCIM Actions
ERCIM News No.33 - April 1998

The Aquarelle Terminology Service

by Martin Doerr and Irini Fundulaki

Thesauri and other kinds of authority data (place names, artist names, periods, etc.) are important for intellectual access to information assets. It is widely accepted that using thesauri to classify assets and as a search aid considerably improves the precision and recall of retrieval methods. This holds in particular for database records with few words per field, the typical form of records about museum objects. In a multilingual environment, support for translation is needed even more: either transparent transformation of the valid terminology a user refers to in a request into the languages of the addressed databases, or at least guidance of the user to the local terminology in use. From an early stage of the project, the Aquarelle users required that this support be based on high-quality information managed by human experts.

For optimal results, the terms used for asset classification, in the search-aid thesaurus and in the experts' terminology should be consistent. This led us to a three-level architecture of components cooperating with the IT environment of an Aquarelle installation: vocabularies in local databases, local thesaurus management systems of wider use, and central Term Servers for retrieval.

Typically, local databases have a more or less idiosyncratic way of enforcing vocabulary control. To standardize the format and centralize handling, we foresee an independent thesaurus manager into which the vocabularies of several local databases are loaded and subsequently organized as thesauri (authorities) by an expert, following variations of the ISO 2788 semantic structure. In addition, standard external vocabularies can be loaded. These authorities may be specific to one database, a user organization, or a whole language group. The local vocabularies, and the terms already used for classification, may need updating to reflect changes made in the thesaurus manager.
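The loading and organizing step described above can be pictured with a small data model. The following is only a minimal sketch, not the SIS-TMS schema: the class names `Term` and `Thesaurus`, their methods, and the sample terms are hypothetical, and only the standard ISO 2788 relation types (BT, NT, RT, UF) are taken as given.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Term:
    """One thesaurus entry with ISO 2788-style relations (illustrative model)."""
    label: str
    broader: Set[str] = field(default_factory=set)   # BT: broader terms
    narrower: Set[str] = field(default_factory=set)  # NT: narrower terms
    related: Set[str] = field(default_factory=set)   # RT: related terms
    used_for: Set[str] = field(default_factory=set)  # UF: non-preferred variants


class Thesaurus:
    """Hypothetical authority built from flat local vocabularies."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.terms: Dict[str, Term] = {}

    def load_vocabulary(self, labels: Iterable[str]) -> None:
        """Load a flat vocabulary from a local database as unlinked terms."""
        for label in labels:
            self.terms.setdefault(label, Term(label))

    def add_broader(self, narrow: str, broad: str) -> None:
        """Record a BT link together with its reciprocal NT link."""
        self.terms[narrow].broader.add(broad)
        self.terms[broad].narrower.add(narrow)

    def add_variant(self, preferred: str, variant: str) -> None:
        """Record a non-preferred variant (UF) of a preferred term."""
        self.terms[preferred].used_for.add(variant)

    def preferred_for(self, word: str) -> Optional[str]:
        """Resolve a word to its preferred term, or None if it is unknown."""
        if word in self.terms:
            return word
        for term in self.terms.values():
            if word in term.used_for:
                return term.label
        return None
```

In this sketch an expert would first call `load_vocabulary` with the raw terms of each local database and then add the links by hand; keeping BT and NT reciprocal in a single method avoids one-sided links.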

The Aquarelle Access Server, which is responsible for the distribution and transformation of user requests, needs knowledge of the authorities in local use, at least of the higher-level terms. It therefore communicates with one or more Term Servers, which hold released versions or extracts of the local authorities. Moreover, a Term Server must be fed with equivalence expressions between the meanings of terms in different authorities, produced either by an expert team or by linguistic methods with subsequent human control. These expressions are used to replace the terms in a user request with equivalent terms of the target system, either automatically or in a dialogue with the user.
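The term replacement described above can be sketched as a table lookup. This is an illustration under stated assumptions rather than the Term Server's actual interface: the function `translate_request`, the authority names, the table layout and the sample equivalences are all hypothetical. A single equivalent is substituted automatically, several equivalents are handed back for a dialogue with the user, and terms without an equivalence are reported separately.

```python
from typing import Dict, List, Tuple

# Hypothetical equivalence table:
# (source authority, term) -> {target authority: [equivalent terms]}
EQUIVALENCES: Dict[Tuple[str, str], Dict[str, List[str]]] = {
    ("en-local", "church"): {"de-local": ["Kirche"]},
    ("en-local", "castle"): {"de-local": ["Burg", "Schloss"]},
}


def translate_request(terms, source, target, equivalences=EQUIVALENCES):
    """Replace request terms with equivalents from the target authority.

    Returns a triple:
      translated - terms replaced automatically (exactly one equivalent)
      ambiguous  - {term: candidates} to be resolved in a user dialogue
      unmapped   - terms with no known equivalence in the target authority
    """
    translated, ambiguous, unmapped = [], {}, []
    for term in terms:
        candidates = equivalences.get((source, term), {}).get(target, [])
        if len(candidates) == 1:
            translated.append(candidates[0])
        elif candidates:
            ambiguous[term] = list(candidates)
        else:
            unmapped.append(term)
    return translated, ambiguous, unmapped
```

Returning the ambiguous and unmapped terms instead of guessing keeps the human in the loop, in line with the requirement that the support rest on expert-managed information.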

This three-level architecture closely reflects the practice and needs of classification, expert agreement, user organization and search aids. It is a fully scalable solution and a flexible approach to standards enforcement. The Semantic Index System-Thesaurus Management System (SIS-TMS), developed earlier by FORTH, was extended for Aquarelle to support the above scenario. It will be available as a FORTH product by summer 1998. It implements a client-server architecture, with one client for reading and one for manual editing. Another client on the same base, the Term Server, was developed by the ILSP, the Institute for Language and Speech Processing, Athens.

The system has several innovative features. It allows multiple multilingual thesauri and their interrelations to be maintained in one logical database. Different teams can cooperatively maintain multiple systems of semantic relations on a shared body of terms and concepts. User groups can further specialize the semantics of ISO 2788 and ISO 5964 (multilingual thesauri) links and add custom fields.

The SIS-TMS graphical user interface allows unconstrained navigation within and between different thesauri. Through predefined queries and graphical views, users can identify concepts for cataloguing or database queries, find translations or equivalent expressions for information access in a heterogeneous environment, and control the quality and logical consistency of a system of interlinked thesauri.
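As an illustration of what a logical consistency check over a thesaurus might involve, the sketch below looks for two common defects in a broader-term hierarchy: links to terms that do not exist, and cycles. It is an assumption about the kind of check meant here, not a description of the SIS-TMS predefined queries; the function name `check_consistency` and the input layout are hypothetical.

```python
from typing import Dict, List, Set


def check_consistency(broader: Dict[str, Set[str]]) -> List[str]:
    """Report dangling links and cycles in a broader-term (BT) hierarchy.

    `broader` maps every known term to the set of its broader terms.
    """
    problems: List[str] = []

    # 1. Every broader term must itself be a known term.
    for term, parents in broader.items():
        for parent in sorted(parents):
            if parent not in broader:
                problems.append(f"dangling link: {term} -> {parent}")

    # 2. The hierarchy must not contain cycles (depth-first search).
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {term: WHITE for term in broader}

    def visit(term: str, path: List[str]) -> None:
        state[term] = GREY
        for parent in sorted(broader[term]):
            if parent not in broader:
                continue  # already reported as dangling
            if state[parent] == GREY:
                cycle = path[path.index(parent):] + [parent]
                problems.append("cycle: " + " -> ".join(cycle))
            elif state[parent] == WHITE:
                visit(parent, path + [parent])
        state[term] = BLACK

    for term in sorted(broader):
        if state[term] == WHITE:
            visit(term, [term])
    return problems
```

An empty result means the hierarchy passed both checks; a real system of interlinked thesauri would need further checks, for example on links between thesauri.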

The editing system maintains a history of changes and provides release operations for a set of changes. Referential integrity and vocabulary control are maintained throughout the system. These mechanisms are the prerequisite for incrementally updating the local databases and the Term Servers from the thesaurus development units at regular release intervals with minimal human intervention.
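The idea of a change history with release operations and incremental downstream updates can be sketched as follows. This is a simplified illustration, assuming a linear log and three operation types; the names `Change`, `EditLog` and `apply_delta` are hypothetical and do not describe the SIS-TMS implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Change:
    op: str             # "add", "remove" or "rename" (assumed operation set)
    term: str
    new_term: str = ""  # only used by "rename"


class EditLog:
    """Keeps the full edit history and hands out unreleased changes."""

    def __init__(self) -> None:
        self.changes: List[Change] = []  # full history, in order
        self.released = 0                # index of the first unreleased change

    def record(self, change: Change) -> None:
        self.changes.append(change)

    def release(self) -> List[Change]:
        """Return the changes since the last release and mark them released."""
        delta = self.changes[self.released:]
        self.released = len(self.changes)
        return delta


def apply_delta(vocabulary: Set[str], delta: Iterable[Change]) -> Set[str]:
    """Apply a released delta to a downstream copy of a vocabulary."""
    result = set(vocabulary)
    for change in delta:
        if change.op == "add":
            result.add(change.term)
        elif change.op == "remove":
            result.discard(change.term)
        elif change.op == "rename":
            if change.term in result:
                result.discard(change.term)
                result.add(change.new_term)
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return result
```

Because each release returns only the changes made since the previous one, a local database or Term Server can be brought up to date without reloading the whole authority.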

The local integration of the user databases with the terminology system goes beyond the Aquarelle framework, but it can now be done with minor effort by any skilled programmer. Aquarelle will end with the evaluation of the terminology management system and the Term Server, using as test data the Art & Architecture Thesaurus (a product of the Getty Information Institute, Los Angeles; the largest of its kind, with some 60,000 terms), RCHME and the multilingual MERIMEE thesaurus. With this terminology service, Aquarelle provides the technical means to solve one of the major problems of semantic interoperability. Further work must concentrate on ways to make the creation of authorities cheaper, and on the social organization of the cooperative development and use of authorities.

Please contact:
Martin Doerr
Tel: +30 81 39 16 25

Irini Fundulaki
Tel: +30 81 39 16 37
