The Aquarelle - CIMI Z39.50 Profile and its Mapping to Cultural Databases
by Oreste Signore
Accessing cultural heritage data raises some important issues, which have been discussed in depth over recent decades, in particular questions of information structuring and language normalisation. Cultural differences are a major obstacle to defining a common data schema, so it is not surprising that several international organisations are currently studying the problem. In Aquarelle, however, there is an additional complication: several distributed archives must be accessed simultaneously, each based on a different schema and managed by different software in a heterogeneous hardware environment. Here we describe how we have attempted to solve this problem.
From the outset, the project chose the Z39.50 standard as the means of integrating the different archives and guaranteeing interoperability, and took the CIMI (Consortium for Computer Interchange of Museum Information) profile as a necessary reference point. However, this profile was designed for movable objects held in museums. As it stands, it is therefore unsuitable for describing immovable and complex objects, for which attributes such as geographic location and administrative boundaries are highly relevant. The Aquarelle profile has thus been defined to be as compatible as possible with the continuously evolving CIMI profile, while comprehensively describing the wide variety of archives included in the Aquarelle network. The profile is the result of a collaboration between CNUCE-CNR, Finsiel and SSL on the technical aspects, and cultural partners (especially MDA) on the semantic ones.
The profile is now undergoing a final revision, which will take into account feedback from users of the Aquarelle Version 1 test phase, before being proposed as a standard profile to the Z39.50 Agency. An additional constraint was the decision to leave the archives intact: data suppliers should not be forced to modify their databases. This led to the implementation of a number of Z39.50 gateways, each of which must map both services and data.
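To make the idea of mapping both services and data more concrete, here is a minimal Python sketch of a gateway layer over a relational archive, using an in-memory SQLite database. The schema, the class `RelationalGateway` and its methods are hypothetical and are not taken from any of the Aquarelle gateways or from YAZ; a real target would first decode the Z39.50 protocol (for instance with a toolkit such as YAZ) before reaching a layer like this. The Search service is mapped to an SQL query that builds a named result set, and the Present service to the reconstruction of each record from normalised tables.

```python
import sqlite3

# Hypothetical normalised schema: one row per object, with child tables
# for repeatable elements (places, makers).
SCHEMA = """
CREATE TABLE object (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE place  (object_id INTEGER, name TEXT);
CREATE TABLE maker  (object_id INTEGER, name TEXT);
"""


class RelationalGateway:
    """Toy gateway mapping two Z39.50 services onto SQL (hypothetical).

    search  -> SELECT producing a named result set of record ids
    present -> rebuilds full records from the normalised tables
    """

    def __init__(self, conn):
        self.conn = conn
        self.result_sets = {}

    def search(self, set_name, title_term):
        rows = self.conn.execute(
            "SELECT id FROM object WHERE title LIKE ? ORDER BY id",
            (f"%{title_term}%",),
        ).fetchall()
        self.result_sets[set_name] = [row[0] for row in rows]
        return len(rows)  # hit count, as a Search response would report

    def present(self, set_name, start, count):
        # Result-set positions in Z39.50 are 1-based.
        ids = self.result_sets[set_name][start - 1 : start - 1 + count]
        return [self._build_record(object_id) for object_id in ids]

    def _build_record(self, object_id):
        # Several queries per record, one per table: this is the
        # reconstruction cost of a normalised schema.
        (title,) = self.conn.execute(
            "SELECT title FROM object WHERE id = ?", (object_id,)
        ).fetchone()
        places = [r[0] for r in self.conn.execute(
            "SELECT name FROM place WHERE object_id = ? ORDER BY name", (object_id,))]
        makers = [r[0] for r in self.conn.execute(
            "SELECT name FROM maker WHERE object_id = ? ORDER BY name", (object_id,))]
        return {"id": object_id, "title": title, "places": places, "makers": makers}


# Usage with invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO object VALUES (?, ?)",
                 [(1, "Chapel of San Lorenzo"), (2, "Bridge over the Arno")])
conn.executemany("INSERT INTO place VALUES (?, ?)",
                 [(1, "Florence"), (2, "Pisa"), (2, "Florence")])
conn.executemany("INSERT INTO maker VALUES (?, ?)", [(1, "Brunelleschi")])

gateway = RelationalGateway(conn)
hits = gateway.search("default", "Arno")
records = gateway.present("default", 1, hits)
print(hits, records)
```

Keeping the result set on the gateway side mirrors the Z39.50 model, in which a Search only returns a hit count and records are fetched later, possibly in batches, through Present.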
At present, four different Z39.50 targets (Z39.50 gateways) support the Aquarelle profile: those developed by Bull (for Mistral), CNUCE-CNR (for relational databases, currently implemented for Sybase and Oracle), Finsiel (for Basis+) and SSL (for Index+). The Bull, CNUCE-CNR and SSL gateways are all based on the YAZ toolkit, whereas Finsiel's was written from scratch. Initial tests have shown that, despite some differences (not all the gateways support the same protocol features), the level of interoperability achieved is good.
The main difficulty remains achieving a correct, semantically meaningful and complete mapping between the Aquarelle profile and the data archives. There is an inherent complexity arising from differences in how data are structured, depending on whether the target system is an Information Retrieval System (IRS) or a DBMS.
In the first case, the basic flat-file model is the same, and we need only map the Z39.50 services onto the IRS services. Although this is not a trivial task, an IRS normally provides the primitives needed to implement the services, such as word indexing, structuring into paragraphs, and so on. Difficulties arise when Z39.50 services are mapped onto DBMSs. Firstly, the data are often stored as a set of (normalised) tables, which makes it difficult, or at least expensive and time-consuming, to reconstruct the document (the database record, in Z39.50 terminology). Secondly, some of the features required by the Z39.50 standard and by the Aquarelle profile have no precise equivalent, forcing the designer to ignore them or support them only partially. For example, traditional relational DBMSs cannot search for words or truncated words, only for substrings. A fundamental functionality must therefore either be ignored or supported approximately, which wastes resources and produces noisy answers.
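The gap between substring and word search can be illustrated with a small Python sketch. It emulates, over an in-memory list, what a case-insensitive SQL `LIKE '%term%'` or `LIKE 'term%'` would return, and contrasts this with IRS-style matching on word boundaries with optional right truncation. The titles and function names are invented for illustration only.

```python
import re

# Invented sample titles.
TITLES = [
    "Leaning Tower of Pisa",
    "Giovanni Pisano pulpit",
    "Baptistery, Pisa",
    "Episcopal palace",
]


def like_anywhere(titles, term):
    """Emulates a case-insensitive SQL LIKE '%term%': a pure substring test."""
    t = term.lower()
    return [s for s in titles if t in s.lower()]


def like_prefix(titles, term):
    """Emulates a case-insensitive SQL LIKE 'term%': matches only at field start."""
    t = term.lower()
    return [s for s in titles if s.lower().startswith(t)]


def word_search(titles, term, right_truncation=False):
    """IRS-style search on word boundaries, optionally right-truncated."""
    tail = r"\w*" if right_truncation else r"\b"
    rx = re.compile(r"\b" + re.escape(term) + tail, re.IGNORECASE)
    return [s for s in titles if rx.search(s)]


print(word_search(TITLES, "Pisa"))                        # word Pisa
print(like_anywhere(TITLES, "Pisa"))                      # substring pisa
print(word_search(TITLES, "pis", right_truncation=True))  # truncated pis*
print(like_anywhere(TITLES, "pis"))                       # substring pis
print(like_prefix(TITLES, "pis"))                         # prefix pis
```

Here the substring test also returns "Giovanni Pisano pulpit" for the word Pisa and "Episcopal palace" for the truncated term pis (noise), while anchoring the pattern at the start of the field returns nothing at all (silence).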
It should be noted that some popular RDBMSs, such as Oracle and Sybase, now offer new features such as text indexing and free-text retrieval with conventional IRS operators. This makes mapping easier, but also implies migrating existing archives to the new platform. Even so, mapping the Z39.50 profile to existing archives remains a complex task in which both syntactic and semantic difficulties must be addressed. Typically, a USE attribute or access point of the Aquarelle profile maps to one or more database fields (and sometimes to none), and vice versa. Finding the best trade-off between "silence" (relevant records missed) and "noise" (irrelevant records retrieved) in the response can be very cumbersome. Moreover, it is not only fields that must be mapped but also the operators applied to them (such as word truncation, phrase and word list), and the decisions taken can affect retrieval effectiveness.
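The following Python sketch shows, under stated assumptions, one way a gateway might translate a single attribute-plus-term query into an SQL WHERE clause. It borrows value numbers from the Bib-1 attribute set (Use 4 = Title, 1003 = Author, 21 = Subject heading, 31 = Date of publication; Structure 1 = phrase, 2 = word, 6 = word list; Truncation 1 = right, 100 = do not truncate) rather than the Aquarelle profile's own access points, and the column names, `USE_MAP` and `translate` are hypothetical. Only the WHERE clause is built; the FROM clause and any joins are left out. Words are approximated by padding fields with spaces, a common workaround that still misses words next to punctuation. The `||` concatenation operator is standard SQL and works in Oracle; Sybase uses `+` instead.

```python
# Hypothetical mapping from Bib-1 Use attribute values to database columns.
USE_MAP = {
    4: ["object.title", "object.alt_title"],  # Title: one access point, two fields
    1003: ["maker.name"],                      # Author
    21: ["subject.term"],                      # Subject heading
    31: [],                                    # Date of publication: no equivalent field
}

STRUCTURE_PHRASE, STRUCTURE_WORD, STRUCTURE_WORD_LIST = 1, 2, 6
TRUNC_RIGHT, TRUNC_NONE = 1, 100


class UnsupportedAttribute(Exception):
    """Raised where a real gateway would return a Bib-1 diagnostic."""


def word_predicate(column, word, truncation):
    # Padding the field with spaces lets '% word %' approximate a word
    # boundary. Punctuation next to a word defeats it (a source of silence).
    # LIKE wildcards (% and _) inside the term are not escaped here.
    if truncation == TRUNC_RIGHT:
        return f"(' ' || {column}) LIKE ?", f"% {word}%"
    if truncation == TRUNC_NONE:
        return f"(' ' || {column} || ' ') LIKE ?", f"% {word} %"
    raise UnsupportedAttribute(f"truncation={truncation}")


def translate(use, structure, truncation, term):
    """Return (where_clause, params) for a single attribute-plus-term query."""
    columns = USE_MAP.get(use)
    if not columns:  # unknown, or known but mapped to no field
        raise UnsupportedAttribute(f"use={use}")
    if structure in (STRUCTURE_PHRASE, STRUCTURE_WORD):
        units = [term]
    elif structure == STRUCTURE_WORD_LIST:
        units = term.split()
    else:
        raise UnsupportedAttribute(f"structure={structure}")

    per_column, params = [], []
    for column in columns:
        predicates = []
        for unit in units:
            sql, param = word_predicate(column, unit, truncation)
            predicates.append(sql)
            params.append(param)
        # Word list: every word must occur in the same column.
        per_column.append("(" + " AND ".join(predicates) + ")")
    # A match in any mapped column counts: more recall, but more noise.
    return " OR ".join(per_column), params


where, params = translate(4, STRUCTURE_WORD_LIST, TRUNC_RIGHT, "tower pisa")
print(where)
print(params)
```

Searching all mapped columns with OR favours recall (less silence, more noise), and requiring every word of a word list to occur in the same column, as done here, is a further judgement call of exactly the kind that affects retrieval effectiveness.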
In conclusion, implementing Z39.50 gateways that support the Aquarelle profile has given Aquarelle data providers a high degree of functional and semantic interoperability, both among themselves and with other Z39.50 clients, and makes it possible for almost any data provider to become an Aquarelle member.
Oreste Signore - CNR-CNUCE
Tel: +39 50 593201