Implementing the Common User Interface for a Digital Library:
The ETRDL experience
Maria Bruna Baldacci, Stefania Biagioni, Carlo Carlesi, Donatella Castelli, Carol Peters
IEI-CNR, Pisa, Italy
The Common User Interfaces for the ERCIM Technical Reference Digital Library (ETRDL) are described and the underlying motivations for certain design decisions are discussed. The lessons that have been learnt from this experience are outlined and possible future developments are suggested.
Towards the end of 1997 the decision was taken by ERCIM (European Research Consortium for Informatics and Mathematics) to create a digital collection of the technical documentation produced by its scientists and to provide on-line distributed public access to this collection. The intention was to offer a service similar to that provided in the United States by NCSTRL, the Networked Computer Science Technical Reference Library (1). The aim was to help the ERCIM scientists make their research results immediately available world-wide and to provide them with appropriate on-line facilities to access the technical documentation of others working in the same field.
It was clearly desirable to ensure that these two parallel services were compatible. We thus decided to adopt the same system as that used by NCSTRL: the Dienst system developed by a US consortium led by Cornell University (2, 3), and to include our collections as part of the NCSTRL collection. However, it was quickly apparent that the ERCIM scientific community has its own specific requirements, not all of which are covered by the basic Dienst system as adopted by NCSTRL. Our point of reference was a meeting of ERCIM librarians and information scientists, at the end of 1995, in which the main requirements for the next generation library systems were listed and discussed. With respect to this list, the NCSTRL service was deficient in three important aspects: the need for classification mechanisms; the need to cater for languages other than English; the need to provide on-line document submission facilities. Our task has thus been to implement a system which maintains interoperability with NCSTRL so that users can perform cross-Atlantic bibliographic searches while at the same time extending this system to provide the functionalities requested by the ERCIM users. This means that a user accessing the NCSTRL system can view and query any of the collections of the eight ERCIM institutions currently participating in this initiative using the standard NCSTRL search functions, whereas a user accessing the ETRDL system directly has an additional set of functions available.
The services to be offered by the ETRDL were defined in a meeting between the partners in January 1997 and the specifications for the common user interfaces were agreed. IEI-CNR was given the task of implementing them. In the rest of this paper, we provide an analysis of the ERCIM user needs, list the extensions that have been made to Dienst in order to meet these needs, and describe how we have developed the Common User Interfaces to the system with respect to these extensions. Finally, we discuss the lessons we have learnt from this experience and consider possible steps for the future.
2. The ETRDL Users
We recognise three distinct classes of users of our technical reference digital library within the ERCIM community:
information users, i.e. people who will access one or more of the document collections available to find pertinent material;
information providers, i.e. authors, or their representatives, who will submit new documents to a specific collection with associated bibliographic records;
information administrators, i.e. those responsible (usually, but not always, the librarians) for verifying the correctness of the bibliographic records and the associated document files before inserting them into the relevant collection.
The main system interfaces therefore had to cater for the needs of these very different types of users: submission/elimination of information; access/searching for information; management of information. Below, we discuss in detail the needs of these three user classes.
2.1 Information Users
The basic requirements of the users browsing and querying the collections are to:
find and retrieve pertinent information through interfaces which are simple, intuitive and homogeneous;
retrieve documents meeting specific criteria, such as a given date, language, or type;
have the results of a browse or search presented in an easy-to-understand format;
view and download parts or all of documents retrieved.
Scientific users also want to be able to:
access information on a given domain, using a familiar classification scheme;
have clear indication of the status of the information retrieved (i.e. date, version, source).
Ideally, users would also like to be able to:
access information using their preferred language;
access all information available on a given topic, whatever the language.
2.2 Information Providers
These users want to:
submit their document files and the relevant bibliographic records to the system in an easy, fast and efficient way;
be able to classify their documents using familiar classification schemes plus, if necessary, their own keywords;
have mechanisms which automatically check and signal formal errors during compilation of the bibliographic records, e.g. when a mandatory field has not been filled in, when incorrect syntax is used, etc.;
have direct communication with the system administrator or librarian if necessary;
make their documents as widely available as possible;
be able to update or eliminate information files when necessary.
2.3 Information Administrators
In most of the ERCIM institutions the librarians will be responsible for verifying the formal correctness of the documents and bibliographic records submitted, deciding in which collection they will be inserted, and assigning the identification number. This type of user wants to:
receive homogeneous bibliographic records for all documents, which have been compiled correctly by the provider and which conform to the type of bibliographic record normally used by the institution library;
have an easy-to-follow procedure in order to enter new documents into a selected collection;
be able to communicate with the information provider, if necessary.
2.4 General Needs
Common to all types of users is the need to have:
on-line helps available at every step, and for every field, explaining both syntax and semantics;
contact with the system administrator, when necessary;
on-line access to the classification schemes adopted and mechanisms which make it possible to adopt a selected term from the schemes without having to rewrite it.
3. User Interface Design
The user needs outlined above have clearly (i) necessitated a series of modifications to the basic Dienst system and (ii) affected the interface design decisions. It is important to stress that any change to the underlying system will impact, to a greater or lesser extent, the implementation of the interfaces. In this section, we discuss briefly the main issues that have been considered when developing the ETRDL Common User Interfaces. These include the adoption of a common metadata description standard, the introduction of common classification schemes and methods to manage them, and the implementation of multilingual interfaces. The first step, however, was to decide on the best way to present the system to its multiple user classes; this influenced the design of the Home Pages.
Figure 1 - The Centralised Home Page
3.1 ETRDL Home Pages
A first major decision regarded the system Home Page(s), i.e. the initial access points. In addition to the different types of users listed above, we have also had to consider two other dimensions: public vs. private; centralised vs. local. The ETRDL collection is intended to be publicly accessible, i.e. also by non-ERCIM users, but such users only need to access the search and browse functionalities; the information provider and administrator services are not relevant to them. At the same time, ETRDL is a distributed collection, consisting of the set of the local collections. The local collections are maintained on the local servers of each partner institution. This has required the implementation of two levels of Home Pages. A centralised access point has been provided to the system through the DELOS Web site (http://iei.pi.cnr.it/DELOS/), whereas a local home page is installed on each local server. The "views" provided by these two different Home Pages respect the needs of the potential users at each site (centralised and local) and thus provide different points of entry.
Figure 2 - The Local Home Page
The Centralised Home Page is in English only and has been designed for IT information users in general, not necessarily from ERCIM. For this reason, it provides links to pages that describe the objectives of the ETRDL, to on-line documentation, and to other relevant Web sites. It allows the user to access the ETRDL through one of the local servers (see Figure 1). Clicking on the logo of a given institution will open the relevant local home page interface. Our initial intention was to provide direct access to the ETRDL collection (with the extended set of functionalities) from the centralised Home Page. However, it was decided that this was not realistic; it implied maintaining a centralised server as well as the local ones. The user is thus informed that in order to search the ERCIM DL collection, he should select one of the local servers. At the same time, he is given a choice of language as each local server will maintain interfaces in English and in the local language.
The Local Home Page interface caters simultaneously for two user classes: information users and information providers by offering two main options: search/browse any collection; submit/withdraw a document to/from a local collection. This decision was taken to facilitate the local user, e.g. the ERCIM scientist, who can typically play either of these two roles and prefers to use just one main access point to the ETRDL.
A common local home page has been designed; its implementation is localised by each partner institution. The logo of the institution appears in the top left hand corner and, under the title, a button allows the user to switch between the interfaces in English or in the local language (see Figure 2).
From the local home pages, the search and browse functions can be activated over the entire NCSTRL collection, over the ERCIM collection, or over the collection(s) of the local institution. In each case, the user is not only accessing a different collection (or sub-collection), but is provided with a different perspective on the information, depending on the functions that have been implemented at that particular level.
The Administrator Home Page is not visible to the general public and is accessible by authorised persons only. The main functions to be provided by the administrator interfaces were decided and defined in agreement with all the partner institutions; they depend to a large extent on the specifications for the Submission and Withdrawal forms. However, no common administrator interfaces have been designed; each local institution implements them according to local requirements.
3.2 The Metadata Set
In the ETRDL collection, each document has a common metadata description associated with it. This description is an extension of the basic metadata set used by Dienst and is compatible with the Dublin Core metadescription standard (4). This is important in order to guarantee future interoperability of the ETRDL with other DL systems.
This description contains the following elements: Title, Author(s) (the person(s) or organisation(s) primarily responsible for the intellectual content), Subject (a list of descriptors selected from the ACM and MSC categories, or free keywords), Abstract (an English abstract, and, optionally, a local language abstract), Publisher (the ERCIM institution), Date (the date of the intellectual content), Type (default value is Technical Report), Format (postscript, pdf, html, text, gif and tif), Identifier and Language.
Each field must be filled in when a new document is submitted to the collection.
The use of the Dublin Core elements in the user interfaces helps to impose uniformity over the collections and also ensures that the occasional user of the system is presented with a standard well-known set of metadata.
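The element set described above can be pictured as a simple record type. The following Python sketch is purely illustrative and is not the ETRDL or Dienst implementation: the class name, the attribute names and the `problems` method are hypothetical, and only the list of elements, the default Type and the permitted formats are taken from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Formats listed for the Format element in the description above.
ALLOWED_FORMATS = {"postscript", "pdf", "html", "text", "gif", "tif"}


@dataclass
class MetadataRecord:
    """Hypothetical in-memory form of one bibliographic record."""
    title: str
    authors: List[str]
    subject: List[str]            # ACM/MSC descriptors and/or free keywords
    abstract_en: str              # the English abstract
    publisher: str                # the ERCIM institution
    date: str                     # date of the intellectual content
    identifier: str
    language: str
    format: str
    type: str = "Technical Report"          # default value of the Type element
    abstract_local: Optional[str] = None    # optional local-language abstract

    def problems(self) -> List[str]:
        """Return a list of formal problems; an empty list means none found."""
        found = []
        for name, value in vars(self).items():
            if name == "abstract_local":    # the only optional element
                continue
            if not value:
                found.append(f"missing: {name}")
        if self.format and self.format.lower() not in ALLOWED_FORMATS:
            found.append(f"unknown format: {self.format}")
        return found
```

A record with an empty title and a format outside the permitted list would be reported with two problems, while a fully compiled record yields an empty list.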
3.3 Search Interface
The common search interface to a digital library system is of great importance as the search function is the operation most frequently invoked by the users. The system thus tends to be judged on the merits of this interface: it must provide all necessary capabilities yet be easy-to-understand and easy-to-use. This has been our objective when designing the ETRDL search interface. Although we have tried to implement an interface that does not appear too unfamiliar to an NCSTRL user (homogeneity between systems being a primary user requirement), as will be seen from the following description, ETRDL offers the user the possibility of performing a more finely structured search. This is obviously reflected in our interface.
The choice between three kinds of search, direct search, simple search or fielded search, has been maintained. The first two operate in the same way as in NCSTRL: the direct search can be used to retrieve a document via its Document-Id No.; terms entered in the simple search field are searched throughout the documents in all fields. However, in the fielded search the ETRDL interface also offers a number of additional functionalities: these include searching through a "subject" field and imposing conditions on the language or type of documents to be searched (see Figure 3).
Three different kinds of terms can be entered in the Subject field: ACM Computing Classification descriptors, AMS Mathematics Subject Classification descriptors, and/or free keywords.
Figure 3 - The fielded search
There are two reasons why ETRDL offers the possibility of using standard classification schemes to describe and search the documents in its collections: the ERCIM librarians and scientists are accustomed to using such schemes and think of them as an efficient way to store and retrieve documentation; it is a way of imposing homogeneity over the distributed collections - if documents on the same subject are classified using the same descriptors they will be retrieved by the same query, whatever the collection they belong to, whatever the language they are written in. The classification schemes are accessible on-line so that the user can browse them in order to find the most appropriate search terms; he/she can then enter them in the subject field using "cut" and "paste" operations.
The capability of selecting documents by date or type that the NCSTRL interface provides implicitly by allowing the use of substrings when searching the Document-Id field, has been made explicit. Two additional fields, Date and Type, permit ETRDL users to set non-ambiguous conditions on the type, or date of documents to be searched (Dienst capabilities do not permit the selection of documents within a given range of dates). A selector for the choice of document language has also been added. Pop-up menus are installed to facilitate the user; he/she simply has to mark his/her selection by clicking on it with the mouse. On-line helps explaining the syntax and semantics adopted have been installed.
The main criterion adopted when designing and implementing this interface was to facilitate the user's task by guiding him/her as far as possible in formulating the query.
3.4 Submission Interface
In order to submit a new document to one of the collections which form part of the ETRDL, the document submission form or bibliographic record must be completed. For convenience, it was decided that the authors of documents should compile their own bibliographic records and submit these together with the text file(s) to the system. The design of this interface was thus extremely important. It was not sufficient to provide on-line helps and access to the classification schemes with "cut" and "paste" mechanisms to enter descriptors on the submission form without the risk of typos. A series of formal verifications are made by the system when the user submits the form, in order to check that all the mandatory fields have been filled in and, where possible, that the syntax has been respected. If the system does not accept the form, it returns it to the user with a request to correct it. When a correct form is submitted, it is displayed to the user as a bibliographic record and the user is asked to confirm. On confirmation, the form is sent to the administrator of the collection indicated on the submission form; it is the administrator who is responsible for the actual insertion of the new document in the system. This step is transparent to the user, who may well believe that the document has been inserted directly into the system.
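The submission procedure just described (formal checks, return of an incorrect form, confirmation, forwarding to the administrator) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual ETRDL code: the function names, the status strings, the subset of mandatory fields and the date syntax are all hypothetical.

```python
import re
from typing import Dict, List

# Assumed subset of mandatory fields, for illustration only.
MANDATORY_FIELDS = ("title", "authors", "abstract", "date", "language")
# Assumed date syntax (YYYY, YYYY-MM or YYYY-MM-DD); the real syntax may differ.
DATE_SYNTAX = re.compile(r"\d{4}(-\d{2}){0,2}")


def check_form(form: Dict[str, str]) -> List[str]:
    """Formal verification: mandatory fields present, syntax respected."""
    errors = [f"'{name}' is mandatory"
              for name in MANDATORY_FIELDS
              if not form.get(name, "").strip()]
    date = form.get("date", "").strip()
    if date and not DATE_SYNTAX.fullmatch(date):
        errors.append("'date' has incorrect syntax")
    return errors


def submit(form: Dict[str, str], confirmed: bool = False) -> Dict[str, object]:
    """Return the next step of the submission procedure for this form."""
    errors = check_form(form)
    if errors:
        # The form is returned to the user for correction.
        return {"status": "returned", "errors": errors}
    if not confirmed:
        # A correct form is shown back as a bibliographic record.
        return {"status": "awaiting_confirmation", "record": dict(form)}
    # After confirmation the form goes to the collection administrator,
    # who performs the actual insertion.
    return {"status": "forwarded_to_administrator",
            "collection": form.get("collection", "")}
```

Keeping the checks in a separate function means that the same verification can be repeated when the user resubmits a corrected form.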
3.5 Rendering the Interfaces Multilingual
Multilinguality is an issue of strategic importance for the ERCIM scientific community, which currently consists of 14 member institutions, with 13 different major European languages. The first activities of the ETRDL in this area are aimed at (i) implementing user interfaces capable of handling multiple languages and (ii) providing very basic functionalities for cross-language querying.
Multilingual Access. Each national site is responsible for localisation, i.e. implementation of local site user interfaces in the national language as well as the CUI in English. The user will thus have the choice of using the system in English or in the local language. At the very simplest level, this means translating the common system interfaces (including the on-line helps) into the local language. For the system home pages, at each local site we maintain a version in English and in the local language; the user can switch from one to the other using the language button at the bottom of the local home page. However, all the other interfaces of the system are generated automatically during run-time. The system code thus includes a language variable, which determines whether the procedures should invoke interfaces and system messages in English or in the local language. Of course, localisation also implies providing the metadata field descriptors in the local language as well as in English. One of the tasks of the group is to investigate problems involved in rendering the Dublin Core element set multilingual (5).
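The role of the language variable can be illustrated with a small sketch. The message table, the keys and the `message` function below are hypothetical, and falling back to English when a local translation is missing is our assumption for the example rather than documented system behaviour.

```python
from typing import Dict

# Hypothetical message catalogue: English plus one local language.
MESSAGES: Dict[str, Dict[str, str]] = {
    "en": {"search": "Search", "submit": "Submit a document"},
    "it": {"search": "Ricerca"},
}


def message(key: str, language: str = "en") -> str:
    """Pick the interface text for `key` according to the language variable."""
    local = MESSAGES.get(language, {})
    # Assumed behaviour: use the English text when no local version exists.
    return local.get(key, MESSAGES["en"][key])


print(message("search", "it"))   # local-language text
print(message("submit", "it"))   # falls back to the English text
```

With this arrangement, localising a site amounts to supplying one more table of texts; the procedures that generate the pages at run-time do not change.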
More complex at both the interface and the system level is the question of being able to handle and visualise multiple character code sets. Each document submitted to the collection is tagged for language. Mechanisms are currently provided for the local display and printing of non-Latin-1 languages (this has been implemented at ICS-FORTH). In the future, we will probably move to Unicode. We are currently working on implementing mechanisms for the indexing of documents in languages other than English; this, however, is a question that remains transparent to the user.
Cross-language Querying. A simple form of cross-language querying is possible using the controlled vocabulary (ACM/AMS) terms. All documents in the ETRDL, in whatever language, classified using this scheme, can be searched. As authors are also requested to include an abstract in English, English free term searching over documents in any language is also possible. INESC has developed an LDAP service with a multilingual repository for the ACM and AMS classification systems (currently implemented in English and Portuguese), which will be integrated in the ETRDL system. This multilingual service will make cross-language querying in local languages possible in the future.
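The principle behind this form of cross-language querying is that a classification code is independent of language. The sketch below is only an illustration of that principle: it does not model the INESC LDAP service, the document entries are invented, and the Portuguese label is supplied here as an example translation.

```python
from typing import Dict, List, Optional

# Illustrative multilingual labels for one ACM descriptor.
LABELS: Dict[str, Dict[str, str]] = {
    "H.3.7": {"en": "Digital Libraries", "pt": "Bibliotecas Digitais"},
}

# Invented documents in different languages, classified with ACM codes.
DOCUMENTS: List[Dict[str, object]] = [
    {"id": "doc-1", "language": "en", "subject": ["H.3.7"]},
    {"id": "doc-2", "language": "it", "subject": ["H.3.7"]},
    {"id": "doc-3", "language": "fr", "subject": ["H.2.4"]},
]


def code_for(term: str, language: str) -> Optional[str]:
    """Map a descriptor written in `language` to its classification code."""
    for code, names in LABELS.items():
        if names.get(language, "").lower() == term.lower():
            return code
    return None


def search_by_subject(term: str, language: str) -> List[str]:
    """Return the ids of all documents classified under the term's code."""
    code = code_for(term, language)
    if code is None:
        return []
    return [str(doc["id"]) for doc in DOCUMENTS if code in doc["subject"]]


print(search_by_subject("Bibliotecas Digitais", "pt"))
```

Because matching is done on the code, a query formulated in Portuguese retrieves the English and the Italian document alike.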
4. Lessons Learned
It is probably true to say that most of the lessons we have learnt from this experience are predictable from the literature and, at a first glance, may appear all too obvious. However, it is one thing to recognise the existence of a problem theoretically; it is quite another story to have to implement real-world solutions to this problem. A number of factors may well affect the decisions taken, e.g. time and cost issues, the need to counterbalance between different needs and different priorities. The solution adopted is often a compromise. An example of this is our decision concerning the Centralised Home Page; we would have preferred to provide direct access both to the NCSTRL server in the US and to the ETRDL service from this page. However, this raised two problems. In the first case, the NCSTRL service which is accessible via the US server is no longer the same as the one we have implemented on our local servers. It implements a different version of Dienst (see the following section for details). It was decided that it would only cause confusion to the ETRDL users if we offered them access to two different versions of the same system. In the second case, direct access to the ETRDL from the Centralised Home Page implied maintaining an extra server; this would have been costly in terms of implementation and we have thus been forced to provide ETRDL access only from the local servers.
In other cases, instead, a decision has been made in favour of the user requirements. For example, according to our user requirements, the ERCIM interface should offer i) subject access and ii) date/type/language selectors. However, Dienst was found to be inadequate for these functions because:
i) Dienst search strategies permit only one Boolean operator (either OR or AND) to be used between entered fields, but the content of the Subject field is matched against three different indexes (ACM, AMS, and free keywords) with an OR logic. This conflicts with Dienst search strategies if the user also wants to use AND between the bibliographic fields;
ii) Dienst is a session-less system, i.e. it does not permit the search results to be further processed by the user. Consequently, the date/type/language fields cannot be used as true selectors (i.e., to filter the search results); they must be used as search fields and always ANDed with any bibliographic fields entered. This conflicts with Dienst search strategies if the user wants to use OR between the bibliographic fields.
In this case, these conflicts have been resolved by heavy changes to the retrieval mechanisms of Dienst, of the help instructions, etc.
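The combination of operators that the extended search has to support can be made concrete with a short sketch. It is not the modified Dienst code; the function, the index names and the document layout are hypothetical. It shows the three levels of logic involved: OR across the three subject indexes, a single user-chosen operator between the entered fields, and selectors that are always ANDed with the rest.

```python
from typing import Dict, Optional

# Hypothetical names for the three indexes behind the Subject field.
SUBJECT_INDEXES = ("acm", "ams", "keywords")


def matches(doc: Dict[str, object],
            fields: Dict[str, str],
            operator: str = "AND",
            subject: Optional[str] = None,
            selectors: Optional[Dict[str, str]] = None) -> bool:
    """Decide whether one document satisfies a fielded query."""
    # One hit value per bibliographic field entered by the user.
    hits = [value.lower() in str(doc.get(name, "")).lower()
            for name, value in fields.items()]
    if subject is not None:
        wanted = subject.lower()
        # The subject term is ORed over the ACM, AMS and keyword indexes.
        hits.append(any(wanted == term.lower()
                        for index in SUBJECT_INDEXES
                        for term in doc.get(index, [])))
    # The single operator chosen by the user combines the entered fields.
    combine = all if operator == "AND" else any
    if hits and not combine(hits):
        return False
    # Date/type/language act as selectors: always ANDed with the result.
    return all(doc.get(name) == value
               for name, value in (selectors or {}).items())


DOC = {"title": "Digital library interfaces", "author": "Castelli",
       "acm": ["H.3.7"], "ams": [], "keywords": ["multilinguality"],
       "language": "en", "type": "Technical Report"}
print(matches(DOC, {"title": "library", "author": "Peters"}, "OR",
              selectors={"language": "en"}))
```

A query that uses OR between the fields together with a language selector, as in the last call, needs both operators in the same evaluation, which is exactly what a single global operator cannot express.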
Therefore, the main two lessons we have learnt - or perhaps more correctly that we have had confirmed - while implementing the ETRDL interfaces are that:
it is difficult to make clear distinctions between the interfaces and the underlying system; changes to one almost always affect the other, at times in an unexpected fashion;
it is not easy to modify and extend an existing system; at times it is impossible when the extensions actually affect the philosophy of the system.
Dienst, as implemented by NCSTRL, provides a simple, monolingual free-text search service. We have extended this service by adding controlled vocabulary search facilities, multilingual interfaces, forms for the on-line compilation of bibliographic records and the submission and withdrawal of documents. All these extensions have led to the creation of a complex system, designed to meet the needs of a number of different user types and profiles. This in its turn has led to the need for a careful study of the interfaces in order to present the correct view of the system to each class of user.
It is true to say that during the implementation of the user interfaces, we have been forced to realise that many of our original assumptions were over-simplistic and did not reflect the true complexity of the system we were developing. This has necessitated a cyclic process: initial definition of the functionalities to be offered by ETRDL; consequent modification to the Dienst system; implementation of interfaces; distribution of the system to the other partners for testing; revision of the system and adaptation of the interfaces in response to the feedback received; redistribution and retesting.
To sum up, we may be a little sadder (and less optimistic) than when we started but we are certainly much wiser with respect to the underlying implications of the complex task of implementing user interfaces for a distributed, multilingual DL system.
5. Next Steps
Important decisions now have to be taken by the ERCIM DL group. In this paper we have described the first installation of the ETRDL service, the design of the user interfaces, and the lessons we have learnt from this experience. Our aim was to implement a simple but effective service not only for ERCIM but also for the general IT community in Europe, which would provide them with fast access to scientific documentation. At the same time, we provide the ERCIM scientists with an easy method to make their results immediately available to their peers, without having to wait maybe years for official publication. We are now evaluating the impact of the service on our users.
However, as so often happens in the computer science world, developments proceed at a break-neck pace. While we are in the process of completing the first stage of the common implementation, a new (and final) version of Dienst has been developed at Cornell. This version provides functionalities to order the results (including ranking). NCSTRL has adopted this new version of Dienst. If we want to maintain compatibility with the NCSTRL service, we must produce a new version of the ETRDL system which incorporates the new functionalities. The problem is that, as the ETRDL system now represents a heavily modified version of Dienst, it is difficult to estimate exactly how much work is involved in an upgrade to ensure compatibility with the new NCSTRL service. At the same time we recognise that the service we are offering is limited to text and images; a DL should also be capable of providing facilities for the storage, management, access and retrieval of multimedia objects, e.g. audio and video. The Cornell group has announced that it is not considering further developments to Dienst. It is now working on the design of a new object-oriented DL architecture (8,9), which will be able to handle such objects. Perhaps our next step should be to consider a system of this type. This would of course also imply a complete revision and redesign of the user interfaces.
1. Networked Computer Science Technical Reference Library. http://www.ncstrl.org
2. C. Lagoze, E. Shaw, J. R. Davis and D.B. Krafft, Dienst: Implementation Reference Manual, Cornell Computer Science Technical Report TR95-1514.
3. C. Lagoze, J. R. Davis, Dienst: an Architecture for Distributed Document Libraries, Communications of the ACM, 38 (4) April 1995, page 45.
4. Dublin Core Metadata Element Set: Resource Page. http://purl.org/metadata/dublincore.
5. Multilingual Dublin Core: http://www.cs.ait.ac.th/~tbaker/dc-multilingual.html
6. Biagioni, S., Borbinha, J., Ferber, R., Hansen, P., Kapidakis, S., Kovacs, L., Roos, F., Vercoustre, A.M. (1998). "The ERCIM Technical Reference Digital Library". in ECDL'98 Proceedings, Crete, Greece, September 1998, pp.905-906 - (http://www.iei.pi.cnr.it/DELOS/EDL/ETRDL98.html).
7. ETRDL Demo Description: Handout distributed at ECDL'98, Crete, Greece, September 1998 (http://www.iei.pi.cnr.it/DELOS/EDL/JPEG/etrdl0998.html).
8. Payette Sandra and Lagoze Carl. (1998). Flexible and Extensible Digital Object and Repository Architecture (FEDORA). In: Research and Advanced Technology for Digital Libraries : Second European Conference, Proceedings ECDL'98, Christos Nikolaou and Constantine Stephanidis (Eds.).- Berlin : Springer, 1998. (Lecture Notes in Computer Science, Vol. 1513). ISBN 3-540-5101-2
9. FEDORA CORBA IDL. - (http://www2.cs.cornell.edu/payette/papers/ECDL98/FEDORA-IDL.html)
The implementation of the ETRDL is the result of a collaborative activity; the development of the Common User Interfaces was the task of IEI-CNR. The authors would like to gratefully acknowledge the assistance of the other ERCIM participants in this activity, both in the initial formulation of the specifications, and in the feedback received as a result of testing the first prototype. They would also like to thank the developers of the Dienst system and, in particular, Carl Lagoze and David Fielding for their generous assistance and advice.