A Powerful and Scalable Digital Library Information Service

by Henri Avancini, Leonardo Candela, Andrea Manzi and Manuele Simi


The implementation of a Digital Library capable of putting Europe's memory on the Web demands a service-oriented, federated and distributed approach. Supporting such an approach requires the introduction of a new type of enabling service, usually called an Information Service, which can collect and disseminate information on the resources that constitute the federation. In large and distributed Digital Libraries, the key features of this service are scalability and availability.

The DLib group of the Networked Multimedia Information System Laboratory at ISTI-CNR has extensive experience in building digital libraries (DLs). This experience comes from its participation, in a scientific leadership role, in a series of EU IST projects such as SCHOLNET. It also stems from the development of OpenDLib, a highly flexible Digital Library Service System that has been shown to be suitable for building and operating a range of digital libraries. One of these is the DELOS DL, which manages the documentation of the DELOS Network of Excellence. Another is the BELIEF DL, which serves the eInfrastructure community by collecting and providing focussed views over multimedia documents, as well as presenting the latest details relating to projects, initiatives and events.

Our experience leads us to believe that a service-oriented approach based on loosely coupled services is the most appropriate architecture for building highly distributed systems. This approach relies on independent services that provide the expected functionality by cooperating with other services of the federation. To produce high-quality distributed DL infrastructures, we find that effective discovery of the constituent components and careful monitoring of the infrastructure are mandatory. Supporting these features means relying on two kinds of information about services: (i) static information, which includes data that remains fixed during the service lifetime, eg location, usage policies and configuration parameters; and (ii) dynamic information, which contains data on the operational state of the service, eg the set of properties that keeps track of events during a given sequence of interactions with a user, a device or another service.
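
The distinction between the two kinds of information can be illustrated with a small data model. This is only a sketch under our own assumptions: the class names, field names and the example endpoint are hypothetical and are not taken from the DILIGENT code base. Static information is modelled as an immutable record, dynamic information as a mutable one.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class StaticInfo:
    """Data that stays fixed for the whole service lifetime (hypothetical model)."""
    service_id: str
    location: str        # for example, the endpoint of the service
    usage_policy: str
    config: tuple = ()   # configuration parameters as (name, value) pairs


@dataclass
class DynamicInfo:
    """Operational state that changes while the service runs (hypothetical model)."""
    service_id: str
    events: List[str] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)

    def record(self, event: str, **props: str) -> None:
        # Keep track of an event and update the properties describing the state.
        self.events.append(event)
        self.properties.update(props)


# Illustrative values only.
profile = StaticInfo("search-01", "http://node-a.example.org/search", "members-only")
state = DynamicInfo("search-01")
state.record("query-received", load="low")
```

Freezing the static record makes accidental changes fail early, while the dynamic record is expected to be updated throughout a sequence of interactions.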

Figure 1: Information Service Logical Architecture.

The DILIGENT Information Service
These features were recently included in the DILIGENT infrastructure, developed as part of the project of the same name. DILIGENT is a testbed infrastructure that will allow members of dynamic virtual eScience communities to create on-demand transient DLs based on shared computing, storage, applications, and multimedia and multi-type content. It is designed as a service-oriented architecture over Grid technology and relies on WS-* family standards, namely the WSRF framework and the WS-Addressing, WS-Security and WS-Notification specifications.
In this infrastructure, discovery and monitoring occur through a specific service, called an Information Service (IS), depicted in Figure 1. This service is organized in three logical parts, each serving the needs of a class of actors: information producers, collectors, and consumers.

Producers and consumers are supported in interacting with the IS via a lightweight component that is distributed on each hosting node of the infrastructure. This component is called an IS-Client, and supports three main features: (i) publication of the information (IS-IP library); (ii) access to information and discovery via querying and subscription/notification mechanisms (IS-C); and (iii) the local storage and maintenance of useful and constantly updated information (IS-Cache). The IS-Client allows information in the distributed infrastructure to be efficiently accessed and published, while hiding any detail of the routing process that could identify the collectors involved.
The collectors aggregate the producers' information. This part is composed of two components, the IS-Registry and the IS-IC. The former acts as a classical registry and maintains the list of available services and their static information. The latter maintains the dynamic information and is based on a highly distributed architecture.
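
The interplay between the IS-Client and the collectors can be sketched as follows. This is a simplified in-memory illustration, not the DILIGENT implementation: the class and method names, the checksum-based routing rule and the time-to-live cache are all assumptions made for the example, and the subscription/notification mechanism is left out.

```python
import time
import zlib
from typing import Callable, Dict, List, Optional, Tuple


class ISRegistry:
    """Stand-in for the IS-Registry: holds static information."""
    def __init__(self) -> None:
        self.static: Dict[str, dict] = {}

    def register(self, service_id: str, info: dict) -> None:
        self.static[service_id] = dict(info)

    def get(self, service_id: str) -> Optional[dict]:
        return self.static.get(service_id)


class ISIC:
    """Stand-in for one IS-IC node: holds dynamic information."""
    def __init__(self) -> None:
        self.dynamic: Dict[str, dict] = {}

    def publish(self, service_id: str, state: dict) -> None:
        self.dynamic.setdefault(service_id, {}).update(state)

    def query(self, predicate: Callable[[dict], bool]) -> List[str]:
        return [sid for sid, state in self.dynamic.items() if predicate(state)]


class ISClient:
    """Local entry point; callers never learn which collector was contacted."""
    def __init__(self, registry: ISRegistry, collectors: List[ISIC],
                 ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._registry = registry
        self._collectors = collectors
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def publish(self, service_id: str, state: dict) -> None:
        # Assumed routing rule: pick a collector from a checksum of the id.
        index = zlib.crc32(service_id.encode("utf-8")) % len(self._collectors)
        self._collectors[index].publish(service_id, state)

    def discover(self, predicate: Callable[[dict], bool]) -> List[str]:
        # Ask every collector and merge the answers.
        found = set()
        for collector in self._collectors:
            found.update(collector.query(predicate))
        return sorted(found)

    def lookup(self, service_id: str) -> Optional[dict]:
        # Cache role: answer locally while the entry is fresh, else refresh.
        now = self._clock()
        cached = self._cache.get(service_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        info = self._registry.get(service_id)
        if info is not None:
            self._cache[service_id] = (now, info)
        return info


registry = ISRegistry()
client = ISClient(registry, [ISIC(), ISIC()])
registry.register("search-01", {"location": "http://node-a.example.org/search"})
client.publish("search-01", {"status": "ready"})
client.publish("index-01", {"status": "busy"})
ready = client.discover(lambda state: state.get("status") == "ready")
```

In this sketch the caller only talks to the client object; which collector stores or answers a request is decided inside it.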

From an operational point of view, it is important to note that each time one of the federation's services is deployed, it is first registered on the IS-Registry, and then starts producing its dynamic information via the local IS-IP. In parallel, the IS-Cache takes care of maintaining the set of minimal information needed by locally hosted services for both publishing and querying. The IS-Registry continuously monitors the service instances, thereby maintaining an overall 'picture' of the infrastructure in line with the actual status.
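
The operational flow described above can be sketched in a few lines. The text does not say how the monitoring is carried out, so the time-based liveness check used here is purely an assumption, as are the class name, the method names and the `max_silence` parameter.

```python
from typing import Dict, Set


class Infrastructure:
    """Hypothetical model of the register-then-publish flow with monitoring."""

    def __init__(self, max_silence: float) -> None:
        self.max_silence = max_silence           # seconds tolerated without news
        self.registered: Dict[str, float] = {}   # service id -> time last seen
        self.dynamic: Dict[str, dict] = {}       # service id -> latest state

    def deploy(self, service_id: str, now: float) -> None:
        # Step 1: a newly deployed service is registered first.
        self.registered[service_id] = now

    def publish(self, service_id: str, state: dict, now: float) -> None:
        # Step 2: dynamic information is accepted only from registered services.
        if service_id not in self.registered:
            raise LookupError(f"{service_id} is not registered")
        self.dynamic[service_id] = state
        self.registered[service_id] = now

    def monitor(self, now: float) -> Set[str]:
        # Drop instances that have been silent for too long, so that the
        # overall picture stays in line with the actual status.
        stale = {sid for sid, seen in self.registered.items()
                 if now - seen > self.max_silence}
        for sid in stale:
            del self.registered[sid]
            self.dynamic.pop(sid, None)
        return stale


infra = Infrastructure(max_silence=60.0)
infra.deploy("search-01", now=0.0)
infra.deploy("index-01", now=0.0)
infra.publish("search-01", {"status": "ready"}, now=50.0)
removed = infra.monitor(now=90.0)   # index-01 has been silent for 90 seconds
```

Passing the current time explicitly keeps the sketch deterministic; a real deployment would presumably rely on a clock and on periodic checks.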

As well as designing this logical organization, we are currently evaluating and comparing various caching strategies, together with distribution and selection algorithms for the IS-ICs. For instance, we are investigating the use of distributed information retrieval techniques such as the CORI collection selection algorithm.
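
To give an idea of how a CORI-style ranking could be used to select IS-ICs, the sketch below scores each collector against a query using the standard CORI belief formula with its usual constants (50, 150 and a default belief of 0.4). Treating each IS-IC as a 'collection' and describing it by term frequencies is our assumption for the example; the names and figures are invented and do not come from the evaluation mentioned above.

```python
import math
from typing import Dict, List

DEFAULT_BELIEF = 0.4  # usual CORI default belief b


def cori_scores(query: List[str],
                term_counts: Dict[str, Dict[str, int]],
                sizes: Dict[str, int]) -> Dict[str, float]:
    """Score each collection for a query with the CORI belief formula.

    term_counts[c][t] is the number of entries in collection c containing t,
    and sizes[c] is the number of terms held by collection c.
    """
    n = len(term_counts)
    avg_size = sum(sizes.values()) / n
    scores: Dict[str, float] = {}
    for name, counts in term_counts.items():
        total = 0.0
        for term in query:
            df = counts.get(term, 0)
            if df == 0:
                # A term absent from the collection contributes only b.
                total += DEFAULT_BELIEF
                continue
            # cf: number of collections that contain the term at all.
            cf = sum(1 for c in term_counts.values() if c.get(term, 0) > 0)
            t = df / (df + 50 + 150 * sizes[name] / avg_size)
            i = math.log((n + 0.5) / cf) / math.log(n + 1.0)
            total += DEFAULT_BELIEF + (1 - DEFAULT_BELIEF) * t * i
        scores[name] = total / len(query)
    return scores


# Invented example: three IS-ICs described by the service types they hold.
term_counts = {
    "ic-a": {"search": 40, "index": 5},
    "ic-b": {"index": 30},
    "ic-c": {"search": 2, "storage": 50},
}
sizes = {"ic-a": 1000, "ic-b": 1000, "ic-c": 1000}
scores = cori_scores(["search"], term_counts, sizes)
ranked = sorted(scores, key=scores.get, reverse=True)
```

A client could then contact only the top-ranked IS-ICs instead of querying all of them, trading some recall for lower query cost.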

Next Steps
The viability of the proposed approach will be further tested in the context of the forthcoming IST project: 'Digital Repository Infrastructure Vision for European Research - DRIVER'. The objective of DRIVER is to build a testbed for a future knowledge infrastructure of the European Research Area. Existing digital repositories spread over the Net will be federated, and a set of cross-repository services will be set up to provide seamless access to the DL content, regardless of which repository owns the content. Concretely, the project will start by federating 51 institutional repositories from the Netherlands, the United Kingdom, Germany, France and Belgium. Each of these repositories will be considered as an element of the component-oriented infrastructure. Other components will provide digital library functionality, eg search and browse, personalized information access through recommendations, and virtual collections. In this context the Information Service will play a key role, since it will allow the other services to become aware of each other and to dynamically discover new repositories and services as they join the infrastructure.

This work would not have been possible without the help of colleagues at the NMIS Laboratory. Special thanks go to Davide Bernardini and Pasquale Pagano for their invaluable support in designing and developing this distributed and scalable Information Service.

Links:
Networked Multimedia Information System Laboratory website: http://www.isti.cnr.it/ResearchUnits/Labs/nmis-lab/
OpenDLib website: http://www.opendlib.com/
DILIGENT project website: http://www.diligentproject.org/
BELIEF project website: http://www.beliefproject.org/

Please contact:
Leonardo Candela, ISTI-CNR, Pisa, Italy
E-mail: leonardo.candela@isti.cnr.it