PROBADO - Non-Textual Digital Libraries put into Practice

by Thorsten Steenweg and Ulrike Steffens

In the PROBADO project, librarians and computer scientists are collaborating to produce workflows, systems and tools that enable libraries to professionally handle non-textual documents alongside their traditional textual documents.

According to a study at the University of California, Berkeley, about 5 million terabytes (5 exabytes) of new information were produced globally in 2002. Paper accounts for only 0.01% of the new information recorded across all media, and paper documents are very often stored in digital form as well. Meanwhile, the relevance of non-textual digital documents is increasing. This is obvious in our private lives, where digital cameras and music downloads are becoming ubiquitous. However, it is also true for professionals, such as architects who produce and combine digital 2D or 3D graphical models of monuments, musicians and composers who create and reuse digital audio recordings, or teachers who work with e-learning material. This last example also illustrates another development: digital documents are becoming more and more complex, i.e. they may consist of a variety of partial documents, possibly based on different types of media.

Digital libraries bring together proven expertise in typical library workflows and technical know-how on large, distributed information systems. Hence, they are also candidates for the management of complex, non-textual documents. However, today's libraries are mainly associated with the provision of literature and texts. Furthermore, they usually offer no way of taking a document's internal structure into account. For instance, it is possible to retrieve whole books, but there is no means of accessing individual chapters or illustrations.

Figure 1: Acquisition of 3D content by scanning. Figure 2: Interactive 3D search interface.

The PROBADO project started in February 2006 and is being conducted by the University of Bonn, the Technical University of Graz and the OFFIS Research Institute, as well as by the German National Library of Science and Technology in Hannover and the Bavarian State Library in Munich. It aims to support libraries in professionally handling non-textual, complex documents alongside their traditional text documents. The resulting information system will be the basis for a sustainable operational library service, which provides access to non-textual documents for scientists and professionals. Initially, PROBADO will provide services for music, 3D graphics and e-learning content. The underlying digital library system is, however, highly generic. Mechanisms to extend the PROBADO services to different media types will be devised in future project activities.

The challenges to be met by PROBADO can best be explained by following the workflow typically implemented by a scientific library:

Although this workflow is well understood for text documents, it raises new requirements if non-textual documents are also to be managed. PROBADO users will, for example, expect to be able to search for 3D models of buildings with Gothic windows, for pieces of music containing a certain musical theme or melody, or for e-learning material for first-year students. To support them, PROBADO has to offer enhanced content-based indexing and retrieval methods as well as advanced, flexible user interfaces.

In the area of music, score images are analysed by Optical Music Recognition algorithms and are later synchronized with the respective audio recordings. Among other things, the user interface enables the user to type in note representations or to whistle or hum a theme into a microphone. The music index is then used to retrieve pieces of music matching the user's request. The interface can highlight the requested part within the score and synchronously play its audio interpretation.
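The article does not say how the PROBADO music index matches a theme, so the following is only a minimal sketch of one common idea: comparing melodies by their sequence of pitch steps, so that a theme hummed in a different key still matches. The function names, the toy index and the use of MIDI pitch numbers are assumptions made for illustration, not part of PROBADO.

```python
from typing import Dict, List, Tuple


def intervals(pitches: List[int]) -> Tuple[int, ...]:
    """Turn a melody (MIDI pitch numbers) into its successive semitone steps."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))


def find_theme(index: Dict[str, List[int]], query: List[int]) -> List[Tuple[str, int]]:
    """Return (piece, note position) for every exact occurrence of the theme.

    Matching on intervals instead of absolute pitches makes the search
    independent of the key in which the user hums or types the theme.
    """
    wanted = intervals(query)
    hits = []
    for piece, pitches in index.items():
        steps = intervals(pitches)
        for start in range(len(steps) - len(wanted) + 1):
            if steps[start:start + len(wanted)] == wanted:
                hits.append((piece, start))
    return hits


# Hypothetical toy index: piece identifier -> melody as MIDI pitch numbers.
toy_index = {
    "piece_a": [60, 62, 64, 65, 67, 65, 64],
    "piece_b": [67, 67, 69, 71, 72, 74],
}

# The theme C-D-E-F, entered a fifth higher as G-A-B-C.
matches = find_theme(toy_index, [67, 69, 71, 72])
print(matches)
```

The returned note position is what a user interface would need in order to highlight the passage in the score. A real system would also have to tolerate imprecise humming, which exact matching as sketched here does not.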

In the area of 3D graphics, a catalogue of basic architectural shapes is being developed, which is then used to index architectural 3D models. Users can search the model database by giving a textual description such as 'buildings with Doric columns', by choosing a basic shape from the catalogue and interactively parameterizing it, or by sketching the architectural shape they are interested in. Search results can then be rendered in a 3D browser.
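To make the idea of searching by a parameterized catalogue shape more concrete, here is a small sketch under stated assumptions: each indexed model is annotated with the catalogue shapes it contains and a few numeric parameters, and a query names a shape plus acceptable parameter ranges. The class, field names, shape names and values are all hypothetical, and the actual PROBADO index may work quite differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ShapeOccurrence:
    """One catalogue shape found in one 3D model (hypothetical index entry)."""
    model_id: str
    shape: str
    params: Dict[str, float]


def search(index: List[ShapeOccurrence], shape: str,
           ranges: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return ids of models containing `shape` with all parameters in range."""
    hits: List[str] = []
    for occ in index:
        if occ.shape != shape:
            continue
        in_range = all(
            name in occ.params and low <= occ.params[name] <= high
            for name, (low, high) in ranges.items()
        )
        if in_range and occ.model_id not in hits:
            hits.append(occ.model_id)
    return hits


# Hypothetical toy index with made-up models and parameter values.
toy_index = [
    ShapeOccurrence("temple_01", "doric_column", {"height_m": 9.5}),
    ShapeOccurrence("villa_02", "doric_column", {"height_m": 3.2}),
    ShapeOccurrence("church_03", "gothic_window", {"height_m": 6.0}),
]

# "Doric columns between 5 and 12 metres high"
tall_columns = search(toy_index, "doric_column", {"height_m": (5.0, 12.0)})
print(tall_columns)
```

An interactive interface could translate the sliders a user adjusts on the chosen shape into the `ranges` argument of such a query.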

In contrast, e-learning content cannot be restricted to certain media types; it semantically combines different media in ever-changing formats. Hence, PROBADO is developing extensible indexing and retrieval algorithms. These allow existing content-based retrieval methods for different media types to be integrated and enriched, so that content can also be searched by didactic aspects.
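One way to picture such extensibility is a registry into which a retrieval method can be plugged per media type, combined with a filter on didactic metadata. The sketch below is an assumption-laden illustration only: the class, the scoring function, the `level` field and the sample learning unit are invented here and do not describe PROBADO's design.

```python
from typing import Callable, Dict

# A scorer rates how well one document part matches a query (0.0 to 1.0).
Scorer = Callable[[str, dict], float]


class RetrievalRegistry:
    """Hypothetical registry mapping media types to pluggable scorers."""

    def __init__(self) -> None:
        self._scorers: Dict[str, Scorer] = {}

    def register(self, media_type: str, scorer: Scorer) -> None:
        self._scorers[media_type] = scorer

    def score(self, query: str, document: dict, level: str = "") -> float:
        # Didactic filter: skip documents aimed at a different audience.
        if level and document.get("level") != level:
            return 0.0
        # Parts whose media type has no registered scorer are ignored.
        scores = [
            self._scorers[part["media_type"]](query, part)
            for part in document["parts"]
            if part["media_type"] in self._scorers
        ]
        return max(scores, default=0.0)


def keyword_scorer(query: str, part: dict) -> float:
    """Fraction of query words that occur in the part's text."""
    words = query.lower().split()
    text = part["text"].lower()
    return sum(w in text for w in words) / len(words) if words else 0.0


registry = RetrievalRegistry()
registry.register("text", keyword_scorer)

# Made-up learning unit combining two media types.
unit = {
    "level": "first-year",
    "parts": [
        {"media_type": "text", "text": "Introduction to Gothic architecture"},
        {"media_type": "audio", "uri": "lecture.mp3"},
    ],
}

relevance = registry.score("gothic windows", unit, level="first-year")
print(relevance)
```

Adding support for a new format then only requires registering another scorer, for instance one for audio, without changing the search logic itself.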

The PROBADO project is funded by the German Research Foundation and also collaborates with the DELOS Network of Excellence, ensuring European dissemination. The project has a tentative duration of five years.

PROBADO home page:

Please contact:
Ulrike Steffens, OFFIS, Germany
Tel: +49 441 9722 176

Jochen Meyer, OFFIS, Germany
Tel: +49 441 9722 185