ERCIM News No.34 - July 1998

Eleventh ERCIM Database Research Group Workshop: Metadata for Web Databases

by Brian J Read

The latest in the series of ERCIM Database Research Group workshops was held at GMD Birlinghoven Castle near Bonn on 26 May 1998. The topic 'Metadata for Web Databases' attracted significant interest among members of the working group, resulting in a very full day of presentations and discussion. Karl Aberer of GMD-IPSI organised and chaired the workshop. There were sixteen participants from ten institutes.

In the context of Web data management, database systems are mostly used in an isolated way as data sinks or sources. Data management services that exploit and support the connectivity of the Web require the interaction and co-operation of different data management components on the Web. To enable this, the Web needs to be equipped with the metadata on the structure and behaviour of Web data that these components require. The workshop was therefore intended to address such questions as the extraction, modelling and querying of metadata, thereby adding semantics to the use of Web data.

Keith Jeffery (CLRC-RAL) introduced the workshop topic by presenting an overview of the nature of metadata in databases, distinguishing its various purposes, and classifying it into three main kinds: schema, navigational and associative. Capturing metadata from the web presents problems, as virtual pages generated from database queries are invisible to the large web crawlers. The limitations of HTML, and indeed XML, in managing metadata were discussed in this and several subsequent talks.

Yannis Stavrakas (NTU-Athens) expanded on the nature of metadata for web-based information systems. He distinguished three perspectives corresponding to the atomic level (information within a page or document), the local level (the structure of a site and links between documents), and the global information space of the whole web.

Terje Brasethvik (IDI/NTNU-Trondheim), currently in Paris, described his work with Arne Sølvberg on a Referent Model of Documents classified by semantic metadata. In this approach to sharing information on the web, they are developing a modelling language and editor to capture the meaning of documents.

Giuseppe Sindoni (Rome III University), currently visiting RAL, presented work from Paolo Atzeni's Rome group on a logical model for metadata in web bases. Their Araneus Data Model with the Penelope language embeds the schema within HTML. Turning to XML is potentially attractive, but that too has limitations for data modelling.

Three research projects were covered in the afternoon session. Menzo Windhouwer (CWI) described his work with Martin Kersten on the Acoi project, which is developing a feature detector engine to classify multimedia objects, especially images. The Acoi web robot has already stored details extracted from over two hundred thousand images in a database.

Thomas Klement (GMD-IPSI) spoke about the ICE (Information Catalogue Environment) project. This concerns metadata for multidimensional categorisation and navigation support on multimedia documents. It includes an interesting use of dynamic menus to explore hypercube structures stored in an object-relational database.

The last presentation was from Donatella Castelli (CNR-Pisa) about supporting retrieval by "relation among documents" in the ERCIM Technical Reference Digital Library (ETRDL), which is based on the Dienst system and the Dublin Core. This prompted an interesting discussion on the possible semantics of a relationship defined between documents.

The workshop concluded with a lively panel and discussion session on the future research direction of EDRG and its role in the EC Fifth Framework Programme. A relevant component of the latter is "Creating a User Friendly Information Society", especially the Key Actions relating to application domains. This suggested that future workshops might be targeted towards an application area (such as transport, environment or health) instead of a technical topic. CWI emphasised semantic indexing of the web, in particular by involving the end user, as part of an ambitious research agenda, and cautioned against being unduly influenced by funding considerations.

