ERCIM News No. 51, October 2002

A Few Words about the Semantic Web and its Development in the ERCIM Institutes

by Jérôme Euzenat

A first illustration of what the Semantic Web aims at is that the Web, as it currently exists, is very difficult to search. For instance, looking for a book about Agatha Christie is not easy: many current search engines discard the word 'about' as meaningless and may then return a plethora of pages referring to books by Agatha Christie. If we want machines to perform effective document searches, we must help them by telling them what documents mean. Correctly identifying relational information (such as relating a Book object to a Person object named 'Agatha Christie' through an 'about' or an 'author' link) would be a step towards more accurate search. It is thus natural that the first language designed for the Semantic Web, RDF (Resource Description Framework), emphasises these relationships.
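To make the 'about' versus 'author' distinction concrete, here is a minimal sketch in Python that stores RDF-style (subject, predicate, object) statements as plain tuples rather than using an actual RDF toolkit. The identifiers (ex:book1, ex:about, ex:author and so on) and the helper books_linked_to are hypothetical illustrations; real RDF would use full URIs for resources and properties.

```python
# A minimal sketch: RDF-style (subject, predicate, object) triples held in a
# plain Python set. All identifiers are hypothetical; real RDF would use full
# URIs for resources and properties.
Triple = tuple[str, str, str]

TRIPLES: set[Triple] = {
    ("ex:book1", "rdf:type", "ex:Book"),
    ("ex:book1", "ex:author", "ex:AgathaChristie"),   # a book BY Agatha Christie
    ("ex:book2", "rdf:type", "ex:Book"),
    ("ex:book2", "ex:about", "ex:AgathaChristie"),    # a book ABOUT Agatha Christie
    ("ex:AgathaChristie", "rdf:type", "ex:Person"),
    ("ex:AgathaChristie", "ex:name", "Agatha Christie"),
}

def books_linked_to(person_name: str, link: str, triples: set[Triple]) -> list[str]:
    """Return the books related to the named person through the given link."""
    # Resources whose ex:name matches the requested person.
    people = {s for (s, p, o) in triples if p == "ex:name" and o == person_name}
    # Resources typed as books.
    books = {s for (s, p, o) in triples if p == "rdf:type" and o == "ex:Book"}
    # Keep only books connected to one of those people by the chosen predicate.
    return sorted(s for (s, p, o) in triples
                  if p == link and o in people and s in books)

print(books_linked_to("Agatha Christie", "ex:about", TRIPLES))   # ['ex:book2']
print(books_linked_to("Agatha Christie", "ex:author", TRIPLES))  # ['ex:book1']
```

Because the relationship is an explicit predicate rather than a word in free text, the two queries return different books instead of the same pile of pages.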

The idea of the Semantic Web (a term coined by Tim Berners-Lee, the creator of the Web) involves annotating documents with 'semantic markup', that is, markup that is interpreted not for display but as an expression of document content. This is often described as 'a Web for machines', as opposed to a Web to be read by humans. The idea was tested on a small scale during the 1990s, both by the SHOE system, developed at the University of Maryland, and by Ontobroker, developed at the University of Karlsruhe.

The development of the Semantic Web took off with the DARPA Agent Markup Language (DAML) initiative in the USA, whose goal was to provide a semantic markup language to succeed SHOE. Shortly afterwards, European researchers created the Ontoweb thematic network to federate research in this field, and W3C launched its long-awaited Semantic Web activity, which took over the development of RDF and its extensions, previously advanced by the other two groups. Research activities for building the Semantic Web are now under way all over the world, and are central to the 'knowledge technologies' area of the European Union's 6th Framework Programme. We invited several major players in Semantic Web research (W3C, the European Commission, INTAP and the Ontoweb network) to present their current and future activities in the domain.

The range of possible Semantic Web applications is very wide. A more elaborate scenario involves not only improved searching, but also selecting, assembling and triggering services found on the Web, matching interest profiles with resource descriptions, automatically rescheduling and reassembling services upon an unexpected event, notifying customers of a schedule change through their favourite communication medium, or mining biomolecular databanks at night.

For instance, the 'travel agent scenario' deals with a software agent able to plan a complex trip involving several forms of transportation, hotel stays, and conference and entertainment registration, by resolving heterogeneous constraints. These could include using trusted hotels in preferred areas, taking dietary requirements into account, finding connections, using only prescribed airlines and frequent-flyer-affiliated car rental companies, and minimising the total cost of the trip. Such a scenario requires four main capabilities on the part of the agent:

a network of well-annotated resources, eg plane schedules and fares, hotel locations and facility descriptions, etc
a reservoir of available knowledge (or an ontology), eg a bus is a means of transportation, Sardinia is in Italy, etc
a description of user preferences, eg food preferences, accommodation needs, frequent flyer programs, agendas, etc
inference capabilities, eg taxonomical inference, temporal inference, trust propagation, etc.
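The interplay of the four capabilities can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any actual agent: the class hierarchy and 'located in' facts stand in for the ontology, the hotel records for annotated resources, and the preferences dictionary for the user profile; every name is hypothetical. The two inferences shown are taxonomical (a bus is a means of transportation) and transitive location reasoning (a city in Sardinia is in Italy).

```python
# A minimal sketch of taxonomical and transitive "located in" inference for
# the travel-agent scenario. All classes, places and hotels are hypothetical.

# Ontology: each entry maps a class (or place) to its single parent.
SUBCLASS_OF = {
    "Bus": "PublicTransport",
    "Train": "PublicTransport",
    "PublicTransport": "MeansOfTransportation",
    "RentalCar": "MeansOfTransportation",
}
LOCATED_IN = {
    "Cagliari": "Sardinia",
    "Sardinia": "Italy",
    "Grenoble": "France",
}

def ancestors(item: str, parent_of: dict[str, str]) -> list[str]:
    """Follow parent links upwards (assumes the hierarchy has no cycles)."""
    chain = []
    while item in parent_of:
        item = parent_of[item]
        chain.append(item)
    return chain

def is_a(cls: str, super_cls: str) -> bool:
    """Taxonomical inference: is cls a (possibly indirect) subclass of super_cls?"""
    return cls == super_cls or super_cls in ancestors(cls, SUBCLASS_OF)

def is_in(place: str, region: str) -> bool:
    """Transitive location inference: is place (possibly indirectly) inside region?"""
    return place == region or region in ancestors(place, LOCATED_IN)

# Annotated resources and user preferences (hypothetical data).
hotels = [
    {"name": "Hotel A", "city": "Cagliari", "vegetarian_menu": True},
    {"name": "Hotel B", "city": "Grenoble", "vegetarian_menu": True},
    {"name": "Hotel C", "city": "Cagliari", "vegetarian_menu": False},
]
preferences = {"region": "Italy", "vegetarian": True}

# Keep hotels inferred to lie in the preferred region that meet the food need.
matches = [
    h["name"] for h in hotels
    if is_in(h["city"], preferences["region"])
    and (h["vegetarian_menu"] or not preferences["vegetarian"])
]

print(is_a("Bus", "MeansOfTransportation"))  # True
print(matches)                               # ['Hotel A']
```

No hotel record says it is in Italy; that fact is inferred from the ontology, which is exactly what plain keyword matching cannot do.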

In short, this scenario involves doing what a very good assistant could do, but in a more systematic and provable way (though perhaps in a less flexible and agreeable manner). However, the Semantic Web should not be confused with what has sometimes been called 'strong artificial intelligence', for three reasons. Firstly, the Semantic Web makes no anthropomorphic claim; instead, it is intended to complement humans in areas where they do not perform very well (ie, dealing quickly with large amounts of information, working continuously, analysing large texts for specific pieces of information, etc). Secondly, it takes into account many lessons learnt from the Web: since the Web is large, inconsistent and distributed, Semantic Web applications must be scalable, robust and decentralised. Lastly, it must be adapted to the context in which it will evolve.

The research aspects of the Semantic Web were debated at an ERCIM strategic workshop held in Sophia-Antipolis last year, which identified several important research directions to be investigated:

identification and localisation, for the correct and universal addressing of resources
relationships between semantic models, for adequately transforming and interpreting the content of the Semantic Web
tolerant and safe reasoning, for overcoming the inherent heterogeneity and inconsistency of an open world, and
facilitating Semantic Web adoption, by devising original means of knowledge capture and adequate growth models.

From this, it is evident that a number of fields can contribute to the development of the Semantic Web. Many groups are now applying their expertise at various levels, from how to put category information on Web pages to how computers can deal with meaning. In fact, one of the benefits of the Semantic Web is that it has encouraged close cooperation between people from many different backgrounds. We can already see structured-markup developers talking logic with knowledge representation researchers, database engineers talking protocols with multimedia designers, and electronic commerce managers starting to talk to agent developers. Of course, the Semantic Web primarily builds on techniques developed on the Web and in knowledge representation, but other fields also have important contributions to make. These include document management for manipulating complex markup, databases for handling assertions efficiently, digital libraries for metadata and search schemes, logic for crafting language semantics and reasoning systems, interface design for offering new views of marked-up documents, and agents for gathering knowledge and negotiating over the network.

This variety of research and experimentation topics is well illustrated by the work presented in this special issue. Researchers in the ERCIM institutes are working hard on the development of the Semantic Web, and we have organised their contributions into three sections:

'Tools and Experiments' gathers projects that are developing tools for manipulating and experimenting with semantic markup
'Metadata, Ontologies and Information Retrieval' is dedicated to projects on the design and alignment of vocabularies for use in marking up and retrieving documents and products
'Multimedia and Adaptation' deals with the modular design of multimedia documents, and the appeal they hold in terms of individual adaptation and retrieval as well as profiled assembly.

We are now at a moment when the players are mobilised to provide the best languages, architectures and tools for experimenting with the concepts of the Semantic Web. In this special issue, you have the opportunity to read all about it.

Links:
Semanticweb.org: http://www.semanticweb.org
W3C/SW activity: http://www.w3.org/2001/sw/
Ontoweb: http://www.ontoweb.org
DAML: http://www.daml.org
The International Semantic Web Conferences: http://iswc.semanticweb.org/

Further reading:
Tim Berners-Lee, James Hendler, Ora Lassila,
The Semantic Web, Scientific American 284(5):35-43, 2001,
http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html

Isabel Cruz, Stefan Decker, Jérôme Euzenat, Deborah McGuinness (eds.),
The Emerging Semantic Web: Selected Papers from the First Semantic Web Working Symposium, IOS Press, Amsterdam, 2002, 300pp,
http://www.inrialpes.fr/exmo/papers/emerging/

Ian Horrocks, James Hendler (eds.),
The Semantic Web - ISWC 2002, Lecture Notes in Computer Science 2342, Springer-Verlag, 2002
http://link.springer.de/link/service/series/0558/tocs/t2342.htm

Dieter Fensel, James Hendler, Henry Lieberman, Wolfgang Wahlster (eds.),
Spinning the Semantic Web, The MIT Press, Cambridge, 2002, to appear

Jérôme Euzenat (ed.), Research Challenges and Perspectives of the Semantic Web,
http://www.ercim.eu/EUNSF/semweb.html

Please contact:
Jérôme Euzenat, INRIA
Tel: +33 476 61 53 66
E-mail: Jerome.Euzenat@inrialpes.fr