ERCIM News No.41 - April 2000

Escrire: Embedded Structured Content Representation in Repositories

by Jérôme Euzenat, Rose Dieng and Amedeo Napoli

Content representation seems unavoidable in some areas of the future web. Although there are many candidate content representation languages, their respective merits in this context are not yet known.

An Intranet or, more generally, the use of Internet technology, is an opportunity for companies to publish and share knowledge that is often difficult to reach in documentary form. Digital and digitised documents can be made available in a standard and transparent way to all the users concerned. The long-term ambition is to produce knowledge servers that allow corporate resources to be searched and manipulated. However, the limits of this approach appear quickly: organising and maintaining the sites is an expensive task, and full-text search is not very effective.

The representation of the content of documents is becoming a necessity. Content representation makes it possible to manipulate content and to search by analogy, specialisation, similarity, etc. XML makes it possible to embed content representation (through RDF) within documents (in XHTML or another XML format), and knowledge representation formalisms are good candidates for representing content.
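
As an illustration of embedding content representation within a document, the following minimal Python sketch parses an XHTML document that carries an RDF description in its header and extracts the statements it contains. The sample document, the URI http://example.org/reports/42, the use of Dublin Core properties and the helper names extract_statements and find_by_subject are assumptions made for this example; they are not taken from the ESCRIRE project.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical XHTML document with an embedded RDF description.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Project report</title>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.org/reports/42">
        <dc:title>Project report</dc:title>
        <dc:subject>knowledge representation</dc:subject>
      </rdf:Description>
    </rdf:RDF>
  </head>
  <body><p>Full text of the report.</p></body>
</html>"""


def extract_statements(xml_text):
    """Return (subject, property, value) triples found in rdf:Description elements."""
    root = ET.fromstring(xml_text)
    statements = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # prop.tag has the form "{namespace}localname".
            statements.append((subject, prop.tag, (prop.text or "").strip()))
    return statements


def find_by_subject(statements, topic):
    """Return the resources whose dc:subject equals the given topic."""
    subject_property = f"{{{DC_NS}}}subject"
    return [s for (s, p, v) in statements if p == subject_property and v == topic]


statements = extract_statements(DOCUMENT)
matches = find_by_subject(statements, "knowledge representation")
```

Here the query matches on a property value rather than on the words of the body text. A knowledge representation formalism would go further, for instance by also retrieving documents indexed under more specific concepts.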

ESCRIRE is a coordinated action of three INRIA teams (Acacia, Exmo, Orpailleur). One of its first objectives is to propose an implementation model for embedding content representation in documents. However, there are various knowledge representation formalisms, and their relative merits are not precisely known. The main goal of ESCRIRE is thus to compare three types of knowledge representation formalisms (conceptual graphs, object-based knowledge representations and description logics) from the standpoint of representing and handling document content. This will make it possible to highlight the desired properties and to evaluate, qualitatively and quantitatively, the performance of the implemented formalisms. Beyond a better understanding of the techniques involved in the various projects, this work will advance the state of the art.

A set of documents and a set of queries have been selected in a coordinated way. The efficiency of the respective formalisms will be assessed with regard to these queries. Each team specifies the integration of its formalism in XML and develops a query evaluation strategy suited to that formalism. The formalisms, together with full-text search, will be evaluated according to a predefined protocol for assessing qualitative criteria (eg query expressivity, legibility) and quantitative ones (eg precision, recall). This evaluation will provide a precise analysis of the strengths and weaknesses of each knowledge representation formalism for content representation.
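
The quantitative criteria mentioned above can be sketched as follows, assuming the usual information retrieval definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The article does not describe the actual evaluation protocol, so the function name precision_recall and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: identifiers of the documents returned by the system.
    relevant:  identifiers of the documents judged relevant to the query.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # By convention in this sketch, an empty set yields a score of 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents returned, three documents relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
```

In this example two of the four retrieved documents are relevant, giving a precision of 0.5, and two of the three relevant documents are retrieved, giving a recall of 2/3. Computing the same pair of figures for each formalism and for full-text search over the shared set of queries is one way the comparison could be made.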


Please contact:
Jérôme Euzenat - INRIA
Tel: +33 4 7661 5366
E-mail: Jerome.Euzenat@inrialpes.fr