ERCIM News No.25 - April 1996 - INRIA

Authors may index their own Web Documents

by Jacques André and Hélène Richy

Access to information on the World Wide Web is mainly navigational. Many research projects and commercial crawlers have been designed for that purpose; in particular, the concept of cartography is one of the most successful. On the other hand, many studies are concerned with automatic indexing: tools are written to extract from (full) texts the pertinent information the reader is looking for.

Between these two approaches, structural and statistical, we propose another one based on a traditional technique: authors have the best knowledge of the contents of their documents, and they are able to give keywords summarizing their thought. However, many problems are still unsolved.
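As a rough illustration of author-driven indexing, the sketch below gathers keywords that authors have attached to their documents into a single index. It is only a minimal sketch: the function name `build_index`, the document names and the keywords are all hypothetical, and it does not reflect how Grif or Thot actually store index marks.

```python
from collections import defaultdict


def build_index(author_keywords):
    """Map each keyword to the sorted list of documents whose author chose it."""
    index = defaultdict(set)
    for doc, keywords in author_keywords.items():
        for kw in keywords:
            # Normalize so that "Editing" and "editing" share one entry.
            index[kw.strip().lower()].add(doc)
    # Return entries in alphabetical order, as in a printed index.
    return {kw: sorted(docs) for kw, docs in sorted(index.items())}


# Hypothetical documents and author-supplied keywords (illustration only).
author_keywords = {
    "grif.html": ["structured documents", "Editing"],
    "cartulary.html": ["index tables", "structured documents"],
}
index = build_index(author_keywords)
for keyword, docs in index.items():
    print(keyword, "->", ", ".join(docs))
```

The index here is an inverted mapping from keyword to documents; the selection of keywords stays entirely with the author, which is the point of the approach.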

A first approach, using the structured document editor Grif, allowed us to produce large index tables for traditional paper books, such as the Cartulaire de Saint Laurent, the first cartulary written in French during the 14th century.

Extending such tools to the Web requires many improvements at various levels. Note that "index" is used here in a broad sense that also covers bibliographies, references, tables of contents, etc.

From the authoring system's point of view, a set of three tasks is useful:

When considering large documents from the Web (i.e. the reader's) point of view, such an index is not a static document but an active one that has to be kept up to date. Many events require index documents to be updated, such as:

Various updating strategies may be proposed:

At INRIA-Rennes, we are working on such index manipulation in the context of the Thot system (Opera project, INRIA). Work is in progress to implement a system based on the second strategy (updating when the index is accessed) within the Tamaya environment.
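The second strategy, updating when the index is accessed, can be sketched as a lazily rebuilt index. This is a minimal sketch under stated assumptions, not the Thot or Tamaya implementation: the class `LazyIndex`, its methods and the sample documents are hypothetical names introduced for illustration.

```python
class LazyIndex:
    """Index that is rebuilt only when a reader accesses it after a change."""

    def __init__(self, documents):
        # documents: name -> list of author keywords (hypothetical layout)
        self.documents = dict(documents)
        self._entries = {}
        self._stale = True   # nothing has been built yet
        self.rebuilds = 0    # counter kept only to show the laziness

    def update_document(self, name, keywords):
        """Record a change; the index is merely marked out of date."""
        self.documents[name] = list(keywords)
        self._stale = True

    def lookup(self, keyword):
        """Rebuild on access if needed, then return the matching documents."""
        if self._stale:
            self._rebuild()
        return self._entries.get(keyword, [])

    def _rebuild(self):
        entries = {}
        for name, keywords in self.documents.items():
            for kw in keywords:
                entries.setdefault(kw, []).append(name)
        self._entries = {kw: sorted(names) for kw, names in entries.items()}
        self._stale = False
        self.rebuilds += 1


idx = LazyIndex({"a.html": ["index"]})
idx.update_document("b.html", ["index", "web"])
idx.update_document("c.html", ["web"])
first = idx.lookup("index")   # triggers the single rebuild
second = idx.lookup("web")    # served from the already rebuilt index
```

Two document changes followed by two lookups cause only one rebuild, which is the trade-off of this strategy: authors pay nothing when they edit, and the first reader after a change pays the cost of regeneration.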

More info in the Web:

Please contact:
Jacques André or Hélène Richy - INRIA
Tel: +33 99 84 71 00
