ERCIM News No. 51, October 2002

Semantic Characterisation of Links and Documents

by Silvia Martelli and Oreste Signore

Semantic characterisation can considerably enhance navigation possibilities and make it possible to present documents appropriately for a real adaptive hypertext environment.

Documents on the Web are often deeply structured, and this can be the origin of incompatibility between different views of the same object. Links are the essence of hypertext, but they are meaningful only if their semantics is clear, and perceivable by the user. As a consequence, it can be useful to have lightly structured documents, where some elements can be seen as 'semantic items' that identify concepts characterising the specific parts of the document. Links can share basic semantic categories with these semantic items, and can be implemented in the XLink framework. This quite simple approach leads to a considerable enhancement of navigation possibilities, and makes it possible to present documents appropriately for a real adaptive hypertext environment.

The Hyperlink Association Model
When reading, our attention is often captured by words (anchors) that lead our mind to other documents. In the Web context, documents, whatever their origin, are seen as resources. We can model the association process in the following way:

the anchor leads to a concept
the concept is related to other concepts
the new concept is related to some resources.
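
The three steps above can be sketched as a chain of lookups. This is only an illustration under stated assumptions: the dictionaries, concept names and resource paths below are hypothetical and do not come from the authors' system.

```python
# Hypothetical toy data (not from the article): anchor text -> concept,
# concept -> related concepts, concept -> resources.
ANCHOR_TO_CONCEPT = {"the French emperor": "Napoleon"}
RELATED_CONCEPTS = {"Napoleon": ["Russia", "Waterloo"]}
CONCEPT_TO_RESOURCES = {
    "Russia": ["doc/russian-campaign.xml"],
    "Waterloo": ["doc/waterloo.xml", "doc/hundred-days.xml"],
}


def associate(anchor):
    """Follow anchor -> concept -> related concepts -> resources."""
    concept = ANCHOR_TO_CONCEPT.get(anchor)
    if concept is None:
        return []
    resources = []
    for related in RELATED_CONCEPTS.get(concept, []):
        for resource in CONCEPT_TO_RESOURCES.get(related, []):
            if resource not in resources:  # keep first occurrence only
                resources.append(resource)
    return resources
```

Calling `associate("the French emperor")` walks from the anchor to the concept, then to its related concepts, and finally collects the resources attached to them.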

Figure 1: Document and Concept Space.

This basic association mechanism (see Figure 1) is totally independent of the document structuring. In the data space, documents are connected by extensional links. In the concept space - a simplified version of the Semantic Web architecture's ontology level - associations among concepts implement intensional links among documents. Two questions now arise:

how can we implement the link from resources to concepts
how can concepts be linked together.

A simple and effective way to implement intensional links is to identify the semantic items. This can help in several cases (for example, 'The French emperor' can be an implicit reference to 'Napoleon'). We can also characterise each semantic item with a specific semantic category (eg person, location, date, taxonomy) in order to tailor a document to specific user interests, eg a reader interested in space-time associations will get location and date items emphasised.

The second question directly leads to the interaction metaphor issue. In addition to the case of taxonomic classifications, where we can make use of well-known thesaurus techniques, space and time can function as very powerful association mechanisms. A semantic item can point to a location, then, using an interaction metaphor based upon space, the user can either jump to other resources linked to the same location, or select a different location, and then find other resources related to this new location. This simple hyperlink association model can be implemented through a document, link and user model.
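
As a rough sketch of the space-based metaphor, resources can be indexed by location, so that the user can either stay on the same location or pick another one. The resource names and locations are hypothetical, and a real system would presumably resolve locations on a map rather than by plain string matching.

```python
from collections import defaultdict

# Hypothetical (resource, location) pairs, as they might be extracted
# from semantic items whose category is "location".
LOCATED_ITEMS = [
    ("doc/russian-campaign.xml", "Moscow"),
    ("doc/kremlin.xml", "Moscow"),
    ("doc/waterloo.xml", "Waterloo"),
]


def build_location_index(pairs):
    """Group resources by the location they refer to."""
    index = defaultdict(set)
    for resource, location in pairs:
        index[location].add(resource)
    return index


def resources_at(index, location, exclude=None):
    """Resources linked to a location, optionally skipping the current one."""
    return sorted(r for r in index.get(location, set()) if r != exclude)


index = build_location_index(LOCATED_ITEMS)
# Jump from the current document to others about the same location.
same_place = resources_at(index, "Moscow", exclude="doc/russian-campaign.xml")
# Or select a different location and see what is related to it.
other_place = resources_at(index, "Waterloo")
```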

Semantic Model of Documents
In XML documents we must clearly distinguish between structural and semantic information, which can be associated with elements or parts of them. For the sake of simplicity and effectiveness, it is possible to define a limited, coarse-grained set of general semantic categories that can be structured in a thesaurus-like fashion. These categories can be shared by a wide variety of users, can be used to define a user profile and can semantically characterise various parts of the documents and links. We can also specify a weight, stating the relevance of the concept in the document context.
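
A minimal sketch of such a document model follows, assuming a thesaurus reduced to a single 'broader term' relation. The category names, the `SemanticItem` class and the default weight are illustrative assumptions, not part of the authors' model.

```python
from dataclasses import dataclass

# Hypothetical thesaurus-like structure: narrower category -> broader category.
BROADER = {"city": "location", "country": "location", "year": "date"}


def broader_chain(category):
    """Return the category followed by all of its broader categories."""
    chain = [category]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


@dataclass
class SemanticItem:
    text: str
    category: str
    weight: float = 1.0  # relevance of the concept in the document context


def matches(item, category):
    """True if the item belongs to the category or to a narrower one."""
    return category in broader_chain(item.category)
```

With this structure, an item tagged with the narrower category `city` also satisfies a query for `location`.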

Link Taxonomy and Model
Links allow navigation on the Web, and can implement the abstraction mechanisms needed to move from data space to concept space. Semantic qualifications of links explicitly identify their meaning in the document and the role of the resources involved. The reason why the link has been inserted in the document, ie the nature of the association (geographical, explicative, etc), can be made explicit through the link's semantic type. Different types of links can suggest different and specialised interaction paradigms (time, map, classification, etc), with an enormous effect on the potential association mechanism: two documents can be linked through an intensional link existing in the concept space, even without the extensional link being specified in the document.
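
The last point can be illustrated with a small sketch that derives intensional links from shared concepts. It assumes that each document has already been reduced to a set of (semantic type, concept) pairs; the document names and the pairs are hypothetical.

```python
from itertools import combinations

# Hypothetical concept space: document -> set of (semantic type, concept).
DOC_CONCEPTS = {
    "doc/a.xml": {("geographical", "Moscow"), ("person", "Napoleon")},
    "doc/b.xml": {("geographical", "Moscow")},
    "doc/c.xml": {("person", "Wellington")},
}


def intensional_links(doc_concepts):
    """Link every pair of documents that share a typed concept."""
    links = []
    for left, right in combinations(sorted(doc_concepts), 2):
        shared = doc_concepts[left] & doc_concepts[right]
        for link_type, concept in sorted(shared):
            links.append((left, right, link_type, concept))
    return links
```

Here `doc/a.xml` and `doc/b.xml` end up connected by a geographical link through 'Moscow', although neither document contains an explicit link to the other.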

User Model
As a first approximation, the user model is defined as an essentially dynamic profile, tightly related to the semantic model of documents and links. The user profile is defined in terms of semantic categories, link types and link roles. For each of these, a degree of interest (weight) is stated.

A Simple Example
Take the following fragment of an XML document, containing some semantic items:
In <abwr:si st="date"> 1812 </abwr:si> <abwr:si st="person" canonicalName = "Napoleon"> the French emperor </abwr:si> invaded Russia ...
This document may have been entered as it is, or may be the result of a more complex process, involving a database search and data processing.
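
A software agent might read the semantic items out of such a fragment along the following lines. This is a sketch, not the authors' implementation: the article does not give the namespace bound to the `abwr` prefix, so the URI below is a placeholder, and the wrapping `doc` element is added only to make the fragment parseable.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the real one is not given in the article.
ABWR_NS = "http://example.org/abwr"

FRAGMENT = (
    'In <abwr:si st="date"> 1812 </abwr:si> '
    '<abwr:si st="person" canonicalName = "Napoleon"> the French emperor </abwr:si> '
    "invaded Russia ..."
)


def semantic_items(fragment):
    """Return text, category and canonical name of each semantic item."""
    # Wrap the fragment in a root element that declares the prefix.
    root = ET.fromstring(f'<doc xmlns:abwr="{ABWR_NS}">{fragment}</doc>')
    items = []
    for element in root.iter(f"{{{ABWR_NS}}}si"):
        items.append(
            {
                "text": (element.text or "").strip(),
                "category": element.get("st"),
                "canonical_name": element.get("canonicalName"),
            }
        )
    return items
```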

Figure 2: Document with an activated semantic item.

A software agent can take this document as input, producing a richer XML document, where the expression "the French emperor" becomes the anchor of an extended link (using XLink terminology).
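
The enrichment step could look roughly like this. The XLink namespace and the `xlink:type`, `xlink:href`, `xlink:label`, `xlink:title`, `xlink:from` and `xlink:to` attributes are those of the XLink recommendation; the element names (`link`, `anchor`, `target`, `go`), the label values and the target paths are hypothetical choices made for this sketch.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def xl(name):
    """Qualified name of an XLink attribute."""
    return f"{{{XLINK_NS}}}{name}"


def extended_link(anchor_text, concept, targets):
    """Build an extended link going from the anchor to every target."""
    link = ET.Element("link", {xl("type"): "extended"})
    # Local resource: the text that acts as the anchor.
    anchor = ET.SubElement(
        link, "anchor", {xl("type"): "resource", xl("label"): "anchor"}
    )
    anchor.text = anchor_text
    # Remote resources related to the concept.
    for href in targets:
        ET.SubElement(
            link,
            "target",
            {
                xl("type"): "locator",
                xl("href"): href,
                xl("label"): "target",
                xl("title"): concept,
            },
        )
    # One arc connects the anchor to all locators sharing the label.
    ET.SubElement(
        link, "go", {xl("type"): "arc", xl("from"): "anchor", xl("to"): "target"}
    )
    return link


link = extended_link("the French emperor", "Napoleon", ["doc/a.xml", "doc/b.xml"])
serialised = ET.tostring(link, encoding="unicode")
```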

When producing this document and also when displaying it, the software agent can examine the user profile, in order to produce a personalised document that will be displayed by the browser, as shown in Figure 2.
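
One way to picture the personalisation step is a profile that maps semantic categories to a degree of interest, with items above a threshold being emphasised. The weights, the threshold and the asterisk markup are assumptions made for this sketch; the article does not say how emphasis is rendered.

```python
# Hypothetical user profile: semantic category -> degree of interest (0 to 1).
PROFILE = {"date": 0.9, "location": 0.8, "person": 0.2}

# Semantic items as (text, category) pairs, in document order.
ITEMS = [("1812", "date"), ("the French emperor", "person")]


def emphasise(items, profile, threshold=0.5):
    """Mark the items whose category interests the user enough."""
    rendered = []
    for text, category in items:
        if profile.get(category, 0.0) >= threshold:
            rendered.append(f"*{text}*")  # placeholder for real emphasis
        else:
            rendered.append(text)
    return rendered
```

For a reader interested in space-time associations, as in this profile, the date is emphasised while the person is left as plain text.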

Please contact:
Oreste Signore, ISTI-CNR
Tel: +39 050 3152995
E-mail: Oreste.Signore@cnuce.cnr.it