ERCIM News No.35 - October 1998

SARI - A System for Semantical Information Retrieval

by Kuldar Taveter

TILA (Tools for Information Retrieval and Organization) is a project of the Finnish National Multimedia Research Programme. Its goal is to design and develop an agent-based system for retrieving and organizing heterogeneous information that may take different forms and reside in different locations. The SARI (Software Agents for Retrieval of Information) system is intended to act as a broker between human users or other computerized systems (ie applications) that need information at one end, and heterogeneous information sources with different search engines at the other.

SARI’s architecture reflects the system’s role as a broker between its users and information sources. The figure depicts the types of agents that make up SARI.

In addition, there are Content Provider Agents that represent content providers to the SARI system. Content providers are organizations or individuals who own one or more information sources accessible to the SARI system. Among other things, a Content Provider Agent mediates metadata about the information in its provider's information sources to SARI.

Control Agents form the heart of SARI. They make their brokering decisions on the basis of the user information held in user profiles and of the metadata, held in ontologies, about the information to be retrieved. In principle, Control Agents can form federations with each other, but the present pilot version of SARI has just one Control Agent.

The content of any information retrieval request originating at an Application Agent is translated into the internal query language SAL (SAri query Language) before it is forwarded to the Control Agent. The Search Agent of an information source then translates the query into the query language of that source. In this way, for n applications and m information sources, only n+m compilers need to be built, rather than one for each of the n×m pairs of application and information source.
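The saving that comes from a shared internal query language can be illustrated with a minimal Python sketch. The article does not show SAL's syntax, so the `InternalQuery` class below is only a hypothetical stand-in for a SAL query, and the two application formats and two source languages are likewise assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalQuery:
    """Hypothetical stand-in for a SAL query; the real SAL syntax is not given."""
    concept: str
    constraints: tuple  # tuple of (attribute, value) pairs


# Application side: one translator per application (n in total).
def from_web_form(form: dict) -> InternalQuery:
    fields = {k: v for k, v in form.items() if k != "concept"}
    return InternalQuery(form["concept"], tuple(sorted(fields.items())))


def from_keyword_string(text: str) -> InternalQuery:
    concept, *pairs = text.split()
    constraints = sorted(tuple(p.split("=", 1)) for p in pairs)
    return InternalQuery(concept, tuple(constraints))


# Source side: one translator per information source (m in total).
def to_sql(q: InternalQuery) -> str:
    # Illustration only: real code should use parameterized queries.
    where = " AND ".join(f"{a} = '{v}'" for a, v in q.constraints)
    return f"SELECT * FROM {q.concept}" + (f" WHERE {where}" if where else "")


def to_keyword_search(q: InternalQuery) -> str:
    return " ".join([q.concept] + [v for _, v in q.constraints])


def broker(request, frontend, backend) -> str:
    """Route a request through the internal form: frontend first, then backend."""
    return backend(frontend(request))


print(broker({"concept": "Commodity", "country": "Estonia"}, from_web_form, to_sql))
print(broker("Commodity country=Estonia", from_keyword_string, to_keyword_search))
```

Any frontend can be combined with any backend because both sides only need to agree on the internal form. Adding a new application or a new source therefore costs one translator, not one per counterpart.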

The conceptual structure of the information contained in the information sources available to SARI is described by ontologies. An ontology is a description of the concepts and inter-concept relationships of some problem domain. The ontologies for the relational databases used by SARI are derived from their schemas. An ontology can also be a classification on which the information in an information source is based. An example of this is Ultika, an APL database used by SARI that contains statistical information about Finnish foreign trade. Since SARI includes an implementation of the Resource Description Framework (RDF) proposed by the World Wide Web Consortium (W3C), the ontologies describing Web resources are specified for SARI as RDF schemas and descriptions. Ontologies can be browsed graphically in SARI.
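The article does not say how an ontology is derived from a relational schema. One plausible reading, sketched below in Python, is that each table becomes a concept, plain columns become attributes, and foreign keys become relationships between concepts. The `Concept` class, the `ontology_from_schema` function and the example schema are all hypothetical and do not come from SARI.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    attributes: list = field(default_factory=list)
    relations: dict = field(default_factory=dict)  # relation name -> target concept


def ontology_from_schema(schema: dict) -> dict:
    """Build concepts from a schema given as table -> {column: referenced table or None}."""
    ontology = {}
    for table, columns in schema.items():
        concept = Concept(table)
        for column, target in columns.items():
            if target is None:
                concept.attributes.append(column)    # plain column -> attribute
            else:
                concept.relations[column] = target   # foreign key -> relationship
        ontology[table] = concept
    return ontology


# Made-up schema, for illustration only.
schema = {
    "Product": {"name": None, "maker": "Company"},
    "Company": {"name": None},
}
ontology = ontology_from_schema(schema)
print(ontology["Product"])
```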

One of the most important problems to be solved in semantical information retrieval from heterogeneous sources is reconciling the different conceptualizations of the world represented by different information sources. In SARI the concepts of different ontologies are linked to each other by means of the notions of viewpoint and bridge. Ontologies interlinked in this way form an ontological structure that can be viewed from different perspectives. For example, there is a bridge between the concepts Commodity and Product, which are the root classes of the classifications under the foreign trade and manufacturing viewpoints, respectively.
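A minimal Python sketch can make the idea of viewpoints and bridges more concrete. Only the root classes Commodity and Product and the two viewpoint names come from the text. The subclasses, the data layout, the treatment of bridges as bidirectional, and the lookup rule (climb to the nearest bridged ancestor) are assumptions made for illustration, since the article does not define the precise semantics of a bridge.

```python
# Each viewpoint has its own classification: child concept -> parent concept.
# The subclasses Timber and Furniture are made up for illustration.
VIEWPOINTS = {
    "foreign_trade": {"Timber": "Commodity", "Commodity": None},
    "manufacturing": {"Furniture": "Product", "Product": None},
}

# A bridge links a concept under one viewpoint to a concept under another.
BRIDGES = {
    ("foreign_trade", "Commodity"): ("manufacturing", "Product"),
}


def ancestors(viewpoint, concept):
    """Yield the concept itself, then each ancestor up to the root."""
    parents = VIEWPOINTS[viewpoint]
    while concept is not None:
        yield concept
        concept = parents[concept]


def cross_viewpoint(viewpoint, concept):
    """Return the (viewpoint, concept) bridged to the nearest bridged ancestor, or None."""
    links = dict(BRIDGES)
    # Assumption: a bridge can be followed in both directions.
    links.update({target: source for source, target in BRIDGES.items()})
    for candidate in ancestors(viewpoint, concept):
        if (viewpoint, candidate) in links:
            return links[(viewpoint, candidate)]
    return None


print(cross_viewpoint("foreign_trade", "Timber"))
print(cross_viewpoint("manufacturing", "Furniture"))
```

Under these assumptions, a query phrased in terms of one classification can be redirected to the corresponding part of another classification through the bridge nearest to the queried concept.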

Future goals for SARI include making the formation of bridges between the concepts of different ontologies semiautomatic, and generating RDF metadata from Web resources semiautomatically.

The SARI system is being developed in Finland jointly by VTT Information Technology, Tampere University of Technology, and Tampere University. The project started in March 1996 and will continue until March 1999.

Please contact:

Kuldar Taveter - VTT
Tel: +358 9 456 6044
