ERCIM News No.35 - October 1998
Semantic Querying of Scientific Data through a Context Meta-data Database
by Epaminondas Kapetanios and Moira C. Norrie
Posing queries against scientific data is difficult because users must understand both the data models and the data values, which usually refer to particular value domains and may be expressed in specific measurement units. In addition, the behaviour of these values is captured by descriptive statistical values. None of these issues can be expressed in query languages that rely on well-established algebras, such as relational or collection algebra, and operate over a known schema. We have therefore developed a context meta-data database that helps the user to pose semantically enriched queries by navigating semantic information spaces. Such queries can then be transformed into database-specific query languages that address database-specific schemas.
The development of the context meta-data database has been guided by two case studies: one concerning scientific data for avalanche prediction, which are mainly measurement data, and one concerning nominal or categorical data for quality management in medicine. The first case study is being funded by the Swiss National Science Foundation in cooperation with the Swiss Federal Institute for Avalanche Research (http://www.slf.ch/slf.html), whereas the second case study has been funded by the Institute for Social and Preventive Medicine of the University of Zurich.
Orientation and Techniques
Semantics are usually ignored in traditional query languages because these languages are mainly designed to operate over a known schema on the basis of a well-established algebra, such as relational or collection algebra. They are not intended to express the meaning of a query within a particular context. Since we want to support users who query without detailed knowledge of the schema or the data values, and to exploit semantic as well as structural information, we add a context meta-data database to the information system architecture.
Since our aim is to allow only meaningful queries on scientific data, knowledge about the schema and the data value domains must be made explicit and incorporated in the intended query. This covers not only the interpretation of schema elements such as relations or attributes, but also their classification according to the semantics of arithmetic operations, eg an attribute can be classified either as a measurement variable or as a categorical one. Furthermore, finite value sets such as [-50, 50], expressed in specific measurement units, should be taken into account rather than infinite value domains such as integer or real. The same holds for categorical data, where the value domain is usually a finite set of string values such as positive, negative, indeterminate, not done, which might also be encoded by numerical values. Moreover, the behaviour of values can be captured by descriptive statistical values.
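The kind of meta-data described above can be sketched as follows. This is only an illustration, not the authors' OMS implementation: the class and field names, the attribute name, the unit string "degC" and the numeric codes for the categories are all assumptions. Only the range [-50, 50] and the four category labels are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Kind(Enum):
    MEASUREMENT = "measurement"   # arithmetic on values is meaningful
    CATEGORICAL = "categorical"   # values are labels, possibly number-coded


@dataclass
class AttributeMeta:
    """Hypothetical meta-data record for one schema attribute."""
    name: str
    kind: Kind
    unit: Optional[str] = None                        # measurement unit
    value_range: Optional[Tuple[float, float]] = None # finite value set
    categories: Optional[Dict[int, str]] = None       # code -> label

    def allows_arithmetic(self) -> bool:
        # Averages, sums etc. only make sense for measurement variables.
        return self.kind is Kind.MEASUREMENT

    def accepts(self, value) -> bool:
        # A value is meaningful only inside the declared value domain.
        if self.kind is Kind.MEASUREMENT:
            if not isinstance(value, (int, float)):
                return False
            low, high = self.value_range
            return low <= value <= high
        return value in self.categories or value in self.categories.values()


# Assumed example attributes; names, unit and codes are illustrative only.
air_temperature = AttributeMeta(
    name="air_temperature", kind=Kind.MEASUREMENT,
    unit="degC", value_range=(-50, 50),
)
test_result = AttributeMeta(
    name="test_result", kind=Kind.CATEGORICAL,
    categories={1: "positive", 2: "negative", 3: "indeterminate", 4: "not done"},
)
```

With such a record, a query interface could refuse to offer an average over test_result, or reject a selection condition such as air_temperature = 75, before any database query is generated.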
This knowledge has to be taken into account if only meaningful queries are to be composed. The context meta-data (knowledge) base represents this knowledge in terms of semantic information spaces. Database-specific queries are thereby replaced with semantically enriched queries, which are formulated implicitly by navigating through semantic information spaces. Each selected information space can be translated into the query language of an underlying database. Constructing meaningful queries in this way also frees the end user from the need to learn the syntax of a particular query language.
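As a rough sketch of the translation step, assume that each knowledge element the user selects is mapped to a table or column of one particular database. The mapping, the table and column names and the function to_sql below are hypothetical and serve only to show how a navigated selection could become a parameterised SQL query.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from knowledge elements (natural-language names)
# to the schema of one specific database.
SCHEMA_MAP = {
    "Snow cover": {
        "table": "snow_measurements",
        "properties": {
            "Air temperature": "air_temp",
            "Snow depth": "snow_depth",
        },
    },
}


def to_sql(
    concept: str,
    properties: List[str],
    constraints: Dict[str, Tuple[float, float]],
) -> Tuple[str, List[float]]:
    """Translate a selected concept, its properties and value-range
    constraints into a SQL string with positional parameters."""
    entry = SCHEMA_MAP[concept]
    columns = [entry["properties"][p] for p in properties]
    sql = f"SELECT {', '.join(columns)} FROM {entry['table']}"

    clauses: List[str] = []
    params: List[float] = []
    for prop, (low, high) in constraints.items():
        clauses.append(f"{entry['properties'][prop]} BETWEEN ? AND ?")
        params.extend([low, high])
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# The user navigated: Snow cover -> Snow depth, restricted by air temperature.
query, values = to_sql(
    "Snow cover", ["Snow depth"], {"Air temperature": (-10, 0)}
)
```

The user only ever sees the names on the left-hand side of the mapping; the table and column names stay hidden behind the translation.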
The context meta-data database is being developed with knowledge representation issues in mind. A query on scientific data is posed interactively through a graphical user interface, which presents semantic information spaces made up of the elements of intended queries. These elements are represented as information objects within the context meta-data database. They stand for semantically rich descriptions of a data model and of particular value domains, and they are semantically associated with each other. The underlying formalism for representing semantic information spaces is a multi-layered directed cyclic graph, in which nodes and links are classified at various semantic levels. The system is being implemented with an object-oriented DBMS, OMS, which supports both of the following:
- rich classification constraints for both unary collections of objects and binary associations,
- a directed association construct since it relies on an object-oriented model, OM.
The knowledge elements are expressed in natural language and are classified, at a first level, into subsets called Concepts, Properties, Value domains, Measurement units and Descriptive values. Directed binary associations hold among these elements; they can also be self-referential, which allows recursive definitions. Knowledge elements and their associations at different specification levels define the semantic information spaces. For example, in the case of avalanche-related data values, a semantic query can be expressed as a set of nodes and links to be navigated, as shown in the figure.
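The structure described above can be approximated by a small graph sketch. The five first-level subsets are taken from the text; the class InformationSpace, the link labels and all example nodes (including the value domain "[0, 500]" and the unit "cm") are assumptions made for illustration and do not reproduce the OM/OMS model.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# First-level classification of knowledge elements, as named in the text.
LEVELS = {"Concept", "Property", "Value domain",
          "Measurement unit", "Descriptive value"}


class InformationSpace:
    """Hypothetical directed graph of classified knowledge elements."""

    def __init__(self) -> None:
        self.level: Dict[str, str] = {}
        self.links: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, name: str, level: str) -> None:
        if level not in LEVELS:
            raise ValueError(f"unknown classification: {level}")
        self.level[name] = level

    def link(self, source: str, label: str, target: str) -> None:
        # Directed binary association; source may equal target.
        for node in (source, target):
            if node not in self.level:
                raise KeyError(node)
        self.links[source].append((label, target))

    def navigate(self, start: str) -> List[str]:
        # Breadth-first walk; the visited set makes cycles harmless.
        visited, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for _label, target in self.links[node]:
                if target not in visited:
                    visited.add(target)
                    queue.append(target)
        return order


space = InformationSpace()
space.add("Snow cover", "Concept")
space.add("Snow depth", "Property")
space.add("[0, 500]", "Value domain")
space.add("cm", "Measurement unit")
space.add("Mean snow depth", "Descriptive value")

space.link("Snow cover", "has property", "Snow depth")
space.link("Snow cover", "has sub-concept", "Snow cover")  # self-referential
space.link("Snow depth", "takes values in", "[0, 500]")
space.link("Snow depth", "summarised by", "Mean snow depth")
space.link("[0, 500]", "expressed in", "cm")

path = space.navigate("Snow cover")
```

Here path lists the elements reachable from the concept the user started at, which is the raw material for a semantic query of the kind shown in the figure.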
At the moment, we are implementing a historical database for the collection of categorical data for quality management in medicine. The data will be collected from various clinical and therapeutic institutions in Switzerland. In addition, transformation mediators for semantic queries are being implemented for historical databases (SQL engines) that hold measurement data from physical experiments in avalanche research. A collaboration with the Institute for Theoretical Computer Science at EPFL Lausanne will explore how the context meta-data database can be interfaced with SGML derivatives for the dynamic construction of semantically enriched web documents.
Epaminondas Kapetanios and M.C. Norrie - SGFI / Swiss Federal Institute of Technology (ETH) Zurich
Tel: +41 1 63 27261 (27242)