by Domenico Beneventano, Sonia Bergamaschi, Stefano Lodi and Claudio Sartori
Schema design and query optimization in Object-Oriented Databases can be substantially improved by taxonomic reasoning techniques. A group of researchers is working on these topics at the "Centro di Studio per l'Interazione Operatore-Calcolatore" (CIOC-CNR), Bologna, Italy.
The organization of classes (types) in an inheritance taxonomy to describe an application domain constitutes a basic modelling principle in the database area and in artificial intelligence. In the database area, the class taxonomy is built by the designer and only a few systems guarantee its consistency. In artificial intelligence, a family of knowledge representation systems, called Terminological Logics (TL) Systems, derived from the KL-ONE model, has been developed to assign a more active role to class taxonomies. TL systems offer two advantages: first, a class description can state necessary and sufficient conditions; second, the task of creating the class hierarchy can be delegated to the system. The knowledge base designer gives a class description as a free composition of ancestor classes and of "differentiae" properties, and the system automatically classifies it, that is, it determines the right place for the new class in the existing taxonomy, between its most specific generalizations and its most general specializations. Classification is performed by the so-called taxonomic reasoner. Applying taxonomic reasoning in the database environment to traditional semantic data models has given a number of promising results for database schema design and other relevant topics, such as query processing and data recognition. The CIOC group has been working on these topics since 1987 and, since 1990, has collaborated with DFKI (Deutsches Forschungszentrum für Künstliche Intelligenz). A general theoretical framework has been developed, which supports conceptual schema acquisition and organization by preserving coherence and minimality with respect to inheritance.
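The classification step performed by a taxonomic reasoner can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not taken from the article: each class is defined by a set of atomic necessary and sufficient conditions, so that one class subsumes another exactly when its condition set is a subset of the other's. The class names, the property names and the functions `subsumes` and `classify` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a class description is a conjunction of atomic conditions.
Description = FrozenSet[str]


def subsumes(general: Description, specific: Description) -> bool:
    """A class subsumes another if all of its conditions hold for the other."""
    return general <= specific


def classify(taxonomy: Dict[str, Description],
             new: Description) -> Tuple[List[str], List[str]]:
    """Return (most specific generalizations, most general specializations)."""
    subsumers = [n for n, d in taxonomy.items() if subsumes(d, new)]
    subsumees = [n for n, d in taxonomy.items() if subsumes(new, d)]

    # Keep a subsumer only if it does not strictly subsume another subsumer.
    most_specific = [
        n for n in subsumers
        if not any(taxonomy[n] < taxonomy[m] for m in subsumers)
    ]
    # Keep a subsumee only if no other subsumee strictly subsumes it.
    most_general = [
        n for n in subsumees
        if not any(taxonomy[m] < taxonomy[n] for m in subsumees)
    ]
    return sorted(most_specific), sorted(most_general)


# Hypothetical taxonomy, for illustration only.
taxonomy = {
    "Person": frozenset({"person"}),
    "Employee": frozenset({"person", "has_salary"}),
    "Manager": frozenset({"person", "has_salary", "heads_dept"}),
    "Student": frozenset({"person", "enrolled"}),
    "TopManager": frozenset(
        {"person", "has_salary", "heads_dept", "on_board"}),
}

# A new class described by composing Employee with one extra property.
executive = frozenset({"person", "has_salary", "on_board"})
parents, children = classify(taxonomy, executive)
print(parents, children)  # ['Employee'] ['TopManager']
```

Real terminological languages need a far richer subsumption test than set inclusion, but the placement logic (filter the subsumers and subsumees, then keep only the tightest ones) has the same shape.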
Taxonomic reasoning and schema design
In some respects, the complex object data models recently proposed in the OODB area are more expressive than the TL languages implemented so far. Furthermore, a real-world database specification always includes a set of rules, the so-called integrity constraints, to guarantee data consistency. Constraints are expressed in various ways, depending on the data model: e.g. subsets of first-order logic, or inclusion dependencies and predicates on row values, or procedural methods in OO environments. Our choice is to allow a declarative specification of a set of integrity constraints at schema level, and to exploit this knowledge together with taxonomic reasoning for the different tasks of schema design and query optimization. We propose a language which extends TLs with data structures that are relevant to databases. The language allows the description of identified objects, types with conjunction, disjunction and negation, tuples, sets with cardinality constraints, and path expressions. Disjunction and negation allow the representation of if-then rules. Given a schema description with constraints, the following question arises: is the schema consistent, i.e., is there any way to populate a database so that it satisfies the given constraints? We provide an algorithm, based on the tableaux calculus, to check the satisfiability of a schema.
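To give a flavour of a tableau-style satisfiability check, the following sketch handles only a propositional fragment: atomic class names combined with conjunction, disjunction and negation. It is an assumption-laden illustration, not the algorithm referred to above; tuples, sets with cardinality constraints and path expressions are left out entirely. The constructors `atom`, `neg`, `conj`, `disj` and `rule`, and the function `satisfiable`, are hypothetical names. An if-then rule is encoded as "not antecedent, or consequent".

```python
from typing import FrozenSet, Tuple

# A type is a nested tuple:
# ("atom", name) | ("not", t) | ("and", t1, t2) | ("or", t1, t2)


def atom(name: str) -> tuple:
    return ("atom", name)


def neg(t: tuple) -> tuple:
    return ("not", t)


def conj(a: tuple, b: tuple) -> tuple:
    return ("and", a, b)


def disj(a: tuple, b: tuple) -> tuple:
    return ("or", a, b)


def rule(antecedent: tuple, consequent: tuple) -> tuple:
    """An if-then rule, expressed through negation and disjunction."""
    return disj(neg(antecedent), consequent)


def _add_literal(lit: Tuple[str, bool], rest: tuple,
                 literals: FrozenSet[Tuple[str, bool]]) -> bool:
    name, polarity = lit
    if (name, not polarity) in literals:
        return False  # clash: this branch is closed
    return satisfiable(rest, literals | {lit})


def satisfiable(pending: tuple,
                literals: FrozenSet[Tuple[str, bool]] = frozenset()) -> bool:
    """Expand the pending types; True if some branch stays clash-free."""
    if not pending:
        return True  # open branch: a consistent assignment exists
    head, rest = pending[0], pending[1:]
    tag = head[0]
    if tag == "atom":
        return _add_literal((head[1], True), rest, literals)
    if tag == "and":  # both conjuncts go on the same branch
        return satisfiable((head[1], head[2]) + rest, literals)
    if tag == "or":  # split into two branches
        return (satisfiable((head[1],) + rest, literals)
                or satisfiable((head[2],) + rest, literals))
    if tag != "not":
        raise ValueError(f"unknown constructor: {tag}")
    inner = head[1]
    if inner[0] == "atom":
        return _add_literal((inner[1], False), rest, literals)
    if inner[0] == "not":  # double negation
        return satisfiable((inner[1],) + rest, literals)
    if inner[0] == "and":  # De Morgan
        return satisfiable((disj(neg(inner[1]), neg(inner[2])),) + rest,
                           literals)
    if inner[0] == "or":  # De Morgan
        return satisfiable((conj(neg(inner[1]), neg(inner[2])),) + rest,
                           literals)
    raise ValueError(f"unknown constructor: {inner[0]}")


# Hypothetical schema: every Manager must be HighPaid.
manager, high_paid = atom("Manager"), atom("HighPaid")
constraint = rule(manager, high_paid)

print(satisfiable((manager, constraint)))                        # True
print(satisfiable((conj(manager, neg(high_paid)), constraint)))  # False
```

The second call shows an inconsistent class description: a manager who is not highly paid can never be instantiated in a database that respects the rule, so every branch of the tableau closes.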
Taxonomic reasoning and query optimization
The purpose of semantic query optimization is to use semantic knowledge (e.g. integrity constraints) to transform a query into an equivalent one that can be answered more efficiently than the original. OODBs provide a very rich type (class) system that can directly represent a subclass of integrity constraints in the database schema. By exploiting schema information such as inheritance relations between types (classes) and integrity constraints, it is possible to perform semantic query optimization. To achieve this, we propose a theoretical framework based on taxonomic reasoning, which is applicable because a query expresses necessary and sufficient conditions. The framework includes the main query transformation criteria proposed in the database literature, such as predicate addition and removal. In this framework, classes are characterized by necessary conditions, as is usual in databases, and further knowledge is expressed as integrity constraint rules. The antecedent and consequent of each rule are types of the formalism. Since OODB query languages are more expressive than our formalism, we conceptually split a query into a "clean" part, which can be represented as a type in our formalism, and a "dirty" part, which goes beyond the expressiveness of the type system. Semantic optimization is performed only on the clean part of a given query. The clean part corresponds to the so-called conjunctive queries, or single-operand queries, in OODBs, and is transformed according to the schema and the integrity constraint rules.
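Predicate addition and removal on the clean part of a query can be sketched as follows. The sketch assumes, for illustration only, that a type is a conjunction of opaque atomic predicates and that a rule applies whenever its antecedent is contained in the query. The predicate names, the two rules and the functions `expand` and `remove_redundant` are hypothetical and are not part of the framework described above.

```python
from typing import FrozenSet, List, Tuple

QueryType = FrozenSet[str]               # conjunction of atomic predicates
Rule = Tuple[QueryType, QueryType]       # (antecedent, consequent)


def expand(query: QueryType, rules: List[Rule]) -> QueryType:
    """Predicate addition: add every implied consequent until a fixpoint."""
    current = query
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent <= current and not consequent <= current:
                current = current | consequent
                changed = True
    return current


def remove_redundant(query: QueryType, rules: List[Rule]) -> QueryType:
    """Predicate removal: drop predicates implied by the remaining ones."""
    current = query
    for predicate in sorted(query):
        reduced = current - {predicate}
        if predicate in expand(reduced, rules):
            current = reduced
    return current


# Hypothetical integrity constraint rules.
rules: List[Rule] = [
    # Employees of level 10 or above are managers.
    (frozenset({"Employee", "level_ge_10"}), frozenset({"Manager"})),
    # Managers earn at least 40K.
    (frozenset({"Manager"}), frozenset({"salary_ge_40K"})),
]

# Addition: the query can now be evaluated against the Manager class.
q1 = frozenset({"Employee", "level_ge_10"})
print(sorted(expand(q1, rules)))
# ['Employee', 'Manager', 'level_ge_10', 'salary_ge_40K']

# Removal: the salary test is implied by membership in Manager.
q2 = frozenset({"Manager", "salary_ge_40K"})
print(sorted(remove_redundant(q2, rules)))  # ['Manager']
```

Both transformations preserve the answer only on databases that satisfy the rules. Whether the expanded or the reduced form is cheaper depends on the physical organization, so a real optimizer would choose between them with a cost model; the result of `remove_redundant` can also depend on the order in which predicates are tried.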
On the theoretical side, we are focussing on the extension of our formalism to express rules in active database environments. We are also continuing to study the decidability and complexity of our taxonomic reasoning techniques. On the implementation side, we are developing a design tool for OODB schemata and a general-purpose semantic query optimization component to be used as a pre-processor for an OODBMS.