WIKINGER - Semantically Enhanced Knowledge Repositories for Scientific Communities

by Lars Broecker

While many scientific communities use the Internet to exchange scientific knowledge, they only rarely use it to create that knowledge collaboratively. The WIKINGER project is working on semantically enhanced knowledge repositories that support the collaborative generation of knowledge by providing a semi-automatically generated semantic net of the topics they contain. A Web application is being developed with a front end built on Wiki technology.

As the success of the Wikipedia project shows, collaborative knowledge creation on the Internet is possible, even viable. This is an interesting result, considering that the user base generally acts anonymously and is spread all over the world. However, the Wiki approach to knowledge creation has disadvantages, especially for scientific communities. First, there is the problem of attaining a critical mass. The domains are often highly specialized, so only a small number of people are interested in (or even qualified for) participation, which makes it very difficult to attain a critical mass of information. Second, there is a problem in the way HTML handles the linking of pages. Hyperlinks are one-way only and carry no semantics beyond 'go there from here'. This becomes a problem as soon as associations need semantic labels, e.g. to enable more sophisticated search tools than full-text retrieval.
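To make the contrast concrete, the following minimal sketch compares a plain hyperlink, which records only a source and a target, with a typed association that also carries a semantic label. All class names, labels and entity names are hypothetical placeholders, not part of WIKINGER. Because the label is stored, a query such as "who is a member of Organisation X?" becomes possible, and the store can be traversed from the target back to the source, which a one-way HTML link does not support.

```python
from dataclasses import dataclass


# Hypothetical illustration: an HTML hyperlink only says "go there from here".
@dataclass(frozen=True)
class Hyperlink:
    source: str
    target: str


# A typed association additionally records what kind of relationship it is.
@dataclass(frozen=True)
class Association:
    source: str
    label: str
    target: str


def find_sources(associations, label, target):
    """Return all sources linked to `target` by an association labelled `label`.

    This walks the links backwards (target -> source), filtered by meaning.
    """
    return sorted(a.source for a in associations
                  if a.label == label and a.target == target)


links = [Hyperlink("Person A", "Organisation X")]  # carries no meaning
associations = [
    Association("Person A", "member_of", "Organisation X"),
    Association("Person B", "critic_of", "Organisation X"),
    Association("Person C", "member_of", "Organisation X"),
]

print(find_sources(associations, "member_of", "Organisation X"))
# ['Person A', 'Person C']
```

The same query cannot be answered from `links` at all, since the hyperlink does not say whether Person A is a member, a critic or merely mentioned.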

The scenario illustrated above is typical for academic communities, especially in the humanities. In general, a variety of publications exist that deal with particular facets of the discipline. Each of these contains a multitude of pieces of information on people, institutions, places and events, as well as the associations between them. Unfortunately, organized knowledge repositories in digital format are rare. The community recognizes this problem, as the effort needed to find these pieces of information among the publications keeps growing. Such information would be well suited for publication in a Wiki system, provided that (a) the process of identifying articles and their relationships can be automated to a high degree, and (b) the problem of missing semantics in hyperlinks can be solved, since many different types of relationship need to be expressed in hyperlinks.

The goal of the WIKINGER (Wiki Next-Generation Enhanced Repositories) project is the creation of a semantic Wiki containing both the entities relevant to the domain and the qualified associations connecting them. The main difference from other projects dealing with semantic Wikis is the level of automation. The project is working towards the semi-automatic creation of a base for the semantic Wiki from the digital repository, thus reducing the amount of work necessary to attain critical mass.

The process used by the project is shown in the Figure. The initial phase (labelled 0) can be seen as a bootstrapping phase for the system. An initial collection of digitally available data sources, including publications, articles or databases, is assembled and converted to a format suitable for further processing. The WIKINGER system stores both the original data and the derived format in a document repository. The data is then processed by a Named Entity Recognition module (NER, labelled 1), which gathers entities according to entity classes. A human annotator provides the module with examples for the designated classes, thus helping the system learn those classes. The advantage of this approach is the flexibility to include new classes: given specific examples, the system can learn to recognize them.
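The idea of learning entity classes from annotator examples can be sketched as follows. This is a deliberately simplistic stand-in, not the WIKINGER NER module: it builds a lookup table (a gazetteer) from the examples and tags exact, longest matches in the text, whereas a real learner would generalize from the examples to unseen names. The function names `learn_gazetteer` and `recognize` and the example entities are hypothetical. What the sketch does show is the flexibility mentioned above: adding a new class only requires supplying examples for it.

```python
import re


def learn_gazetteer(examples):
    """Build a lookup table from annotator examples: token tuple -> entity class."""
    table = {}
    for entity_class, names in examples.items():
        for name in names:
            table[tuple(name.split())] = entity_class
    return table


def recognize(text, table):
    """Tag token sequences found in `table`, preferring the longest match."""
    tokens = re.findall(r"\w+", text)
    max_len = max((len(key) for key in table), default=0)
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                entities.append((" ".join(key), table[key]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities


# Hypothetical annotator examples for three entity classes.
examples = {
    "PERSON": ["Konrad Adenauer"],
    "PLACE": ["Bonn"],
    "INSTITUTION": ["Commission for Contemporary History"],
}
table = learn_gazetteer(examples)
text = "Konrad Adenauer met members of the Commission for Contemporary History in Bonn."
print(recognize(text, table))
```

Introducing, say, an `EVENT` class amounts to adding `examples["EVENT"] = [...]` and rebuilding the table; no code changes are needed.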
The output is a collection of recognized entities which serves as the input for stage 2. Stage 2 tries to identify the associations between the different entities. The result of this stage is a semantic net forming a hypothesis of the knowledge contained in the data sources. This hypothesis is evaluated by human experts, and this evaluation is used as the input for another iteration of the net-building process. This in turn is evaluated, and so the process continues. When the experts are satisfied with the results, the semantic net is deployed for use in the WIKINGER-repository.
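Stage 2 and the expert evaluation loop can be sketched as below. The article does not say how WIKINGER identifies associations; this sketch substitutes a common baseline, sentence-level co-occurrence, and omits the semantic labelling of associations. `hypothesise`, `refine` and `expert_review` are hypothetical names. Each round builds a candidate net, the experts judge every proposed association, and their rejections are fed into the next round until the experts accept the whole net (or a round limit is reached).

```python
from collections import Counter
from itertools import combinations


def hypothesise(sentences, rejected=frozenset()):
    """Propose an (unlabelled) association for every pair of entities that
    co-occur in a sentence, weighted by how often they co-occur, and skip
    pairs the experts have already rejected."""
    counts = Counter()
    for entities in sentences:
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if pair not in rejected}


def refine(sentences, expert_review, max_rounds=5):
    """Iterate net building and expert evaluation until the experts accept
    every association in the net (or `max_rounds` is exhausted)."""
    rejected = set()
    for _ in range(max_rounds):
        net = hypothesise(sentences, frozenset(rejected))
        newly_rejected = {pair for pair in net if not expert_review(pair)}
        if not newly_rejected:
            return net  # experts are satisfied: the net can be deployed
        rejected |= newly_rejected
    return hypothesise(sentences, frozenset(rejected))


# Each sentence is represented by the entities that stage 1 found in it.
sentences = [["A", "B"], ["A", "B", "C"], ["C", "D"]]
net = refine(sentences, expert_review=lambda pair: pair != ("B", "C"))
print(net)  # {('A', 'B'): 2, ('A', 'C'): 1, ('C', 'D'): 1}
```

In a real system the expert verdicts would more usefully retrain the extraction step rather than just filter its output; the sketch only illustrates the shape of the feedback cycle.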

Creation of a semantically enhanced knowledge repository.

This repository combines the functionality of a Wiki system with the expressiveness of associations found in languages of the Semantic Web. Nodes in the Semantic Net translate to articles in the Wiki; the different types of association between them form the hyperlinks connecting the articles. Since the Wiki is simply a user interface to the Semantic Net, the semantics behind the hyperlinks are retained and can be exploited by intelligent software assistants. The Net is kept in sync with the Wiki by a feedback loop that subjects all changes to the articles to the same process as the original data. This allows the identification of new topics or associations that arise in daily work with the Wiki.
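The mapping from net to Wiki, and the feedback loop back into the net, might look roughly like this. The sketch assumes the net is stored as (subject, label, object) triples and renders articles in MediaWiki-style `[[...]]` link syntax; both are assumptions, since the article does not specify WIKINGER's storage format or Wiki engine. `render_article`, `on_edit` and the `extract` callback are hypothetical names. Because every link is generated from a labelled triple, an article can list its incoming associations as well as its outgoing ones.

```python
# Hypothetical semantic net stored as (subject, label, object) triples.
TRIPLES = [
    ("Person A", "member_of", "Organisation X"),
    ("Organisation X", "based_in", "Bonn"),
]


def render_article(node, triples):
    """Render one net node as a wiki article with typed outgoing and
    incoming links (MediaWiki-style [[...]] syntax is assumed)."""
    lines = [f"= {node} ="]
    for subject, label, obj in triples:
        if subject == node:
            lines.append(f"* {label}: [[{obj}]]")
    for subject, label, obj in triples:
        if obj == node:
            lines.append(f"* inverse of {label}: [[{subject}]]")
    return "\n".join(lines)


def on_edit(node, new_text, triples, extract):
    """Feedback loop: pass the edited article text back through an
    extraction step and append associations not yet in the net.

    `triples` is modified in place and also returned."""
    for triple in extract(node, new_text):
        if triple not in triples:
            triples.append(triple)
    return triples


print(render_article("Organisation X", TRIPLES))
# = Organisation X =
# * based_in: [[Bonn]]
# * inverse of member_of: [[Person A]]
```

In WIKINGER the `extract` step would be the NER and association stages described above; here it is just a placeholder callback.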

The project is conducted by the University of Duisburg-Essen and the Fraunhofer Institute for Media Communication in cooperation with the Commission for Contemporary History (KFZG) in Bonn. The pilot project focuses on the domain of Contemporary History, in particular on the social and political history of German Catholicism. Although the project is still in its early stages, the first results are very encouraging. We are currently working on a prototype offering basic functionality, which will enable users to test-drive the system early in the development cycle.

The project is funded by the German Federal Ministry of Education and Research in the programme 'eScience'. Work on the project commenced in October 2005 and is scheduled to be completed in September 2008.

WIKINGER project: (as yet German only)

Please contact:
Lars Broecker, Fraunhofer Institute for Media Communication, Germany
Tel: +49 2241 14 1993

Jochen Meyer, OFFIS, Germany
Tel: +49 441 9722 185