XSMapper: a Service-Oriented Utility for XML Schema Transformation

by Manuel Llavador and José H. Canós


A typical case of low-level interoperability, particularly frequent in the Digital Libraries world, is the federation of collections via metadata conversion. Roughly speaking, a federation consists of a number of repositories, each with its own format, which agree on a common format for metadata exchange. Any metadata record must then be transformed into the common format before it is sent as the result of a request. In this note, we report on a solution to the metadata conversion problem based on semantic mappings. Although it was developed to federate collections in a specific project, XSMapper is domain-independent and can be used in any context where an XML schema transformation is required.

Making distributed and heterogeneous systems interoperate has been a challenge for researchers and practitioners over the last decade. The complexity of the problem has led to solutions with increasing levels of sophistication, depending on the requirements imposed by the domains of the application. Different forms of middleware represent the most general solution for achieving full interoperability, but in some cases simpler solutions can be used. This is particularly the case when the requirement for interoperability originates from the heterogeneity of (meta)data formats, as often happens in the Digital Libraries world.

Such a problem arose during the development of BibShare, an environment for bibliography management funded by Microsoft Research Cambridge. BibShare allows users to collect bibliographic references, insert citations into documents and automatically generate a document's bibliography. Unlike earlier tools, BibShare works with a variety of word-processing systems, and permits references to be inserted not only from personal citation collections, but also from bibliography servers available on the Internet, such as DBLP. As might be expected, each collection has its own metadata format(s). In order to unify the result sets of federated searches and return these data to the user, each record retrieved must be converted to a common format, which we call the BibShare Bibliographic Format (BBF).

Given that XML is used to exchange data, the natural solution to the problem is to use XSL transformations between records. For a collection to be added to the BibShare Federation, the owner of the collection must create an XSL template that transforms its records into BBF. However, writing an XSL template is not a trivial task, and any tool supporting template generation would represent a significant improvement to the federation process.

Since the problem of document transformation goes beyond the scope of BibShare, we developed a general solution to the problem. In its most general version, the problem can be stated as follows: given two XML Schemas S1 and S2 that represent, respectively, the source and target formats of a transformation, obtain as automatically as possible the XSL template that transforms S1-valid documents into S2-valid documents.

XSL template generation workflow.

XML Semantic Mapper (XSMapper) solves the problem by defining semantic mappings between the source and target schemas, in three steps (see figure):

  1. Extraction of the concepts that are used in both the source and target schemas. A concept is a term used to name different elements in an XML document. For instance, the concept 'author' is used to denote the elements representing the authors of books and articles; this means that there will be different elements 'book/author' and 'article/author', which may be translated to the same element in a target schema (e.g. following the LaTeX model, an element 'bibitem/author'). This step is performed automatically by the XPathInferer Web service. This service finds not only the concepts, but also their locations within documents, in the form of XPath expressions. This is very important because location in the document can be a key property during the conversion process.
  2. Definition of the semantic mappings between the elements of S1 and S2. This step cannot be performed automatically, unless some ontology relating the concepts in both schemas can be used to infer the mappings. XSMapper provides a user-friendly interface for defining three kinds of mappings, namely direct, function-based and constant. Direct mappings link one or more concepts of the source schema to one or more semantically equivalent concepts of the target schema (e.g. the 'author' concept presented above). Function-based mappings are defined when some function must be applied to the source concepts in order to obtain the equivalent target elements (for instance, splitting one concept like 'author' into two concepts 'first name' and 'surname'). As we are using XSLT to transform documents, we can use the set of functions provided by XPath and XSLT to define our function-based semantic mappings. Finally, constant mappings are used when we want to assign a constant value to a target concept.
  3. Generation of the XSL template. This task is performed automatically by the XSLGenerator Web service. An XSL template has two kinds of elements: structural elements and value-selection elements. The former build the resulting XML tree (composed of elements and their attributes), instantiating the target schema. The latter insert the source schema values into the resulting XML document, following the semantic mappings defined in step 2.

Note that most of the components of XSMapper are available as XML Web services, and can be used at the URLs listed below. We are working on a variety of improvements to the tool, with special emphasis on looking for ways to automate the definition of the semantic mappings, which would make XML conversion a fully automated task.
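
As a rough illustration of the three steps above, the following Python sketch uses only the standard library. It is not XSMapper's implementation: the function names `infer_concept_paths` and `generate_xsl`, the tuple layout of the mappings and the sample records are all assumptions made for this example. For brevity, concepts are inferred from a sample document rather than from an XML Schema, and the generated stylesheet contains a single template rule for one record type.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)  # serialize with the usual xsl: prefix

# Hypothetical source records; element names are illustrative only.
SAMPLE = """<library>
  <book><author>Ada Lovelace</author><title>Notes</title></book>
  <article><author>Alan Turing</author><title>On Computable Numbers</title></article>
</library>"""


def infer_concept_paths(xml_text):
    """Step 1 (sketch): map each concept (element name) to the paths where it occurs."""
    root = ET.fromstring(xml_text)
    concepts = {}

    def walk(elem, path):
        concepts.setdefault(elem.tag, set()).add(path)
        for child in elem:
            walk(child, path + "/" + child.tag)

    walk(root, "/" + root.tag)
    return {name: sorted(paths) for name, paths in concepts.items()}


# Step 2 (sketch): each mapping is (kind, target element, value).
#   "direct"   -> value is a source path relative to the matched record
#   "function" -> value is an XPath expression applied to source concepts
#   "constant" -> value is a literal string
MAPPINGS = [
    ("direct", "title", "title"),
    ("function", "firstname", "substring-before(author, ' ')"),
    ("function", "surname", "substring-after(author, ' ')"),
    ("constant", "source", "my-collection"),
]


def generate_xsl(match, target_root, mappings):
    """Step 3 (sketch): build one template rule from the mappings."""
    sheet = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})
    template = ET.SubElement(sheet, f"{{{XSL_NS}}}template", {"match": match})
    record = ET.SubElement(template, target_root)  # structural element
    for kind, target, value in mappings:
        field = ET.SubElement(record, target)      # structural element
        if kind == "constant":
            field.text = value                     # literal text in the output
        elif kind in ("direct", "function"):
            # value-selection element: copies or computes a source value
            ET.SubElement(field, f"{{{XSL_NS}}}value-of", {"select": value})
        else:
            raise ValueError(f"unknown mapping kind: {kind}")
    return ET.tostring(sheet, encoding="unicode")


if __name__ == "__main__":
    for concept, paths in infer_concept_paths(SAMPLE).items():
        print(concept, paths)
    print(generate_xsl("book", "bibitem", MAPPINGS))
```

Applying the generated stylesheet to a document requires an XSLT processor, which the Python standard library does not include. A complete stylesheet would also need a rule for every record type in the source.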

Links:
BibShare: http://www.bibshare.org
XSMapper: http://bibshare.dsic.upv.es/XSMapper.exe

Please contact:
José H. Canós, Technical University of Valencia / SpaRCIM
E-mail: jhcanos@dsic.upv.es
http://www.dsic.upv.es/~jhcanos