WEB TECHNOLOGIES
ERCIM News No.41 - April 2000

ICS-VRP: a Tool for Parsing and Validating RDF Metadata & Schemas

by Karsten Tolle and Vassilis Christophides

The Web provides a simple and universal infrastructure for exchanging various kinds of information. The role of metadata in sharing, interpreting, and manipulating information worldwide is widely recognized. Indeed, metadata allow us to easily locate information available on the Web by describing the structure and content of the various Web resources (eg data, documents, images) for different purposes. The emergence of the Resource Description Framework (RDF) is expected to enable metadata interoperability across different communities and applications by supporting common conventions about metadata syntax, structure, and semantics.

More precisely, RDF provides a) a Standard Representation Language for Web metadata; and b) a Schema Definition Language (RDFS) to interpret (meta)data using specific class and property hierarchies (ie vocabularies). Moreover, RDF/RDFS offer a syntax for representing metadata and schemas in XML, enabling the creation and exchange of RDF descriptions in a form that is both human readable and machine understandable. Many information providers like ABC News, CNN and Time Inc., Web portals like Open Directory, Web browsers like Netscape, and search engines like Altavista, Yahoo and Webcrawler already support the RDF proposal. Unfortunately, existing RDF parsers (eg SiRPAC) check only the well-formedness of RDF resource descriptions according to the W3C RDF Model and Syntax (M&S) specification. For this reason, we have developed the ICS-FORTH Validating RDF Parser (VRP), which validates RDF resource descriptions against their associated RDFS schemas and also validates the schemas themselves.

RDF is based on a directed graph model that alludes to the semantics of resource description. The basic idea is that a Resource (identified by a URI) can be described through a collection of Statements forming a so-called RDF Description. A specific resource together with a named property and its value is an RDF statement. RDFS schemas are then used to declare vocabularies, ie collections of classes and properties, that can be used in resource descriptions for a specific purpose or domain. VRP is a tool to analyze, validate and process RDF descriptions, built with standard compiler generator tools for Java, namely CUP/JFlex (similar to YACC/LEX). As a result, users do not need to install additional programs (eg XML parsers) in order to run VRP, and they can easily update or extend the VRP BNF grammar if the RDF/RDFS specifications change. VRP is written entirely in Java(tm), understands RDF embedded in HTML or XML, and provides full Unicode support. The fast LALR parser generated by CUP and the stream-based parsing support provided by JFlex ensure good performance when processing large volumes of RDF descriptions.
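The statement model described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not VRP code (VRP itself is a Java tool); the `Statement` type, the `describe` helper and the example.org URIs are assumptions made for illustration only. Each statement is an edge of the directed graph, and grouping statements by resource yields one description per resource.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: a resource, a named property, and its value."""
    subject: str    # URI of the resource being described
    predicate: str  # URI of the named property
    obj: str        # property value: another URI or a literal


# Illustrative data only; the example.org URIs are hypothetical.
DC = "http://purl.org/dc/elements/1.1/"
statements = [
    Statement("http://example.org/doc1", DC + "title", "A sample page"),
    Statement("http://example.org/doc1", DC + "creator",
              "http://example.org/staff/42"),
    Statement("http://example.org/staff/42",
              "http://example.org/schema#name", "A. Author"),
]


def describe(statements):
    """Group statements by subject: one RDF Description per resource."""
    descriptions = defaultdict(list)
    for s in statements:
        descriptions[s.subject].append((s.predicate, s.obj))
    return dict(descriptions)


descriptions = describe(statements)
for resource, properties in descriptions.items():
    print(resource, properties)
```

Note that the value of the creator property is itself a resource with its own description, which is what makes the model a graph rather than a flat list of attribute/value pairs.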

The most distinctive feature of VRP is its ability to verify the constraints specified in the RDF Schema specification. This allows the validation of both the RDF descriptions against one or more RDFS schemas, and the schemas themselves. The VRP validation module relies on (a) a complete and sound algorithm to translate descriptions from an RDF/XML form (using both the Basic and Compact serialization syntax) into the RDF core model (ie triples); and (b) an implementation of this model in Java to efficiently verify the RDFS constraints.
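To give a feel for what verifying RDFS constraints over triples involves, here is a small Python sketch of a domain and range check. It is an assumption-laden illustration, not the VRP algorithm: the function names, the abbreviated `rdf:`/`rdfs:`/`ex:` names and the sample data are invented here, each property is assumed to declare at most one domain and one range, and literal values and schema-level checks are left out.

```python
def superclasses(cls, triples):
    """Return cls plus every class reachable via rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(o for s, p, o in triples
                         if s == c and p == "rdfs:subClassOf")
    return seen


def validate(triples):
    """Check each property use against its declared domain and range."""
    # Simplification: one rdfs:domain and one rdfs:range per property.
    domain = {s: o for s, p, o in triples if p == "rdfs:domain"}
    range_ = {s: o for s, p, o in triples if p == "rdfs:range"}

    # Classes of each resource, including inherited superclasses.
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).update(superclasses(o, triples))

    errors = []
    for s, p, o in triples:
        if p in domain and domain[p] not in types.get(s, set()):
            errors.append(f"{s}: subject of {p} is not a {domain[p]}")
        if p in range_ and range_[p] not in types.get(o, set()):
            errors.append(f"{o}: value of {p} is not a {range_[p]}")
    return errors


# Hypothetical schema and descriptions, already in triple form.
schema = [
    ("ex:Painter", "rdfs:subClassOf", "ex:Artist"),
    ("ex:paints", "rdfs:domain", "ex:Artist"),
    ("ex:paints", "rdfs:range", "ex:Painting"),
]
valid_data = [
    ("ex:picasso", "rdf:type", "ex:Painter"),
    ("ex:guernica", "rdf:type", "ex:Painting"),
    ("ex:picasso", "ex:paints", "ex:guernica"),
]
invalid_data = [
    ("ex:guernica", "rdf:type", "ex:Painting"),
    ("ex:guernica", "ex:paints", "ex:picasso"),
]

valid_errors = validate(schema + valid_data)
invalid_errors = validate(schema + invalid_data)
```

A parser that checks well-formedness only would accept both data sets, since each is a syntactically correct list of statements; only the schema-aware check reports that the second one uses the property with a subject and a value of the wrong class.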

To favour metadata reusability, RDF supports a) the sharing of RDFS schemas using the XML namespace mechanism (ie providing only incremental modifications to a base schema in order to create a new variant); and b) the creation of RDF (meta)data using multiple schemas at the same time (ie merging different types of metadata). This implies that real-scale applications must maintain several interconnected RDF schemas that can potentially be used to describe Web resources. To meet these requirements, VRP supports validation across several namespaces: it can connect to remote namespaces in order to import the external statements needed to validate RDF descriptions. Note that the RDF and RDFS namespaces are necessary for every RDF description, and therefore their statements are included in VRP by default.
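The idea of importing statements from several namespaces before validating can be sketched as follows. This Python fragment is only an illustration under stated assumptions: the `SCHEMAS` dictionary stands in for remote retrieval, the example.org namespaces and all function names are hypothetical, and how VRP actually fetches and caches schemas is not described here. The RDF and RDFS namespace URIs are the ones published by the W3C.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
ART_NS = "http://example.org/art#"  # hypothetical application schema

# In-memory stand-in for schemas that would be retrieved remotely.
SCHEMAS = {
    RDF_NS: [(RDF_NS + "type", RDF_NS + "type", RDF_NS + "Property")],
    RDFS_NS: [(RDFS_NS + "Class", RDF_NS + "type", RDFS_NS + "Class")],
    ART_NS: [(ART_NS + "Painting", RDF_NS + "type", RDFS_NS + "Class")],
}


def namespace_of(uri):
    """Split a URI at its last '#' or '/' to recover the namespace part."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def namespaces_used(triples):
    """Namespaces of all properties and of all classes used as types."""
    found = set()
    for s, p, o in triples:
        found.add(namespace_of(p))
        if p == RDF_NS + "type":
            found.add(namespace_of(o))
    return found


def import_schemas(triples, fetch=SCHEMAS.get):
    """Merge the statements of every namespace the description refers to.

    The RDF and RDFS namespaces are always included. Namespaces that
    cannot be retrieved are reported instead of being silently ignored.
    """
    needed = {RDF_NS, RDFS_NS} | namespaces_used(triples)
    merged, missing = [], []
    for ns in sorted(needed):
        statements = fetch(ns)
        if statements is None:
            missing.append(ns)
        else:
            merged.extend(statements)
    return merged, missing


description = [
    ("http://example.org/res/1", RDF_NS + "type", ART_NS + "Painting"),
    ("http://example.org/res/1", "http://example.org/unknown#foo", "x"),
]
merged, missing = import_schemas(description)
```

The merged statements can then be handed, together with the description itself, to a constraint check. Reporting unreachable namespaces separately is one possible design choice; a validator might equally treat them as errors.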

Currently, VRP provides a command line interface with various options to generate a textual representation of the internal model (either graph or triple based). In the future, we plan to implement a graphical user interface to visualize the analyzed RDF statements as well as to interact with VRP during parsing/validation. Finally, there is an ongoing effort to develop VRP APIs in order to facilitate the integration of the program into other systems (eg on-line loaders to DBMS).

Links:
ICS-VRP website: http://www.ics-forth.gr/proj/isst/RDF/

Please contact:
Vassilis Christophides - FORTH-ICS
Tel: +30 81 391628
E-mail: christop@ics.forth.gr

Karsten Tolle - Universität Hannover
E-mail: storr@t-online.de