ERCIM News No.41 - April 2000

Using XML for Data Management

by Victoria Marshall, Brian Matthews and Kevin O’Neill

Recognising the limitations of HTML for the representation, discovery and exchange of structured data, the World Wide Web Consortium (W3C) has developed an alternative, XML, together with a set of related standards. This allows user communities to define their own formats for structured data whilst allowing easy and open integration with other XML formats. XML has been widely taken up as a potential enabler of a wide variety of applications.

XML is an open and simple format for representing structured data, tailored to the Web, and because each user community can define its own format, there is no loss of semantics. Many user communities have been developing XML formats for the exchange of information, including MathML for mathematics, Chemical Markup Language (CML) for chemistry, Bioinformatic Sequence Markup Language (BSML) for the human genome project, Extensible Scientific Interchange Language (XSIL), DDI for social science data, and Scalable Vector Graphics (SVG) for diagrams.

The latter has been developed by ISE at RAL working with W3C (see article on page 24). SVG uses graphical primitives such as boxes, lines, and circles, which can be integrated with other XML formats. An SVG graphic fragment can be embedded within an HTML document, and can itself carry labels written in any other XML format. Graphics can thus be described in a format that is flexible yet compact compared with, say, binary formats such as GIF or JPEG.
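As an illustration of this kind of mixing, the following minimal Python sketch builds a small SVG fragment from a box, a line and a circle, and attaches a label written in a second XML vocabulary. The staff namespace, the occupant element and the choice of the SVG metadata element as the container are assumptions made for the example; they are not taken from the work described here.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Hypothetical namespace standing in for some community-defined XML format.
STAFF_NS = "http://example.org/staff"

# Serialise SVG elements without a prefix and staff elements as staff:...
ET.register_namespace("", SVG_NS)
ET.register_namespace("staff", STAFF_NS)


def svg_tag(name: str) -> str:
    """Return the namespace-qualified tag for an SVG element."""
    return f"{{{SVG_NS}}}{name}"


def build_graphic() -> ET.Element:
    """Build an SVG fragment from three primitives plus a foreign-format label."""
    svg = ET.Element(svg_tag("svg"), {"width": "200", "height": "120"})
    ET.SubElement(svg, svg_tag("rect"), {
        "x": "10", "y": "10", "width": "80", "height": "50",
        "fill": "none", "stroke": "black",
    })
    ET.SubElement(svg, svg_tag("line"), {
        "x1": "90", "y1": "35", "x2": "150", "y2": "35", "stroke": "black",
    })
    ET.SubElement(svg, svg_tag("circle"), {
        "cx": "165", "cy": "35", "r": "15", "fill": "none", "stroke": "black",
    })
    # The label lives in a different XML vocabulary (assumed for illustration).
    metadata = ET.SubElement(svg, svg_tag("metadata"))
    occupant = ET.SubElement(metadata, f"{{{STAFF_NS}}}occupant", {"room": "R1"})
    occupant.text = "Person One"
    return svg


svg_text = ET.tostring(build_graphic(), encoding="unicode")
print(svg_text)
```

Because the result is ordinary text markup, the same fragment can be parsed, queried or transformed with any XML tool, which is not possible with a raster image.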

Common Interfaces to Data Sources

XML can define schema formats for classes of documents. These XML schemas can be used to define common interfaces to live data sources, independently of the concrete representation of the repository. Doing so raises issues, because the hierarchical structure of XML, essentially derived from the structured document community, must be used to represent relational database semantics. XML is a relatively poor language for expressing concepts such as primary keys or constraints. Nevertheless, the common interface layer it offers means that it can be a powerful mechanism for exchanging data between heterogeneous data sources.
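The mismatch between relational and hierarchical structure can be seen in a small sketch. The Python code below exports two related tables from an in-memory SQLite database as nested XML. The table names, column names and sample rows are hypothetical. Note that the primary keys survive only as plain id attributes and the foreign key is expressed implicitly by nesting; nothing in the resulting document enforces either.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_sample_db() -> sqlite3.Connection:
    """Create a tiny relational source (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE person (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            dept_id INTEGER NOT NULL REFERENCES department(id)
        );
        INSERT INTO department VALUES (1, 'Department A'), (2, 'Department B');
        INSERT INTO person VALUES (1, 'Person One', 1), (2, 'Person Two', 1),
                                  (3, 'Person Three', 2);
        """
    )
    return conn


def export_organisation(conn: sqlite3.Connection) -> ET.Element:
    """Map the relational rows onto a department/person hierarchy."""
    root = ET.Element("organisation")
    departments = conn.execute("SELECT id, name FROM department ORDER BY id")
    for dept_id, dept_name in departments.fetchall():
        # The primary key becomes an ordinary attribute value.
        dept = ET.SubElement(root, "department", {"id": str(dept_id)})
        ET.SubElement(dept, "name").text = dept_name
        people = conn.execute(
            "SELECT id, name FROM person WHERE dept_id = ? ORDER BY id",
            (dept_id,),
        )
        for person_id, person_name in people.fetchall():
            # The foreign key is replaced by containment in the parent element.
            person = ET.SubElement(dept, "person", {"id": str(person_id)})
            person.text = person_name
    return root


organisation_xml = export_organisation(make_sample_db())
print(ET.tostring(organisation_xml, encoding="unicode"))
```

A consumer of this document needs to know nothing about SQLite or the table layout, which is the sense in which XML acts as a common interface layer.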

Different User Views

XML is a purely structural format; it provides no semantics for defining its appearance. This separation of structure and presentation is an important feature of XML: the data may not be displayed at all, but passed straight to some program or target database, so presentation information may be superfluous. Also, by supplying presentational information by means of a stylesheet, different views of the same data can be presented, for example as trees, tables, graphs, or charts. These views can be tailored to the user's requirements. Using the new W3C recommendation XSL, a high degree of interactivity with the data can be provided on the client, without requiring further interaction with the server.
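The idea of several views over one data set can be sketched without a stylesheet engine. In the Python example below, two plain functions stand in for two stylesheets; they are not XSL, and the element names and sample data are hypothetical. Both functions read the same parsed document and neither changes it.

```python
import xml.etree.ElementTree as ET

# Hypothetical personnel data; the structure carries no presentation hints.
SAMPLE = """<organisation>
  <department name="Department A">
    <person name="Person One" room="R1"/>
    <person name="Person Two" room="R2"/>
  </department>
  <department name="Department B">
    <person name="Person Three" room="R3"/>
  </department>
</organisation>"""


def tree_view(root: ET.Element) -> str:
    """Render the data as an indented tree, one department per branch."""
    lines = []
    for dept in root.findall("department"):
        lines.append(dept.get("name"))
        for person in dept.findall("person"):
            lines.append("  - " + person.get("name"))
    return "\n".join(lines)


def table_view(root: ET.Element) -> str:
    """Render the same data as a flat, column-aligned table."""
    rows = [("Name", "Department", "Room")]
    for dept in root.findall("department"):
        for person in dept.findall("person"):
            rows.append((person.get("name"), dept.get("name"), person.get("room")))
    # Pad each column to the width of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


organisation = ET.fromstring(SAMPLE)
print(tree_view(organisation))
print()
print(table_view(organisation))
```

Since the document is fetched once and each view is computed locally, switching between views requires no further request to the server.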


ISE has been experimenting with providing different views on the laboratory personnel database (see figure 1). This allows the user to explore the structure of the organisation without downloading further data from the server. The same data set can also be used to generate alternative views by simply changing the colours and interface, or by combining the data with SVG to, say, provide an annotated map of the laboratory offices (see figure 2).
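One way such an annotated map might be produced is sketched below in Python. This is not the ISE implementation; the personnel records, the room identifiers and the room coordinates are all invented for the example. Each room is drawn as an SVG rectangle and labelled with the names of the people whose records refer to it.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)

# Hypothetical personnel records.
PERSONNEL = """<staff>
  <person name="Person One" room="R1"/>
  <person name="Person Two" room="R2"/>
</staff>"""

# Hypothetical floor plan: room id -> (x, y, width, height).
ROOMS = {"R1": (10, 10, 120, 60), "R2": (140, 10, 120, 60)}


def annotated_map(staff_xml: str, rooms: dict) -> ET.Element:
    """Combine personnel data with room geometry into one SVG drawing."""
    occupants = {}
    for person in ET.fromstring(staff_xml).findall("person"):
        occupants.setdefault(person.get("room"), []).append(person.get("name"))

    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "280", "height": "80"})
    for room, (x, y, width, height) in rooms.items():
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(x), "y": str(y),
            "width": str(width), "height": str(height),
            "fill": "none", "stroke": "black",
        })
        # The label text comes straight from the personnel data.
        label = ET.SubElement(
            svg, f"{{{SVG_NS}}}text", {"x": str(x + 5), "y": str(y + 20)}
        )
        label.text = f"{room}: " + ", ".join(occupants.get(room, []))
    return svg


office_map = annotated_map(PERSONNEL, ROOMS)
print(ET.tostring(office_map, encoding="unicode"))
```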


Please contact:

Victoria Marshall - CLRC
Tel: +44 1235 44 6799
E-mail: V.A.Marshall@rl.ac.uk