ERCIM News No.35 - October 1998

XML and the World-Wide Web Consortium Leverage Action Project

by Brian Matthews

The World-Wide Web is based on some very simple technologies. In particular, the Hypertext Markup Language (HTML) is a simple language for describing documents. However, HTML is severely limited as an information management medium. HTML's mix of structure and presentation means that reformatting the data to give different views is hard. Further, the lack of domain-specific data modelling in HTML has made accurate searching for information on the Web difficult and has made it hard to interact with databases. Thus the very features which led to the widespread acceptance of HTML are limiting the utility of the Web itself.

In response, the World-Wide Web Consortium (W3C) has developed the Extensible Markup Language (XML). XML is not intended as a replacement for HTML, but rather as a more flexible alternative for the representation of data across the Web. XML is intended to allow new data formats to be defined while maintaining the universality of HTML.

XML is based on the existing Standard Generalised Markup Language (SGML). The key concept brought to XML from SGML is that of a Document Type Definition (DTD). This is a declaration of the correct markup structure for a class of XML documents against which documents can be validated. Thus the logical structure of a class of valid documents is defined and used by applications to manipulate a document.
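
As an illustration of the idea, the following is a minimal sketch in Python, not taken from the W3C-LA demonstrators. The "report" document class, its element names and the helper matches_report_model are all hypothetical. Python's standard library parser checks that a document is well formed but does not validate it against a DTD, so the sketch uses a small hand-written check that mirrors the content model declared in the DTD; a validating parser would perform this check automatically.

```python
import xml.etree.ElementTree as ET

# Hypothetical document class: a "report" holding one title followed by
# one or more authors. The DTD is given as an internal subset.
DOC = """<!DOCTYPE report [
  <!ELEMENT report (title, author+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>
<report>
  <title>XML and W3C-LA</title>
  <author>Brian Matthews</author>
</report>"""


def matches_report_model(root):
    """Hand-written check mirroring the content model (title, author+).

    This stands in for real DTD validation, which the standard library
    parser does not perform.
    """
    tags = [child.tag for child in root]
    return (
        root.tag == "report"
        and len(tags) >= 2
        and tags[0] == "title"
        and all(tag == "author" for tag in tags[1:])
    )


# Parsing succeeds for any well-formed document; the structural check
# is what distinguishes a valid "report" from an invalid one.
root = ET.fromstring(DOC)
valid = matches_report_model(root)
```

Because the structure of a valid report is known in advance, an application can rely on the first child being the title and the remaining children being authors, without inspecting presentation details.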

XML can therefore be used to define new document markup which is closer to the intended use of the document, in a flexible yet universally interpretable way. DTDs can then be written for a wide variety of application domains and data formats.

The W3C-LA project, involving INRIA and RAL together with the W3C offices at SICS, GMD, CWI, and FORTH, has been exploring the use of XML within several different demonstrators.

The common driving force behind these initiatives is the desire to transmit and present new kinds of information across the WWW in a flexible and open way. They also demonstrate the widely differing application domains that XML can serve, and its potential to enhance the capacity of the WWW. Further information on W3C and W3C-LA activities can be obtained by contacting the W3C at INRIA, or the W3C offices established at RAL, SICS, GMD, CWI, and FORTH.

