ERCIM News No. 51, October 2002

The Semantic Web lifts off

by Tim Berners-Lee and Eric Miller

Many researchers at ERCIM Institutes are aware that this is an exciting time to be involved in work done at the World Wide Web Consortium (W3C). Scalable Vector Graphics, Web Services, and the Semantic Web are but a few of the W3C Activities attracting media attention. This article focuses on the W3C's Semantic Web Activity and recent developments in the Semantic Web community. Although it is difficult to predict the impact of such a far-reaching technology, current implementation and signs of adoption are encouraging and developments in future research areas are extremely promising.

The Semantic Web is an extension of the current Web in which information is given well-defined meaning, enabling computers and people to work in better cooperation. The W3C Semantic Web Activity, in collaboration with a large number of researchers and industrial partners, is tasked with defining standards and technologies that allow data on the Web to be defined and linked in a way that it can be used for more effective discovery, automation, integration, and reuse across applications. The Web will reach its full potential when it becomes an environment where data can be shared and processed by automated tools as well as by people.

How might this be useful? Suppose you want to compare the price and choice of flower bulbs that grow best in your zip code, or you want to search online catalogs from different manufacturers for equivalent replacement parts for a Volvo 740. The raw information that may answer these questions may indeed be on the Web, but it's not in a machine-usable form. You still need a person to discern the meaning of the information and its relevance to your needs.

The Semantic Web addresses this problem in two ways. First, it will enable communities to expose their data so that a program doesn't have to strip the formatting, pictures and ads from a Web page to guess at the relevant bits of information. Secondly, it will allow people to write (or generate) files which explain - to a machine - the relationship between different sets of data. For example, one will be able to make a 'semantic link' between a database with a 'zip-code' column and a form with a 'zip' field, stating that they actually mean the same thing. This will allow machines to follow links and facilitate the integration of data from many different sources.
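The 'zip-code' example can be sketched in a few lines of Python. This is only an illustration of the idea, not an RDF implementation: the shared term 'postalCode', the sample records and the helper names are all assumptions made for the example.

```python
# Hypothetical data: a database table with a 'zip-code' column and
# form submissions with a 'zip' field.
database_rows = [
    {"name": "Bulb Barn", "zip-code": "02139"},
    {"name": "Tulip Town", "zip-code": "43017"},
]
form_submissions = [{"customer": "Alice", "zip": "02139"}]

# The 'semantic link': a machine-readable statement that two field names
# denote the same property. 'postalCode' is an illustrative shared term.
SAME_AS = {"zip-code": "postalCode", "zip": "postalCode"}


def normalize(record):
    """Rename linked fields to their shared term; leave other fields alone."""
    return {SAME_AS.get(key, key): value for key, value in record.items()}


def join_on(shared_term, left, right):
    """Pair records from two sources that agree on the shared term."""
    left_n = [normalize(r) for r in left]
    right_n = [normalize(r) for r in right]
    return [
        (a, b)
        for a in left_n
        for b in right_n
        if shared_term in a and a[shared_term] == b.get(shared_term)
    ]


matches = join_on("postalCode", form_submissions, database_rows)
```

Neither source had to change its own field name; the link is stated once, outside both, and a program can follow it to integrate the data.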

This notion of being able to semantically link various resources (documents, images, people, concepts, etc) is an important one. With this we can begin to move from the current Web of simple hyperlinks to a more expressive, semantically rich Web, a Web where we can incrementally add meaning and express a whole new set of relationships (hasLocation, worksFor, isAuthorOf, hasSubjectOf, dependsOn, etc) among resources, making explicit the particular contextual relationships that are implicit in the current Web. This will open new doors for effective information integration, management and automated services.
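One simple way to picture such typed links is as a list of subject-relationship-object statements. The following is a minimal sketch, assuming made-up resources (alice, report42 and so on); only the relationship names are taken from the text above.

```python
# Statements as (subject, relationship, object) triples. The relationship
# names come from the article; the resources themselves are invented.
statements = [
    ("alice", "worksFor", "ExampleCorp"),
    ("alice", "isAuthorOf", "report42"),
    ("report42", "dependsOn", "dataset7"),
    ("ExampleCorp", "hasLocation", "Amsterdam"),
]


def find(statements, subject=None, relationship=None, obj=None):
    """Return statements matching the pattern; None acts as a wildcard."""
    return [
        (s, r, o)
        for (s, r, o) in statements
        if subject in (None, s)
        and relationship in (None, r)
        and obj in (None, o)
    ]


# A plain hyperlink says only that two things are connected;
# a typed link also says how.
authored = find(statements, subject="alice", relationship="isAuthorOf")
```

New relationships can be added one statement at a time without restructuring what is already there, which is what makes the incremental addition of meaning practical.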

How is it being developed?
There are two places to look for Semantic Web progress: from the ground up, in the infrastructural and architectural work coordinated by the W3C, and from the top down, in application-specific work by those leveraging Semantic Web technologies in various demonstrations, applications and products. This article provides an introduction to both views with a specific focus on those areas in which the W3C is directly involved.

Enabling Standards
Uniform Resource Identifiers (URIs) are a fundamental component of the current Web and are a foundation of the Semantic Web. The Extensible Markup Language (XML) is also a fundamental component for supporting the Semantic Web. XML provides an interoperable syntactical foundation upon which the more important issue of representing relationships and meaning can be built. URIs provide the ability for uniquely identifying resources as well as relationships among resources. The Resource Description Framework (RDF) family of standards leverages URIs and XML to provide a stepwise set of functionality to represent these relationships and meaning.
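To see why URIs matter here, consider two independent sources that describe the same resource. The sketch below models statements as plain Python tuples rather than using an RDF library; the two Dublin Core element URIs are real identifiers, while the example.org URIs and the literal values are placeholders chosen for illustration.

```python
# Dublin Core element URIs (real identifiers); the example.org URIs and
# the literal values are placeholders.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
DOC = "http://example.org/docs/report"

# Two independent sources describing the same resource.
library_catalog = {(DOC, DC_TITLE, "Annual Report")}
staff_directory = {(DOC, DC_CREATOR, "http://example.org/people/alice")}

# Both sources use the same URI for the document, so merging their
# statements is just a set union.
merged = library_catalog | staff_directory


def describe(graph, resource):
    """Collect the property/value pairs stated about a resource.

    Simplification: assumes at most one value per property.
    """
    return {prop: value for (subj, prop, value) in graph if subj == resource}


description = describe(merged, DOC)
```

Because both the resource and the relationships are named by URIs, no prior agreement between the two sources is needed beyond using the same identifiers.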

The W3C Semantic Web Activity's charter is to serve a leadership role in the design of specifications and the open, collaborative development of technologies focused on representing relationships and meaning. The base level of the RDF family of standards is a W3C Recommendation. The RDF Core Working Group is in the process of formalizing the original RDF Model and Syntax Recommendation which provides a simple yet powerful framework for representing information in the Web. Building on this work, the group is additionally defining a simple means for declaring RDF Vocabularies. RDF Vocabularies are descriptive terms (eg Service, Book, Image, title, description, rights, etc) that are useful to communities recording information in a way that enables effective reuse, integration and aggregation of data. Additional deliverables include a precise semantic theory of these standards that will support future work, as well as a primer designed to provide the reader with a basic understanding of RDF and its application.

Simple data integration, aggregation and interoperability are enabled by these base level RDF standards. There is also an increasing need for interoperability at a more expressive descriptive level. The Web Ontology Working Group is chartered to build upon the RDF Core work a language for defining structured, Web-based ontologies. Ontologies can be used by automated tools to power advanced services such as more accurate Web search, intelligent software agents and knowledge management. Web portals, corporate website management, intelligent agents and ubiquitous computing are just some of the identified scenarios that helped shape the requirements for this work.
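As a rough illustration of how an ontology can make search more accurate, the sketch below uses a tiny class hierarchy to find items that were never labelled with the search term itself. The classes, item identifiers and function names are hypothetical, and a real ontology language expresses far more than subclass relationships.

```python
# Hypothetical ontology: each class maps to its direct superclass.
SUBCLASS_OF = {
    "Tulip": "FlowerBulb",
    "FlowerBulb": "Plant",
    "Plant": "Product",
}

# Instance data, typed with the most specific class only.
INSTANCE_TYPE = {"item-17": "Tulip", "item-23": "Plant"}


def all_classes(cls):
    """Return cls followed by every ancestor reachable via SUBCLASS_OF."""
    classes = [cls]
    while classes[-1] in SUBCLASS_OF:
        classes.append(SUBCLASS_OF[classes[-1]])
    return classes


def instances_of(target):
    """Find instances whose class is target or a subclass of it."""
    return sorted(
        item for item, cls in INSTANCE_TYPE.items() if target in all_classes(cls)
    )


# item-17 is only labelled 'Tulip', yet a search for flower bulbs finds it.
bulbs = instances_of("FlowerBulb")
```

A keyword search for "FlowerBulb" would miss the tulip entirely; the ontology supplies the background knowledge that lets a tool draw the connection.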

Advanced Development
Just as the early development of the Web depended on code modules such as libwww, W3C is devoting resources to the creation and distribution of similar core components that will form the basis for the Semantic Web. The W3C Semantic Web Advanced Development (SWAD) initiatives are designed to work in collaboration with a large number of researchers and industrial partners and stimulate complementary areas of development that will help facilitate the deployment of, and future standards work associated with, the Semantic Web.

SWAD DAML
The purpose of the SWAD DAML project is to contribute to the development of a vibrant, ubiquitous Semantic Web by building critical Semantic Web infrastructure and demonstrating how that infrastructure can be used by working, user-oriented applications.

SWAD DAML is designed to build on the DARPA Agent Markup Language (DAML) infrastructure to provide an interchange between two or more different applications. The first involves structured information manipulation required to maintain the ongoing activities of an organization such as the W3C. These include access control, collaborative development, and meeting management. The second application is focused on the informal and often heuristic processes involved in document management in a personalized information environment. Integrated into both environments will be tools to enable authors to control terms under which personal or sensitive information is used by others, a critical feature to encourage sharing of semantic content.

SWAD-Europe
SWAD-Europe will highlight practical examples of where real value can be added to the Web through Semantic Web technologies. The focus of this initiative is on providing practical demonstrations of how the Semantic Web can address problems in areas such as site maps, news channel syndication, thesauri, classification, topic maps, calendaring, scheduling, collaboration, annotations, quality ratings, shared bookmarks, Dublin Core for simple resource discovery, Web service description and discovery, trust and rights management, and how to effectively and efficiently integrate these technologies together.

SWAD-Europe will additionally concentrate on exploratory implementation and pre-consensus design in areas such as querying, and the integration of multiple Semantic Web technologies. It shall provide valuable input and experiences to future standards work.

SWAD Simile
W3C is additionally working with HP, MIT Libraries, and MIT's Lab for Computer Science on Simile, which seeks to enhance interoperability among digital assets, schemas, metadata, and services across distributed individual, community, and institutional stores and across value chains that provide useful end-user services by drawing upon the assets, schemas, and metadata held in such stores. Simile will leverage and extend DSpace, enhancing its support for arbitrary schemas and metadata, primarily through the application of RDF and Semantic Web techniques. The project also aims to implement a digital asset dissemination architecture based upon Web standards, enabling services to operate upon relevant assets, schemas, and metadata within distributed stores.

The Simile effort will be grounded by focusing on well-defined, real-world use cases in the libraries' domain. Since parallel work is underway to deploy DSpace at a number of leading research libraries, we hope that such an approach will lead to a powerful deployment channel through which the utility and readiness of Semantic Web tools and techniques can be compellingly demonstrated in a visible and global community.

SWAD Oxygen
The MIT/LCS Oxygen project is designed to enable pervasive, human-centered computing through a combination of specific user and system technologies. Oxygen's user technologies directly address human needs. Speech and vision technologies enable us to communicate with Oxygen as if we're interacting with another person, saving much time and effort. Automation, individualized knowledge access, and collaboration technologies help us perform a wide variety of tasks in the way we like to do them. In Oxygen, these technologies enable the formation of spontaneous collaborative regions that provide support for recording, archiving, and linking fragments of meeting records to issues, summaries, keywords, and annotations.

The Semantic Web is designed to foster a similar collaborative environment and the W3C is working with project Oxygen to help support this goal. The ability for "anyone to say anything about anything" is an important characteristic of the current Web and is a fundamental principle of the Semantic Web. Knowing who is making these assertions is increasingly important in trusting these descriptions and enabling a 'Web of Trust'. The Annotea advanced development project provides the basis for asserting descriptive information, comments, notes, reviews, explanations, or other types of external remarks to any resource. Together with XML digital signatures, the Annotea project will provide a test-bed for 'Web-of-Trust' Semantic Web applications.

Applications - spinning upward
Though not the focus of this article, the deployment of RDF-based technologies is increasingly significant. The W3C Semantic Web Activity hosts the RDF Interest Group, which coordinates public implementation and shares deployment experiences of these technologies. Arising out of RDF Interest Group discussions are several public issue-specific mailing lists, including RDF-based calendar and group scheduling systems, logic-based languages, queries and rules for RDF data and distributed annotation and collaboration systems. These discussion groups are designed to focus on complementary areas of interest associated with the Semantic Web Activity, each of which fosters cooperation and collaboration among individuals and organizations working on related Semantic Web technologies.

In addition to these Interest Group lists there are a variety of domain-specific communities who are using RDF/XML to publish their data on the Web. These notably include the Dublin Core Metadata Initiative, the IMS Global Learning Consortium vocabularies for facilitating online distributed learning, XMLnews, PRISM, the RDF Site Summary (RSS 1.0) for supporting news syndication, Musicbrainz for cataloging and cross-referencing music, and Creative Commons for supporting a digital rights description, to name but a few. The Topic Map (XTM) community has been finding increasing synergy with the RDF data model.

Early commercial adoptions such as Adobe's eXtensible Metadata Platform (XMP) leverage RDF/XML to enable more effective management of digital resources. Through XMP, Adobe applications and workflow partners can leverage the power of RDF/XML to provide a standardized means for supporting the creation, processing, and interchange of document metadata across publishing workflows. This in turn reduces cost and makes more effective management of digital resources possible both within and across organizational boundaries.

New Things opening up
The most exciting thing about the Semantic Web is not what we can imagine doing with it, but what we can't yet imagine it will do. Just as global indexes and Google's algorithms were not dreamed of in the early Web days, we cannot imagine now all the new research challenges and exciting product areas which will appear once there is a Web of data to explore. Many existing fields for knowledge representation and data management have typically made assumptions regarding a conceptually or physically centralized system, and as such their application to the Semantic Web is not straightforward. Given a mass of rules relating data in different vocabularies, and an unbounded set of datasets in different vocabularies, what algorithms will efficiently resolve general queries? What conventions for the storage of tips and pointers will allow data to be reused and converted automatically? What techniques will allow a system to operate securely while processing very diverse data from untrusted agents? How can one represent - and then implement - personal privacy in such a world?

The Semantic Web starts as a simple circles-and-arrows diagram relating things, which slowly expands and coalesces to become global and vast. The Web of human-readable documents spawned a social revolution. The Semantic Web may in turn spawn a revolution in computing. In neither case did a change occur in the power of one person or one computer, but rather a dramatic change in the role they can play in the world, by being able to find out almost anything virtually immediately.

For more information on the Semantic Web, including additional projects, products, efforts and future directions, see the Semantic Web home page.

Link:
http://www.w3c.org/2001/sw/

Please contact:
Tim Berners-Lee, Director of W3C
E-mail: timbl@w3.org

Eric Miller
W3C Semantic Web Activity Lead
E-mail: em@w3.org