CASPAR and a European Infrastructure for Digital Preservation

by David Giaretta


The preservation of digitally encoded information is a difficult task, requiring long-term commitment and collaboration. CASPAR, a new EU FP6 Integrated Project, addresses this problem. Together with other major European initiatives, it will form the basis of a continent-wide preservation infrastructure, and will benefit both current and future users.

CASPAR (Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval) is an EU Integrated Project, which began in April 2006 with a budget of around 16 MEuro (8.8 MEuro from the EU).

Modern society is increasingly dependent on a vast amount of intrinsically fragile digital information. CASPAR intends to address this problem by building a pioneering framework, based on existing and emerging standards, to support the end-to-end preservation 'lifecycle' for scientific, artistic and cultural information.

The ambitious goal is to build up a common preservation framework for heterogeneous data, along with a variety of innovative applications. This will be achieved through the following high-level objectives:

The CASPAR consortium will demonstrate the validity of the CASPAR framework through heterogeneous testbeds. These will cover a wide range of disciplines from science to culture, contemporary arts and multi-media, and will provide a reliable common infrastructure that can be used or replicated in other areas.

CASPAR proposes a set of tough metrics by which it, and any other project that claims to be doing something useful for digital preservation, may be judged.

The CASPAR consortium will also seek to guarantee the future evolution of CASPAR in the following ways:

To achieve this, CASPAR brings together a consortium covering important digital holdings, with extensive scientific (CCLRC, the lead partner, and ESA), cultural (UNESCO) and creative (INA, CNRS, University of Leeds, IRCAM and CIANT) expertise. This is combined with commercial partners (ACS, ASemantics, MetaWare, Engineering, and IBM/Haifa), experts in knowledge engineering (CNR and FORTH) and other leaders in the field of information preservation (University of Glasgow and University of Urbino).

Models
The Reference Model for an Open Archival Information System (OAIS, ISO 14721), which forms the basis of CASPAR, contains a number of models, including a functional model (Figure 1) and an information model (Figure 2).

Figure 1: OAIS functional model
Figure 2: OAIS information model.
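
As a rough illustration of the OAIS information model, the core objects can be sketched in Python. This is only a minimal sketch and not CASPAR code: the class and function names are invented for this example, the fields follow the OAIS terms (Content Information, Representation Information, and the Reference, Provenance, Context and Fixity parts of the Preservation Description Information), and the use of a SHA-256 digest as fixity information is an assumption.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RepresentationInformation:
    """What a future user needs in order to interpret the bits."""
    structure: str   # e.g. a description of the file format
    semantics: str   # e.g. what the values mean and in which units


@dataclass
class ContentInformation:
    """The data object together with its representation information."""
    data_object: bytes
    representation_information: RepresentationInformation


@dataclass
class PreservationDescriptionInformation:
    """The four OAIS categories of preservation description."""
    reference: str    # identifier of the content
    provenance: str   # origin and custody history
    context: str      # relationship to other information
    fixity: str       # here: a SHA-256 digest (assumption of this sketch)


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation


def make_package(data, rep_info, reference, provenance, context):
    """Build a package and record a digest of the data as fixity information."""
    pdi = PreservationDescriptionInformation(
        reference=reference,
        provenance=provenance,
        context=context,
        fixity=hashlib.sha256(data).hexdigest(),
    )
    return ArchivalInformationPackage(ContentInformation(data, rep_info), pdi)


def fixity_ok(package):
    """Check that the stored data object still matches its recorded digest."""
    digest = hashlib.sha256(package.content.data_object).hexdigest()
    return digest == package.pdi.fixity
```

In this sketch, the representation information travels with the data object, so an archive holding only the package still has a description of how to read it.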

CASPAR adds to these a high-level model of virtualization and a number of high-level components.

The components of infrastructure that CASPAR will produce must themselves be preservable. To this end the project will put 'knowledge' at the heart of preservation. By this we mean that besides simple data semantics, CASPAR will also capture higher-level semantics. Furthermore, we will use Semantic Web techniques to enable the infrastructure components to survive changes over time.

Regardless of how successful CASPAR is as a project, it nevertheless has a limited life. In order to provide long-term support we aim to embed CASPAR results into the production processes of long-lived organizations such as CCLRC, ESA, UNESCO and INA, as well as many related archives.

In addition, the Task Force on Permanent Access to the Records of Science has produced a research programme and strategic plan, the former being consistent with that of CASPAR. Part of this strategic plan is to create an 'alliance' consisting initially of major data holders across Europe. Members of the alliance can, among other things, seek to align their individual infrastructures to form the basis of a Europe-wide preservation infrastructure. It is also hoped that a European Digital Information Infrastructure for Preservation and Access (EDIIPA) will be added to the ESFRI Roadmap, to further embed these activities.

Figure 3: CASPAR virtualization model.

Immediate Benefits from Digital Preservation
While many reasons exist for preserving digitally encoded information, a large proportion – such as legal requirements – are transitory. Longer-term reasons tend to be very worthy (eg for the good of future generations) but do not fare well in competition with other activities that seek support from cash-limited funders. In addition, benefits are hard to quantify.

Yet an immediate benefit can be identified, as long as the preservation is successful. The OAIS view is that the test of preservation is that digitally encoded information should remain comprehensible and useful for future users to whom that data is unfamiliar. However, potential users exist right now to whom the data is unfamiliar. Pulling current data from the Internet (eg for use in a GRID application) has many analogies with retrieving archived data, and indeed it may be hard to distinguish between the two. While it is true that for current data it may be possible to communicate with the data producer, it would be much more convenient not to rely on that but to have automated processes that can use the data correctly.

The virtualization techniques needed for preservation can in many cases provide exactly that capability. They also offer the opportunity to support generic applications that can deal with data from any source, by using the appropriate virtualization information.
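
The idea of a generic application driven by virtualization information can be sketched as follows. This is a hypothetical example, not the CASPAR design: the `RepInfo` layout description and the `read_records` function are invented here, and the sketch assumes the data consists of fixed-size binary records whose fields can be described with Python `struct` format codes.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical virtualization information: an ordered list of
# (field name, struct format code) pairs describing one binary record.
RepInfo = List[Tuple[str, str]]


def read_records(raw: bytes, rep_info: RepInfo, byte_order: str = "<") -> List[Dict[str, object]]:
    """Decode raw bytes into named records using only the supplied description."""
    fmt = byte_order + "".join(code for _, code in rep_info)
    names = [name for name, _ in rep_info]
    record_size = struct.calcsize(fmt)
    if len(raw) % record_size != 0:
        raise ValueError("data length is not a whole number of records")
    # The reader knows nothing about the data source; the layout comes
    # entirely from rep_info.
    return [dict(zip(names, values)) for values in struct.iter_unpack(fmt, raw)]


# Usage: the same reader handles any source for which a description exists.
weather_layout: RepInfo = [("station_id", "H"), ("temperature_c", "f")]
raw_bytes = struct.pack("<Hf", 7, 21.5) + struct.pack("<Hf", 8, -3.25)
records = read_records(raw_bytes, weather_layout)
```

The reader contains no knowledge of any particular data set, so a user unfamiliar with the data, whether today or in the future, needs only the data and its description rather than contact with the data producer.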

We believe CASPAR to be the first project with the aim of producing broadly applicable components and a framework for digital preservation. The confluence of events at a European level offers the opportunity for CASPAR to make more than an ephemeral contribution, even when measured on the long timescales of data relevance. Furthermore, the techniques being adopted offer immediate benefits to current users.

Links:
CASPAR: http://www.casparpreserves.eu
DCC Development: http://dev.dcc.ac.uk
Task Force on Permanent Access: http://tfpa.kb.nl
OAIS Reference Model: http://public.ccsds.org/publications/archive/650x0b1.pdf
Digital Curation Centre (DCC): http://www.dcc.ac.uk
ESFRI: http://cordis.europa.eu/esfri/

Please contact:
David Giaretta, Associate Director (Development) Digital Curation Centre and co-ordinator of CASPAR project
Tel: +44 1235 446235
E-mail: d.l.giaretta@rl.ac.uk