ERCIM News No.45 - April 2001
The DataGrid Project
by Robin Middleton and David Boyd
The DataGrid project is a large European collaboration, supported by the EU, to develop a pan-European Grid infrastructure linking the various science Grids of the participants and to demonstrate its utility through demanding scientific applications. With 21 partners and a challenging technical agenda, the project is calling on the expertise of several ERCIM partners to help achieve its goals.
The LHC Challenge
The initial driver behind the development of the DataGrid project was the recognition that the scale of computing and data management required by the particle physics experiments on the Large Hadron Collider (LHC), which will become operational at CERN near Geneva in 2005, far exceeds the capability and capacity of existing resources. Together, the LHC experiments will produce several petabytes of data per year, which the global particle physics community will then progressively reconstruct, filter and analyse, so the aggregate computational and data throughput required is massive.
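To give a feel for these volumes, the short calculation below converts a yearly data volume into the average rate that would have to be sustained around the clock. It is a back-of-envelope sketch only: the decimal definition of a petabyte, the 365-day year, the assumption of an even data flow and the sample volumes are all illustrative choices, not project figures.

```python
# Back-of-envelope: average data rate implied by a yearly data volume.
# Assumptions (not from the article): decimal units (1 PB = 10**15 bytes),
# a 365-day year, and data arriving evenly throughout the year.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds
BYTES_PER_PETABYTE = 10**15


def sustained_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average rate in megabytes per second for a given yearly volume."""
    bytes_per_second = petabytes_per_year * BYTES_PER_PETABYTE / SECONDS_PER_YEAR
    return bytes_per_second / 10**6


if __name__ == "__main__":
    for volume in (1, 3, 5):  # illustrative volumes only
        rate = sustained_rate_mb_per_s(volume)
        print(f"{volume} PB/year is about {rate:.0f} MB/s sustained")
```

Under these assumptions each petabyte per year corresponds to roughly 32 MB/s of continuous traffic, before any reprocessing or replication of the data is counted.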
The DataGrid solution
To address this challenge, and similar requirements in other sciences, the DataGrid project was conceived and funded with approximately 10 million euro through the EU 5th Framework Information Society Technologies programme. In addition to trying to solve the problems of Europe's scientists, this three-year project has a wider remit: to develop and prove a technological infrastructure which could potentially revolutionise commercial and social activities throughout Europe.
The DataGrid project logically has 4 major components:
- the underlying fabric of computational, data and communications resources required
- middleware to make these components accessible and controllable, initially based on the Globus toolkit
- management tools to monitor and control this infrastructure
- three application areas which will exercise the resulting Grid environment, namely particle physics, earth observation and bioscience.
The initial task, now underway, is to define the overall architectural vision of the project and to establish a detailed technical framework within which the project can progress.
The above components have been further broken down into workpackages and these are currently defining their detailed work programmes and assigning tasks to the partners. The figure shows these workpackages and indicates the relationships between them.
- WP1 - Workload Management will address distributed scheduling and resource management
- WP2 - Data Management will develop and demonstrate the necessary middleware to ensure remote access to petabyte databases and the replication and caching of data in a secure environment
- WP3 - Monitoring will produce the means for users and managers to monitor and optimise performance
- WP4 - Fabric Management will develop new automated system management techniques to support the deployment and operation of tens of thousands of commodity processors
- WP5 - Mass Storage Management will agree and implement interfaces to mass storage systems in use within the partners
- WP6 - Integration Testbed will evaluate the effectiveness of the integrated DataGrid architecture for production use across European networks and provide a platform for computation by the applications
- WP7 - Networking Services will oversee the networking aspects of the project
- WP8 (High Energy Physics), WP9 (Earth Observation) and WP10 (Biology) will build on the framework created by the other workpackages to demonstrate use of the DataGrid environment.
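The replication idea mentioned under WP2 can be pictured with a toy replica catalogue: a logical file name maps to the sites that hold a copy, and a client picks the cheapest copy to read. This is a minimal sketch under stated assumptions, not the DataGrid design; the class, the method names, the site names and the cost figures are all hypothetical.

```python
# Illustrative only: a toy replica catalogue. All names, sites and costs
# are hypothetical and do not come from the DataGrid design.
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ReplicaCatalogue:
    # Maps a logical file name to the set of sites holding a copy.
    replicas: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, logical_name: str, site: str) -> None:
        """Record that `site` holds a copy of `logical_name`."""
        self.replicas.setdefault(logical_name, set()).add(site)

    def best_replica(self, logical_name: str, cost: Dict[str, float]) -> str:
        """Return the site with the lowest access cost that holds the file.

        Sites missing from `cost` are treated as infinitely expensive;
        ties are broken alphabetically so the result is deterministic.
        """
        sites = self.replicas.get(logical_name)
        if not sites:
            raise KeyError(f"no replica registered for {logical_name!r}")
        return min(sites, key=lambda s: (cost.get(s, float("inf")), s))


if __name__ == "__main__":
    catalogue = ReplicaCatalogue()
    catalogue.register("run42.dat", "site_a")
    catalogue.register("run42.dat", "site_b")
    # Hypothetical access costs as seen from one client.
    print(catalogue.best_replica("run42.dat", {"site_a": 5.0, "site_b": 1.0}))
```

A real system would also have to keep copies consistent, authenticate users and account for storage and network load, none of which is attempted here.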
There are 6 main partners: CERN (the co-ordinating partner), ESA, PPARC (UK), CNRS (France), INFN (Italy) and NIKHEF (The Netherlands), together with 15 associated partners from 10 countries across Europe. The industrial partners are IBM, Datamat and Compagnie des Signaux. They will contribute their technical expertise and commercial experience, and will address the issue of how to disseminate the new technology developed by the project effectively into the marketplace, so that European society and business can also benefit from advances initially driven by science.
The project will be working very closely with several groups in the US including the Globus team, the Grid Physics Network (GriPhyN) and the Particle Physics Data Group (PPDG). By sharing technology and agreeing on joint development programmes, the resources of all these partners can be brought together most effectively to tackle what must be the largest global computing challenge ever undertaken.
DataGrid web site: http://www.datagrid.cnr.it/
Robin Middleton - CLRC
Tel: +44 1235 446348