ERCIM News No.45 - April 2001
e-Science and Grids at CLRC
by David Boyd and Paul Jeffreys
e-Science is enabling increasingly complex and challenging scientific problems to be addressed through the use of advanced IT. Current e-Science activities at CLRC are laying the groundwork for a new multi-million-pound, three-year programme starting in April 2001. The pilot projects described here are exploring how Grid techniques can enhance the multi-disciplinary science programme supported by CLRC.
The StarGrid project, involving CLRC's Space Science and Technology and IT Departments, is integrating the Globus toolkit into the KAPPA package, which is part of the Starlink astronomical data processing software suite. This enables image files to be retrieved from remote data archives using Grid techniques, processed locally and, if necessary, reloaded into the archive. The figure shows KAPPA accessing an archived image. The final goal of the project is to integrate these facilities into the GAIA graphical user interface. This project will provide useful experience for the Astro-Grid project when it starts.
The Starlink application KAPPA accessing image data remotely using Globus software.
Earth Observation Grid
The British Atmospheric Data Centre (BADC) is located at CLRC's Rutherford Appleton Laboratory (RAL) and makes a variety of data resources available to the UK environmental research community. This project is investigating how Grid techniques can improve the services that the BADC offers its users, including the ability to identify and retrieve data remotely from a range of sources, if necessary with subsequent processing to reduce the volume of data that must be transferred over the network. The project will also review ways in which the Grid can improve UK access to data from instruments on the ENVISAT satellite, which ESA is due to launch in mid-2001.
Many university-based researchers who carry out experiments at CLRC's two accelerator-based facilities, the SRS at Daresbury Laboratory (DL) and ISIS at RAL, travel to the facility in order to control the experiments as their data is collected. This project, a collaboration between teams at DL and the crystallography group at Birkbeck College in London, is studying ways in which the Grid can facilitate remote control of instruments and rapid return of data to the user's university so that its quality can be checked quickly. These studies are paving the way for increased use of automated beamlines at these facilities in the future.
HPC in the Grid
This project aims to develop a computational Grid which will enhance access to high performance computers and enable novel combinations of computer simulation, remote measurement and data analysis. Computing groups at DL and RAL are co-operating with the universities of Edinburgh and Manchester to form the UK High End Computing Consortium. As a first step, this consortium has established a working Grid infrastructure using the Globus toolkit. This is now being exercised with a variety of applications, including remote medical imaging to help surgeons during operations; combining MRI-scanned data of a tooth with the results of a finite element analysis in a VR facility; and real-time flood warning analysis to assist emergency services.
With any new development such as the Grid, it is necessary to get many people up to speed quickly and to define best practice so that experience gained is quickly shared. This project is achieving this by establishing a reference Grid implementation platform within CLRC's IT department which can then be cloned by teams in other departments developing Grid applications. The reference platform is a Linux PC running Grid middleware based initially on the Globus toolkit. The project is also operating a Grid Certification Authority, currently serving the UK particle physics community. An internal Globus technical forum is held regularly to disseminate newly acquired technical knowledge.
With many scientific programmes rapidly increasing the volume of data which they generate, for example by using more complex computational models or higher resolution experimental detectors, the bandwidth of the networks through which the data must be moved becomes a limitation. The development of Grid technology will accelerate this trend by making it easier to move data around as part of a distributed data analysis process. Within CLRC, the internal networks must carry data between experimental facilities, computers, disk and tape stores, and external locations. To meet the growing data traffic requirements, local area networks at RAL and DL are now being upgraded to gigabit capacity, with further increases to 10 gigabits planned within 2 years.
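To give a rough feel for why link capacity matters, the following minimal sketch estimates the idealised time to move a dataset over links of the two capacities mentioned above. The 1 terabyte dataset size is purely an illustrative assumption and does not come from any CLRC facility; the function name and the `efficiency` parameter are likewise hypothetical, and protocol overhead, disk speed and contention are ignored.

```python
def transfer_time_seconds(data_bytes, link_bits_per_second, efficiency=1.0):
    """Idealised time to move data_bytes over a link.

    efficiency is the assumed fraction of the nominal link rate that is
    actually achieved (1.0 means the full nominal rate).
    """
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return data_bytes * 8 / (link_bits_per_second * efficiency)


TERABYTE = 10**12       # bytes; illustrative dataset size (assumption)
GIGABIT_PER_S = 10**9   # bits per second

for label, rate in [("1 Gbit/s", GIGABIT_PER_S), ("10 Gbit/s", 10 * GIGABIT_PER_S)]:
    hours = transfer_time_seconds(TERABYTE, rate) / 3600
    print(f"{label}: {hours:.2f} hours per terabyte")
```

Even at the full nominal rate, a terabyte occupies a gigabit link for over two hours under these assumptions, which suggests why both higher capacity and reducing data volume before transfer are of interest.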
Petabyte Data Storage
The Atlas Data Store (ADS) at RAL provides secure and affordable long-term storage for experimental and computational data from many of the scientific facilities in CLRC and elsewhere. Data curation will be increasingly important as the cost of facilities, and therefore of the data they produce, continues to rise. The future demands of the particle physics community in particular will require major upgrades to the ADS capacity over the coming years. A major increase towards petabyte capacity is currently underway, with further increases planned over the next 3-5 years. As the Grid provides easier access to large-scale data storage facilities, intelligent data access will become an increasingly important issue. Metadata-based data location tools are currently under development in CLRC to meet this need.
CLRC e-Science web site: http://www.e-science.clrc.ac.uk/
David Boyd - CLRC
Tel: +44 1235 446167