GridPP: A UK Computing Grid for Particle Physics
by Sarah Pearce
UK particle physicists have built a functioning prototype Grid to analyse the data deluge from CERN's next particle accelerator. Over the next three years, this will be scaled up and integrated further with other Grids worldwide, to produce the world's first persistent, international, production Grid.
In 2007, CERN will introduce its Large Hadron Collider (LHC), the world's largest particle accelerator. The LHC will allow scientists to penetrate further into the structure of matter and recreate the conditions prevailing in the early universe, just after the Big Bang. But the four experiments at the LHC will produce more data than any previous coordinated human endeavour: 10 Petabytes each year, equivalent to a stack of CDs twice the height of Mount Everest. Careful analysis of all of this complex data will be required to look for some of the most elusive constituents of current physics, such as the Higgs particle and supersymmetry.
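The CD comparison can be checked with some back-of-the-envelope arithmetic. The sketch below is only illustrative: the disc capacity (700 MB), the disc thickness (1.2 mm, without a case), the decimal definition of a petabyte and the height of Everest are all assumptions introduced here, not figures from the project.

```python
# Rough check of the "stack of CDs" comparison.
# Every constant below is an assumption made for illustration.
PETABYTE_BYTES = 10**15            # decimal petabyte (assumed)
CD_CAPACITY_BYTES = 700 * 10**6    # typical 700 MB CD-ROM (assumed)
CD_THICKNESS_M = 1.2e-3            # bare disc, no case (assumed)
EVEREST_HEIGHT_M = 8849            # approximate summit height (assumed)


def cd_stack_height_m(data_bytes):
    """Return the height in metres of a stack of bare CDs holding data_bytes."""
    discs = data_bytes / CD_CAPACITY_BYTES
    return discs * CD_THICKNESS_M


# 10 Petabytes per year is the figure quoted in the article.
annual_data_bytes = 10 * PETABYTE_BYTES
stack_m = cd_stack_height_m(annual_data_bytes)
everest_ratio = stack_m / EVEREST_HEIGHT_M

print(f"Stack height: {stack_m / 1000:.1f} km")
print(f"Relative to Everest: {everest_ratio:.2f}x")
```

Under these assumptions the stack comes out at roughly 17 km, a little under twice the height of Everest, which is consistent with the comparison above. Discs stored in cases would make the stack considerably taller.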
1000 million short-lived particles of matter and antimatter are studied each year in the LHCb particle physics experiment. In order to design the detector and to understand the physics, many millions of simulated events also have to be produced.
To deal with this data, the LHC will use distributed computing. More than 100,000 PCs, spread across one hundred institutions around the world, will allow scientists from different countries to access the data, analyse it and work together in international collaborations. Even today, the LHC Computing Grid (LCG) is the largest functioning Grid in the world, with over 5,000 CPUs and almost 4,000 TB of storage at more than 70 sites. With up to 5,000 jobs running on it simultaneously, the LCG is becoming a true production Grid.
A Particle Physics Grid for the UK
GridPP is the UK's contribution to analysing this data deluge. It is a collaboration of around 100 researchers in 19 UK university particle physics groups, CCLRC and CERN. The six-year, £33m project, funded by the UK Particle Physics and Astronomy Research Council (PPARC), began in 2001 and has been working in three main areas:
- developing applications that will allow particle physicists to submit their data to the Grid for analysis
- writing middleware, which will manage the distribution of computing jobs around the Grid and deal with issues such as security
- deploying computing infrastructure at sites across the UK, to build a prototype Grid.
The UK GridPP testbed currently provides over 1,000 CPUs and 1,000 TB of storage to LCG, from 12 sites in the UK. It is linked to other prototype Grids worldwide, and has been tested by analysing data from US particle physics experiments in which the UK is involved. Several other smaller experiments have also started to use the prototype Grid, and particle physicists are using it to run data challenges that simulate the data analysis that will be needed once the LHC is up and running. In this way, UK particle physics has progressed from Web to Grid.
A Tiered Structure
This GridPP testbed is being developed on a hierarchical model, reflecting the overall structure of the wider LCG testbed. CERN provides the 'Tier-0' centre, where the LHC data will be produced. GridPP has contributed £5m to CERN for this, which has been used to support staff and buy hardware. The UK's 'Tier-1' centre at Rutherford Appleton Laboratory (RAL) focuses on data storage and access. There are also four smaller, regional 'Tier-2' centres in the UK, which focus on providing computing power for generating simulated Monte Carlo data and for analysis of data by individual physicists. In addition, the Grid Operations Centre (GOC), based at RAL, monitors the operational status of resources deployed internationally through LCG and in the UK through GridPP.
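The tiered model can be pictured as a simple tree. The sketch below is a hypothetical illustration only, not GridPP software: the `Centre` class and its helper functions are invented here, the Tier-2 centres are given placeholder names because they are not named in this article, and attaching them directly beneath the Tier-1 centre is an assumption about how the hierarchy is linked.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Centre:
    """One computing centre in the tiered model (hypothetical structure)."""
    name: str
    tier: int
    role: str
    children: List["Centre"] = field(default_factory=list)


def build_hierarchy() -> Centre:
    """Build the Tier-0 / Tier-1 / Tier-2 tree described in the article."""
    # Placeholder names: the article states only that there are four Tier-2s.
    tier2s = [
        Centre(f"UK regional Tier-2 #{i}", 2,
               "Monte Carlo simulation and analysis by individual physicists")
        for i in range(1, 5)
    ]
    # Assumption: Tier-2 centres hang directly off the UK Tier-1 centre.
    tier1 = Centre("Rutherford Appleton Laboratory", 1,
                   "data storage and access", tier2s)
    return Centre("CERN", 0, "where the LHC data will be produced", [tier1])


def walk(centre: Centre, depth: int = 0) -> Iterator[Tuple[int, Centre]]:
    """Yield (depth, centre) pairs in depth-first order."""
    yield depth, centre
    for child in centre.children:
        yield from walk(child, depth + 1)


def count_by_tier(root: Centre) -> Dict[int, int]:
    """Count how many centres sit at each tier."""
    counts: Dict[int, int] = {}
    for _, centre in walk(root):
        counts[centre.tier] = counts.get(centre.tier, 0) + 1
    return counts


root = build_hierarchy()
for depth, centre in walk(root):
    print("  " * depth + f"Tier-{centre.tier}: {centre.name} ({centre.role})")
```

Printing the tree gives one Tier-0 line, one Tier-1 line and four indented Tier-2 lines, mirroring the split between data production, storage and access, and simulation and analysis.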
GridPP2: The Next Phase
The second phase of the GridPP project began on 1 September 2004. In the lead-up to 2007, this will extend the UK particle physics Grid to the equivalent of 10,000 PCs. The infrastructure in the UK will be continually tested, both by current experiments and by the LHC data challenges, to ensure that the final system is ready by 2007. By the end of this second phase of GridPP, UK physicists will be analysing real data from the LHC, using the UK Grid for particle physics.
As well as working with other international particle physics experiments, GridPP is playing a leading role in European Grid projects. During its first three years, GridPP personnel were integral to the EU-funded DataGrid project, which brought together scientists from Earth observation, bio-medicine and particle physics to create a prototype Europe-wide Grid. By the time of its final review in March 2004, EU DataGrid had produced around a million lines of code and had a testbed of 1,000 CPUs, which had run more than 60,000 jobs.
GridPP is now involved in the follow-on EGEE project (Enabling Grids for E-science in Europe), which aims to support the European Research Area by bringing together Grids from different countries and different disciplines.
Working beyond Particle Physics
Within the UK, GridPP2 is also collaborating with other parts of the UK's e-science programme, such as the National Grid Service. Many of the tools developed by GridPP could be useful for other disciplines; for example, GridPP is working with clinical researchers on the potential for using its computer security tools in the health service. In addition, GridPP members are collaborating with industry, sharing experience of current Grid development issues and the solutions adopted.
The LCG project: http://lcg.web.cern.ch/LCG/
PPARC e-Science: http://www.pparc.ac.uk/Rs/Fs/Es/intro.asp
Sarah Pearce, GridPP Dissemination Officer
Queen Mary, University of London
Tel: +44 20 7882 5049