ERCIM News No.45 - April 2001
by Brian Coghlan and Michael Manzke
In the future, as many have predicted, computing will become a ubiquitous strategic resource that users may avail of at any time, anywhere. The subject is at its very earliest stage, with a great number of basic research and practical issues still to be investigated. Enterprise Ireland has provided initial seed funding to establish a working grid between compute clusters at three partner sites in Ireland: the Departments of Computer Science at Trinity College Dublin, University College Cork and NUI Galway.
This project has one primary objective: to establish Grid-Ireland. This requires:
- Hardware: Compaq will donate a 4-way symmetric multiprocessing gateway machine per site
- Software: initially the Globus services
- Management: the Dept. of Computer Science at Trinity College will do this
- Interconnect: the Irish academic network services will be used.
A secondary objective is to begin co-operative research work between the three sites. The themes are clear:
Understanding: the most basic objective is the understanding of system state and the estimation of the future state of individual nodes, complete clusters, and heterogeneous compositions of these to form a computational grid. This will be explored using control theory, via the analogous problem of real-time grid input-output, a very difficult problem given the non-determinism of the grid.
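To make the state-estimation idea concrete, the sketch below applies the simplest tool from control theory, a first-order exponential filter, to a node's load readings. This is purely illustrative of the kind of estimator meant here, not part of the project's actual software; the class and parameter names are invented for the example.

```python
# Illustrative sketch: a first-order exponential filter as a crude
# estimator of a node's future load. Update rule:
#   estimate += alpha * (observation - estimate)

class LoadEstimator:
    """Tracks a smoothed estimate of one node's utilisation."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha      # smoothing factor in (0, 1]
        self.estimate = None    # no observations yet

    def update(self, observed_load):
        if self.estimate is None:
            self.estimate = observed_load
        else:
            self.estimate += self.alpha * (observed_load - self.estimate)
        return self.estimate

# Feed in a noisy load trace; the estimate tracks the underlying trend.
est = LoadEstimator(alpha=0.5)
for sample in [0.2, 0.8, 0.6, 0.7, 0.65]:
    est.update(sample)
```

The same recurrence composes naturally: per-node estimates can be aggregated into cluster-level and grid-level estimates, which is the layering the paragraph above describes.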
Evolution: traditional models of grid computation (control-driven models) encourage programming practices that are polarised between the shared memory model common to symmetric multiprocessors and the message passing model common to ensembles of independent nodes. There is, however, a middle ground where lateral thinking and systematic exploitation may bear fruit; this hybrid model is essentially evolutionary.
Revolution: in contrast, data-flow and demand-driven execution models take a revolutionary approach, viewing an execution as a set of computational nodes, connected by paths along which data values move in a manner determined by the topology of the graph and by the rules underlying that model. Their superset, condensed graphs, offers a generalisation of both data-flow and demand-driven configurations.
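The data-driven half of this picture can be sketched in a few lines: each node fires as soon as all of its operand values have arrived, and its result flows along outgoing arcs. The sketch below is a toy illustration of that firing rule only, not the condensed-graphs formalism itself; all names are invented for the example.

```python
# Hypothetical sketch of data-driven (dataflow) execution: a node is
# enabled when all its operands have arrived; firing sends the result
# along every outgoing arc of the graph.

class Node:
    def __init__(self, name, op, arity):
        self.name, self.op, self.arity = name, op, arity
        self.inputs = []          # operand values received so far
        self.successors = []      # downstream nodes fed by this result

    def receive(self, value, results):
        self.inputs.append(value)
        if len(self.inputs) == self.arity:      # node is enabled: fire
            result = self.op(*self.inputs)
            results[self.name] = result
            for succ in self.successors:
                succ.receive(result, results)

# Graph for (a + b) * c, driven purely by the arrival of data:
add = Node("add", lambda x, y: x + y, arity=2)
mul = Node("mul", lambda x, y: x * y, arity=2)
add.successors.append(mul)

results = {}
add.receive(2, results)   # a arrives
add.receive(3, results)   # b arrives -> add fires, feeding mul
mul.receive(4, results)   # c arrives -> mul fires
# results["mul"] now holds (2 + 3) * 4
```

A demand-driven model would traverse the same graph in the opposite direction, requesting operands lazily; condensed graphs subsume both by making the direction of evaluation itself part of the graph.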
Algorithms: the grid is different, ergo the algorithms must be different? We will explore suitable application algorithms for the grid, particularly the emerging techniques for adaptive grid simulations. The programme will involve both the development of parallel Monte-Carlo algorithms and the testing of these in a clinical environment, both locally and nationally.
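Monte-Carlo methods suit the grid precisely because the sample space splits into independent streams, one per node, each with its own random seed, whose partial tallies are combined at the end. The sketch below shows that pattern on the classic quarter-circle estimate of pi; the streams run sequentially here, whereas on a real grid each would be a separate job, and the names are illustrative only.

```python
import random

# Illustrative parallel Monte-Carlo pattern: independent streams with
# distinct seeds, partial results combined by a final reduction.

def partial_count(seed, samples):
    """Count random points falling inside the unit quarter-circle."""
    rng = random.Random(seed)   # per-stream generator, as on a grid node
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

streams = 4                     # pretend each stream is one grid node
per_stream = 25_000
total_hits = sum(partial_count(seed, per_stream) for seed in range(streams))
pi_estimate = 4.0 * total_hits / (streams * per_stream)
```

Because the streams never communicate until the final reduction, the high latency and low bandwidth of the grid interconnect cost almost nothing, which is why this class of algorithm is a natural first target.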
The grid is very interesting in terms of behaviour because it is extreme - tightly coupled processors will be interconnected via a high-latency, low-bandwidth network. Moreover it is diverse. A great deal of basic research is needed into the conceptual models for almost every aspect of the grid environment, and these efforts will need to be driven with hard applications in diverse fields, more than is typical at present.
Figure 1: 16-node cluster at Trinity College Dublin. Figure 2: left: tracer/analyzer for the Scalable Coherent Interconnect (see http://www.cs.tcd.ie/coghlan/scieuro/), and right: in a heavily instrumented environment.
A major aim of those involved in grid research is to enable applications to utilise resources in an optimal way, by recognising the current utilisation of the grid, by exploiting those resources that are under-utilised, and by avoiding or moving away from those that are over-utilised. This requires monitoring, analysis, and control mechanisms. Here we fail at the first hurdle - even current monitoring mechanisms are extremely minimal.
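Assuming such monitoring existed, the placement policy described above reduces to a few lines: exclude over-utilised resources and prefer the least loaded of the rest. The sketch below is hypothetical; real grid middleware would supply the load readings continuously, and the threshold and site names are invented for illustration.

```python
# Hypothetical utilisation-aware placement: avoid over-utilised
# resources, prefer the least loaded of those remaining.

def place(loads, threshold=0.75):
    """Return resource names ordered least- to most-utilised,
    excluding any whose load exceeds the threshold."""
    candidates = {name: load for name, load in loads.items()
                  if load <= threshold}
    return sorted(candidates, key=candidates.get)

# Invented example loads for the three partner sites:
loads = {"tcd": 0.40, "ucc": 0.90, "nuig": 0.15}
ranking = place(loads)   # "ucc" is avoided as over-utilised
```

The hard part, as the text notes, is not this policy but obtaining trustworthy, timely values for `loads` at all.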
Moreover, there is little or no understanding of how the grid behaves. The computational grid is recognized to be a very dynamic system. Monitoring and analysis must be conducted on-line - offline approaches are not viable. Furthermore, attempting to control a poorly understood system is a recipe for catastrophe. There have been a number of famous failures of the power grid.
The most essential thing is to understand the dynamics of the grid, with all its non-determinism. Here we propose to investigate concepts from process control theory. This very fledgling work is being conducted in the context of research activities related to the global state estimation and optimisation of compute-clusters. Measurement is crucial. This is a subject in which Trinity College Dublin has long had an interest, particularly at the instrument and analysis level.
For traditional (control-driven) execution models, maximally-optimized message-passing provides the upper bound for efficiency. The optimizations may be complex and counter-intuitive - maximally optimizing a message-passing program requires significantly greater effort and expertise than for shared memory. The reasons for this are clear: message passing requires explicit data placement and communication, whereas for shared memory this is done implicitly. The grid will exacerbate this situation. The motivation for shared memory is to eliminate the need to expend so much effort.
From the programmer's point of view, it is easier if at all times all processes have a consistent view of the contents of shared data structures. However, parallel programs require synchronisation for correct (race-free) execution. Many shared memory optimisations exploit this fact and weaken consistency. Generally the weaker the consistency, the less the message traffic, and on the grid this will assume greater importance. The unfortunate reality is that, despite this, message-passing is the most efficient solution, at least in critical places in a program. In an attempt to overcome this, we propose to explore the realm between these extremes. There are a number of very interesting possibilities. This is very speculative research.
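Why weaker consistency means less traffic can be seen in miniature: under a release-consistency-like discipline, writes to shared data are buffered locally and shipped as a single batch at the synchronisation point, rather than one message per write. The sketch below is a speculative illustration of that trade-off, not any particular protocol; all names are invented.

```python
# Speculative sketch: buffering writes until a synchronisation point
# ("release") turns many per-write messages into one batched message,
# at the cost of other processes seeing stale values in between.

class BufferedSharedStore:
    def __init__(self):
        self.committed = {}      # globally visible values
        self.pending = {}        # local writes not yet propagated
        self.messages_sent = 0   # proxy for network traffic

    def write(self, key, value):
        self.pending[key] = value        # no message sent yet

    def release(self):
        """Synchronisation point: propagate all buffered writes at once."""
        if self.pending:
            self.committed.update(self.pending)
            self.messages_sent += 1      # one batched message
            self.pending = {}

store = BufferedSharedStore()
for i in range(100):
    store.write(f"x{i}", i)   # 100 writes, zero messages so far
store.release()               # a single batched update propagates
```

Under strict consistency every write would be a message (100 here); under the weakened model the same program generates one, which is exactly the saving that matters on a high-latency grid interconnect.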
Brian Coghlan - Trinity College Dublin
Tel: +353 1 6081 766
Michael Manzke - Trinity College Dublin
Tel: +353 1 6081 797