ERCIM News No.45 - April 2001
DIET: A Distributed Interactive Engineering Toolbox for Client-Server Applications in a Grid Environment
by Frédéric Desprez
Huge problems can now be computed over the Internet thanks to Grid Computing Environments like Globus or Legion. Because most current applications are numerical, the use of libraries like BLAS, LAPACK, ScaLAPACK, or PETSc is mandatory. Integrating such libraries into high-level applications written in languages like Fortran or C is far from easy. Moreover, the computational power and memory such applications need may not be available on every workstation. The RPC paradigm therefore seems to be a good candidate for building Problem Solving Environments on the Grid. Several tools follow this approach, such as NetSolve, Ninf, NEOS, and RCS.
In 1998, we started a project to parallelize Scilab, a Matlab-like environment developed at INRIA. One of the approaches we chose was to link Scilab to NetSolve.
Then, in 2000, we started the DIET project (Distributed Interactive Engineering Toolbox) to develop a hierarchical set of components for building Network Enabled Server (NES) applications. Our target platform is the fast VTHD network, which connects several INRIA research centers (and their clusters).
This project involves several research teams in computer science laboratories across France: ReMaP at LIP (Lyon), Résédas at LORIA (Nancy), and SDRP at LIFC (Besançon).
NES environments usually have five different components: CLIENTS that submit problems to SERVERS which solve them, a DATABASE that contains information about software and hardware resources, MONITORS that gather information about the status of the computational resources, and finally a SCHEDULER that chooses an appropriate server depending on the problem submitted and the information contained in the database.
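As a minimal illustration of how these components interact, the sketch below shows a scheduler picking a server by combining database information (which servers can solve the problem) with monitor information (current load). All names and data structures here are hypothetical, for illustration only; they are not the actual DIET interfaces.

```python
# Hypothetical sketch of an NES scheduler. The database lists the
# software resources installed on each server; the monitor reports a
# current load estimate per server (lower is better).

def choose_server(problem, database, monitor):
    """Pick the least-loaded server able to solve the given problem."""
    candidates = [s for s in database if problem in database[s]["software"]]
    if not candidates:
        raise LookupError(f"no server can solve {problem!r}")
    return min(candidates, key=lambda s: monitor[s])

# Toy resource database and load monitor (illustrative values).
database = {
    "cluster-a": {"software": {"dgemm", "lu"}},
    "cluster-b": {"software": {"dgemm"}},
}
monitor = {"cluster-a": 0.9, "cluster-b": 0.2}

print(choose_server("dgemm", database, monitor))  # -> cluster-b
```

In a real NES environment the load estimate would come from a forecaster rather than a static table, but the selection logic has the same shape: filter by capability, then rank by predicted performance.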
In DIET, a server is built upon Computational Resource Daemons and a Server Daemon. Above them sits a hierarchical set of agents, including Leader Agents and Master Agents. A redirector is used to choose a master agent close to the client, and a client's requests for computation are sent to the nearest agent. We believe that such a hierarchy is mandatory when building scalable environments for the Grid. We have designed several tools for resource discovery and monitoring:
- SLiM's (Scientific Libraries Metaserver) goal is to make the junction between the problems submitted by clients and the implementations available on servers. In most cases, there is no one-to-one mapping: a single problem can be solved by many implementations from several libraries, while another problem may need more than one computational step to be solved. All the needed information is stored in an LDAP tree; LDAP is a distributed database protocol that was chosen for its read and search optimizations.
- FAST (Fast Agent System Timers) is a tool for dynamic performance forecasting in a Grid environment. FAST is composed of several layers and relies on low-level software. First, it uses network and CPU monitoring software to handle dynamically changing resources such as workload or bandwidth. FAST uses and enhances the Network Weather Service (NWS), a distributed system that periodically monitors and dynamically forecasts the performance of various network and computational resources. FAST also includes routines that model the time and space needs of each triplet (problem, machine, parameter set). These models are based on benchmarks run at installation time on each machine for a representative set of parameters, followed by polynomial data fitting. To store this static data, FAST uses the same LDAP tree as SLiM.
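The benchmark-then-fit idea behind FAST's static models can be sketched as follows. This is an illustrative toy, not FAST's actual code: the routine, sizes, and timings are hypothetical, and a single cubic term stands in for a general polynomial fit.

```python
# Sketch of FAST-style performance modeling (hypothetical data).
# At installation time, a routine is timed for a few representative
# problem sizes; a polynomial model fitted to those samples is then
# used to predict the execution time for any size.

def fit_cubic(benchmarks):
    """Fit t(n) = c * n**3 (e.g. dense matrix multiply) by averaging
    the per-sample estimates of the constant c."""
    estimates = [t / n**3 for n, t in benchmarks]
    return sum(estimates) / len(estimates)

def predict(c, n):
    """Predicted execution time for problem size n."""
    return c * n**3

# Hypothetical install-time benchmarks: (matrix size, seconds).
samples = [(100, 0.002), (200, 0.016), (400, 0.128)]
c = fit_cubic(samples)
print(predict(c, 800))  # extrapolated time for an unbenchmarked size
```

In FAST, such fitted coefficients are the "static data" stored in the LDAP tree, while the dynamic part (current load, bandwidth) comes from the NWS-based monitoring layer.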
To implement a network-enabled problem solver, one can choose between many communication layers. Low-level layers like the socket interface can yield the best performance, while higher-level layers, such as those complying with the CORBA standard, provide interfaces that make development easier and quicker.
The great variety of metacomputer components (computers, networks, files, software services) and their dynamic behavior raise special problems from the resource-management point of view. In such an architecture, management plays a major role, and it appears very important to offer a unified framework for the management of networks, services, and applications. This framework implies the definition of an architecture that guarantees interoperability between applications through the portability of management information, wherever it originates.
We claim that middleware layers based on Java technologies, and more especially JMX (Java Management eXtensions), offer new opportunities to support NES applications in a Grid environment. Our approach is based on WBEM (Web-Based Enterprise Management), whose standardization effort is led by the DMTF (Distributed Management Task Force).
Conclusion and Future Work
Our first step in future work is to test this approach on real applications. As our target platform allows 2.5 Gb/s communications between several INRIA research centers, connecting several clusters of PCs and parallel machines, we think that tightly coupled applications written in an RPC style could benefit from such an approach. Another problem we would like to address is the optimization of data distributions for parallel library calls, using a mixed data- and task-parallel approach. We would also like to connect our developments to infrastructure toolkits like Globus, to benefit from their security, accounting, and interoperability services.
A concerted effort is under way to define the overall architecture and basic functionalities of Grid environments. A working group is dedicated to Advanced Programming Models, which of course include Network Enabled Solvers. We think this effort is very important for getting an efficient software infrastructure soon, and we would like to be part of it.
We are of course open to collaborations with other ERCIM members.
DIET project: http://www.ens-lyon.fr/~desprez/DIET/index.htm
Frédéric Desprez - École normale supérieure de Lyon
Tel: +33 4 72 72 85 69