Parallel Model-Checking

by Lubos Brim

With the rapid increase in computer system complexity, it has become very important to develop formal methods to ensure their quality. Several novel parallel and distributed techniques for enumerative model-checking of safety and liveness properties expressed in a simple temporal logic have been developed in the Parallel and Distributed Systems laboratory at the Masaryk University Brno under the project 'Automated Verification of Parallel and Distributed Systems'.

Early detection of errors requires application of advanced analysis, verification and validation techniques to test modelling resources, temporal properties, data-type invariants, and security properties. Various techniques for automated and semi-automated analysis and verification of computer systems have been proposed. In particular, model checking has become a very practical technique due to its push-button character. The basic principle behind model checking is to build a finite model of the system under consideration together with a formal description of the verified property in a suitable temporal logic. The model-checking algorithm is a decision procedure, which in addition to the yes/no answer returns a trace of a faulty behaviour in cases where the checked property is not satisfied by the model. One of the additional advantages of this approach is that verification can be performed against partial specifications, by considering only a subset of all specification requirements. This allows for increased efficiency by checking correctness with respect to only the most relevant requirements.

Although model checking has been applied fairly successfully for verification of quite a few real-life systems, its applicability to a wider class of practical systems has been hampered by the so-called state explosion problem (ie the enormous increase in the size of the state space). For large industrial models, the state space does not completely fit into the main memory of a computer. Consequently, when the memory becomes exhausted and the system starts swapping, the model-checking algorithm becomes very slow.

A typical approach to dealing with these practical limitations is to increase the computational power (especially random-access memory) by building a powerful parallel computer as a network (cluster) of workstations. Individual workstations communicate through a message-passing interface such as MPI. From outside, a cluster appears as a single parallel computer with high computing power and a huge amount of memory. In recent years a lot of effort has been put into using parallel and distributed environments to solve the computational and space complexity bottlenecks in model-checking.

The cluster PC SGI 1200 is being used for experimental evaluation of the parallel algorithms.

The main question is whether we can (re)design verification techniques in such a way that they could be implemented on parallel computer architectures. In other words, we would like to find techniques for decomposing complex verification problems into 'smaller' independent sub-tasks, which could then be either further decomposed or directly solved. Efficient parallel solution of many problems often requires the invention of original, novel approaches that are radically different from those used to solve the same problems sequentially. Several methods for parallel model-checking have been accepted and implemented in industrial tools. Performance results on either parallel machines or on a cluster of workstations show significant improvements over sequential techniques, both in extension of the size of the problem and in computational times, along with adequate scalability with the number of processors.

We have developed several novel parallel and distributed techniques for enumerative model-checking of safety and liveness properties expressed in a Linear Temporal Logic (LTL). The model-checking problem being considered can be reformulated as a cycle detection problem in an oriented graph, and the basic principles behind our algorithms rely on efficient solutions to detecting cycles in a distributed environment. In particular, we employ specific structural properties of the underlying graphs (often computable in advance from the given system specification), use additional data structures to divide the problem into independent sub-problems, or translate the model-checking problem into a different problem that admits efficient parallel solution. Several of our algorithms are based on sequentially less efficient but parallelisable breadth-first exploration of the graph or on placing bounds that limit the size of the graph to be explored.

Development of a tool that supports the distributed verification of systems is one of our recent projects. The goal is to build an environment for easy implementation of parallel and distributed verification algorithms on clusters of workstations, followed by experimental evaluations and comparisons. The main characteristics are the support for the distributed generation of the state space, dynamic load balancing, distributed generation of counter-examples, fault-tolerance and re-partitioning. The distributed environment quite naturally allows for the integration and cooperation of methods and algorithms as well.

Other research performed in the laboratory is focussed on the development of original methods and techniques for the automated verification of large-scale industrial critical systems, with an emphasis on the practical aspects of their application. Further, it looks at applying these and other known methods and techniques to real-life systems, optimising them to improve their efficiency, and providing software support to use them.

Link:
http://www.fi.muni.cz/paradise

Please contact:
Lubos Brim, Masaryk University Brno/
CRCIM, Czech Republic
Tel: +420 540 459 647
E-mail: brimfi.muni.cz