Generating Program Generators

PROGRAMMING LANGUAGE TECHNOLOGIES
ERCIM News No.36 - January 1999

Generating Program Generators

by Arne J. Glenstrup, Henning Makholm and Jens Peter Secher

Research in semantics based program manipulation has been carried out in Copenhagen for more than a decade. The DART project (a member of ERCIM partner DANIT) is putting theory into practice by deriving program manipulation tools from programming language theory. An important instance is that program generators (and generator generators, etc.), can be produced automatically and correctly from executable problem specifications.

One of the main goals is to produce tools that are automatic in the sense that once instantiated by a user, they can be repeatedly applied to widely varying problem instances without further user intervention. This semantic approach is applicable to a wide range of areas:

program transformation can be safely performed, eg for instrumentation, simplification, or translation
optimisation by specialisation of generic programs to specific tasks can significantly reduce computational overhead
generation of compilers from interpreters can be achieved automatically.

Partial Evaluation
Partial evaluation is an automated source-to-source transformation technique that can be used to optimise generic software: a program together with some known portion of its input, is transformed into a specialised version obtained by pre-computing the parts of the program that only depend on this input.

Usually when writing programs there is a tradeoff between, on the one hand, generic and easily maintainable programs and, on the other hand, fast programs. Partial evaluation bridges this gap, letting you have your cake and eat it too: generic programs are (automatically!) turned into fast programs for specific problem instances.

In a software development setting, a generic, well-tested module can be taken off the shelf and then specialised automatically to a specific usage pattern, thereby speeding it up. Since the specialisation preserves the semantics of the module, it is possible to produce both reliable and efficient software.

Specialisation can be done quickly, and this means that rapid prototyping and production can be unified. As a side-effect, some of the optimisation usually done by hand can be done in an automatic, application-independent way. Furthermore, the original program can be left untouched, so its structure and readability can be retained.

Theoretical foundation: the Futamura Projections describe the generation of compilers from interpreters, avoiding the error-prone task of writing a compiler by hand. DART researchers were the first in the world to transform these theoretical ideas into practice by producing an automatic compiler generator using these principles. Compiler generation has continued to be a focus point of DART work in partial evaluation.
Partial Evaluators
The most sophisticated partial evaluators today work with semantically well-defined languages like Scheme and SML (an example is Similix; see web address below). The languages that are more widely used in the software industry tend to be less rigorously defined, but nevertheless the understanding gained by semantics-based techniques have proved valuable in the development of practically oriented partial evaluators like DART’s C-Mix for the C language; FSPEC for Fortran; and a partial evaluator for Java that is under way in the project.
There is an interaction and exchange of ideas with INRIA. In Rennes, the COMPOSE group develops the Tempo Specialiser for C which complements C-Mix by focusing on a very fast specialisation process that makes it possible to produce specialised object code at run time.

Successful experiments have been carried out in a number of settings, including ray tracing, model simulations, implementations of domain-specific languages, numerical algorithms, and compiling by specialising interpreters.
C-Mix: Partial Evaluation of C
C-Mix is an automatic partial evaluator that specialises C programs that conform to the ISO C standard. It takes a generic program together with a classification of the input and produces a specialised-program generator. When this generator is provided the portion of the input that is known in advance, it will produce a specialised program, as depicted in the figure. When the specialised program is run on the remaining input it produces the same output as the original program, only faster. Often, the same specialised program is reused, resulting in significant speedups in the running times.

The specialised-program generator is a stand-alone program that can be used without any knowledge about partial evaluation.

C-Mix does a number of complex analyses on the subject program - eg, alias, liveness and binding-time analysis - to calculate which constructs depend on known input only. Using this information, the specialised-program generator can be produced.

C-Mix is available free of charge, and is implemented with standard tools (GNU C/C++ compiler and tools), which means that it is highly portable (it runs on Unix-like systems, Windows, DOS, etc.). It has an HTML-based inspection toolkit that provides the developer with the results of the analyses, enabling fine-tuning of the specialisation process. URL http://www.diku.dk/topps/Research.html contains links to C-mix, Similix, and various other DART activities.

The COMPOSE group in Rennes also work on program specialisation and provide a C program specialiser, see http://www.irisa.fr/compose/.

Please contact:

Jens Peter Secher - DART
Tel: +45 35 32 14 08
E-mail: jpsecher@diku.dk

return to the ERCIM News 36 contents page