Generic Language Technology: Basics for Automated Software Engineering
by Mark van den Brand
Research in the Generic Language Technology project at CWI is focusing on the development of fundamental techniques for language processing. These techniques are applied in the development of high-quality tools for the analysis and transformation of large software systems written in languages including C, Java, Cobol and PL/I.
The focus of the research in the Generic Language Technology (GLT) project at CWI is on the development of fundamental techniques for analysis and transformation of (programming) languages. Besides the development of formalisms for describing the syntax and semantics of programming languages, tools for processing languages and programs are also being developed. The scope of research includes exploring new fundamental concepts, ranging from incremental techniques, efficient term-rewriting engines, advanced parsing technology and new analysis techniques to powerful code generators.
The roots of this project are in the mid-1980s. Together with INRIA (Sophia-Antipolis) and various other partners, the Esprit project 'Generation of Interactive Programming Environments' (GIPE) was started, and later continued as GIPE-II. Research focused on the development of a framework for programming environment generators, with CWI in Amsterdam looking at the development of a Meta-Environment - an environment for developing programming language descriptions, based on incremental technology for scanning, parsing, and rewriting.
Application areas for this technology include the design and implementation of domain-specific languages, software renovation, and advanced code generators. In cooperation with a Dutch software house and a Dutch bank, a domain-specific language for describing financial products, RISLA, was developed. RISLA and a prototype RISLA-to-Cobol compiler were then implemented, and are still being used by several Dutch banks. Various projects in the field of software renovation, reverse engineering and reengineering have been carried out over the last few years. A powerful generic parsing technology allowed us to tackle both the problem of handling various dialects of Cobol and that of languages embedded in Cobol, such as SQL, assembler, and CICS. Projects in various other industries (e.g. transportation, networking, telecommunication) have also been carried out.
The application area of software renovation triggered development of new scalable language-processing technology. The Meta-Environment developed within the GIPE projects was entirely redesigned and based on new modern component-based ideas and technology. The focus of our research moved from incremental techniques to scalability, flexibility, re-usability and efficiency of tools. The specific topics we are tackling in our current research are:
- exploring the benefits of integrating scannerless generalised LR parsing with advanced declarative disambiguation mechanisms
- developing new and efficient term-rewriting technology such as rewriting with annotations, compilation of rewriting rules to high-performance C-code, and the integration of rewriting technology with relation calculators
- exploring the possibilities of using rewriting technology to obtain more powerful, semantics-directed, disambiguation mechanisms.
All these technologies are immediately applied in various industrial projects.
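To give a feel for what term rewriting means in practice, the following is a minimal sketch of an innermost rewriting interpreter. It is not the ASF engine and does not compile rules to C; the term representation (nested tuples), the convention that plain strings are rule variables, and the names `match`, `substitute`, `normalize` and `RULES` are all assumptions made for this illustration.

```python
# Toy innermost term rewriter (illustrative sketch, not the ASF+SDF engine).
# Terms are nested tuples: (function_symbol, arg1, arg2, ...).
# In rule patterns, a plain string such as "X" is a variable (assumed convention).

def match(pattern, term, bindings):
    """Match pattern against term; return extended bindings, or None on failure."""
    if isinstance(pattern, str):              # pattern variable
        if pattern in bindings:               # non-linear pattern: must agree
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for sub_pattern, sub_term in zip(pattern[1:], term[1:]):
        bindings = match(sub_pattern, sub_term, bindings)
        if bindings is None:
            return None
    return bindings

def substitute(template, bindings):
    """Instantiate the right-hand side of a rule with the matched bindings."""
    if isinstance(template, str):
        return bindings[template]
    return (template[0],) + tuple(substitute(arg, bindings) for arg in template[1:])

def normalize(term, rules):
    """Innermost strategy: normalize the arguments first, then rewrite at the root."""
    term = (term[0],) + tuple(normalize(arg, rules) for arg in term[1:])
    for lhs, rhs in rules:
        bindings = match(lhs, term, {})
        if bindings is not None:
            return normalize(substitute(rhs, bindings), rules)
    return term                               # no rule applies: normal form

ZERO = ("zero",)

def succ(term):
    return ("succ", term)

# Peano addition:  add(zero, Y) -> Y ;  add(succ(X), Y) -> succ(add(X, Y))
RULES = [
    (("add", ZERO, "Y"), "Y"),
    (("add", ("succ", "X"), "Y"), ("succ", ("add", "X", "Y"))),
]

result = normalize(("add", succ(succ(ZERO)), succ(ZERO)), RULES)
print(result)
```

Here `result` is the term for 2 + 1, that is `succ(succ(succ(zero)))` in tuple form. An interpreter like this walks the rule list for every subterm; compiling rules to C code, as mentioned above, avoids exactly that kind of overhead.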
Generic language technology research has been receiving an increasing amount of attention. There are various reasons for this. The first is that as computers become more powerful, more powerful algorithms can be applied to bigger problems, for example, in the field of software renovation. The second reason is that software in general has an ever-increasing life-cycle and it is crucial for (financial and other) industries to keep software that was written in the 60s and 70s operational. In order to analyse and transform this software automatically, advanced generic technology is needed (see Figure 1). A third reason is related to the refactoring techniques developed for Smalltalk and Java. These are currently spreading towards other languages as well, which means re-implementing these refactorings for each separate language. Expressing these in a language-independent way and applying them to programs written in other languages has proven to be an interesting challenge. A fourth reason is the development of programming-environment frameworks like Eclipse (see http://www.eclipse.org). Although Eclipse was initially developed for Java and designed to be open to other programming languages as well, reality has proven to be more complex. More and more people are working on Eclipse plug-ins for their favourite language, whereas a more generic approach would increase the flexibility of the Eclipse framework.
Figure 1: Software engineering tasks as document transformations.
The concepts of generic language technology are realised in the framework of the ASF+SDF Meta-Environment. This is an interactive development environment for the automatic generation of interactive systems for constructing language definitions and generating tools for them. A language definition typically includes such features as syntax, prettyprinting, typechecking, and execution of programs in the target language. The ASF+SDF Meta-Environment offers openness, reuse, extensibility, and in particular the possibility of generating complete stand-alone environments for user-defined languages. ASF+SDF allows the definition of syntactic as well as semantic aspects. It can be used for the definition of languages (for programming, writing specifications, querying databases, text processing, or other applications). In addition, it can be used for the formal specification of a wide variety of analysis and transformation problems (see Figure 2).
Figure 2: The ASF+SDF Meta-Environment as Cobol restructuring tool.
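The ingredients of a language definition named above (syntax, prettyprinting, typechecking and execution) can be sketched for a tiny expression language. This is a hand-written illustration only: in the ASF+SDF Meta-Environment such components are derived from a language definition rather than coded by hand, and the language itself as well as the names `Num`, `Bool`, `Add`, `If`, `pretty`, `typecheck` and `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

# Abstract syntax of a hypothetical expression language.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Bool:
    value: bool

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class If:
    cond: "Expr"
    then: "Expr"
    orelse: "Expr"

Expr = Union[Num, Bool, Add, If]

def pretty(e):
    """Prettyprinting: turn a syntax tree back into text."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Bool):
        return "true" if e.value else "false"
    if isinstance(e, Add):
        return f"({pretty(e.left)} + {pretty(e.right)})"
    if isinstance(e, If):
        return f"if {pretty(e.cond)} then {pretty(e.then)} else {pretty(e.orelse)}"
    raise TypeError(f"unknown node: {e!r}")

def typecheck(e):
    """Typechecking: return 'int' or 'bool', or raise TypeError."""
    if isinstance(e, Num):
        return "int"
    if isinstance(e, Bool):
        return "bool"
    if isinstance(e, Add):
        if typecheck(e.left) == "int" and typecheck(e.right) == "int":
            return "int"
        raise TypeError(f"'+' needs two ints: {pretty(e)}")
    if isinstance(e, If):
        if typecheck(e.cond) != "bool":
            raise TypeError(f"condition is not a bool: {pretty(e.cond)}")
        then_type = typecheck(e.then)
        if then_type != typecheck(e.orelse):
            raise TypeError(f"branches differ in type: {pretty(e)}")
        return then_type
    raise TypeError(f"unknown node: {e!r}")

def evaluate(e):
    """Execution: compute the value of a (well-typed) expression."""
    if isinstance(e, (Num, Bool)):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, If):
        return evaluate(e.then) if evaluate(e.cond) else evaluate(e.orelse)
    raise TypeError(f"unknown node: {e!r}")

program = If(Bool(True), Add(Num(1), Num(2)), Num(0))
print(pretty(program), ":", typecheck(program), "=", evaluate(program))
```

For this `program` the sketch prints `if true then (1 + 2) else 0 : int = 3`. Every function repeats the same case analysis over the syntax tree, which is the kind of boilerplate a generic, definition-driven approach aims to take over.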
The Generic Language Technology project is currently working in close cooperation with the Protheo group at INRIA/LORIA in Nancy and with the group of Peter D. Mosses at BRICS in Aarhus. We also cooperate with Utrecht University and the Vrije Universiteit in Amsterdam. The ASF+SDF technology is used by various research groups for various purposes, including mechanical modelling (Mechanical Engineering group of the Technical University of Eindhoven, Netherlands), module composition (DIMAp of UFRN in Natal, Brazil), typesetting mathematics (LORIA, France) and algebraic specification (Univ. Bremen, Germany).
Mark van den Brand, CWI, The Netherlands
Tel: +31 20 592 4213