Automated Support for Agile Software Reuse
by Mel Ó Cinnéide, Nicholas Kushmerick and Tony Veale
Agile software development methodologies tend towards sparse up-front design and minimal commenting of source code. Researchers in the Department of Computer Science, University College Dublin are investigating ways of providing automated support for software reuse in this code-centric context.
In recent years there has been a trend towards lightweight, flexible software development methodologies, collectively known as Agile Processes. In these approaches the focus is on the source code produced rather than on design documentation or methodological considerations. This poses a new challenge for the long-standing problem of software reuse, specifically for the task of recommending software components. Approaches that involve design-level reuse (eg, based on UML descriptions) are no longer applicable, as this design documentation is typically not produced. Approaches that rely on the programmer adding extra information to the code are also bound to fail in a context where minimal commenting is the norm.
One of the novel approaches taken by UCD researchers has been motivated by the existence of large open-source repositories such as SourceForge. Consider a programmer building an application using a framework such as Swing. SourceForge contains a great deal of information on how best to develop Swing applications, but it is expressed as raw source code, and it is hard to determine which particular components are of benefit in any particular programming effort. We have applied Collaborative Filtering (CF) to the problem. CF is based on the premise that a group of users who share preferences for certain items are likely also to agree on future items. In this context, the 'user' is the class the programmer is currently developing and the 'items' are existing components such as classes and methods.
During the development of the software, we use the partial class written by the programmer so far to find other, similar classes. The classes and methods used by these similar classes are then suggested to the programmer as suitable candidates for reuse. This approach may initially seem naïve, but a study performed on the Swing classes in SourceForge produced very promising results. For example, when a class is only half specified by a programmer, our approach can predict with over 50% accuracy the remaining components the programmer will need to use in order to complete the class.
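The recommendation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard similarity measure, the neighbourhood size `k`, the similarity-weighted scoring, and the toy `REPOSITORY` of class names are all assumptions chosen for illustration. Each class is represented simply as the set of components it uses.

```python
from collections import defaultdict


def jaccard(a, b):
    """Similarity of two component sets: size of overlap over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(partial, repository, k=2, top_n=3):
    """Suggest components for a partially written class.

    partial    -- set of components the programmer has used so far
    repository -- dict mapping existing class names to their component sets
    k          -- number of most similar classes to consult (assumed value)
    top_n      -- number of suggestions to return (assumed value)
    """
    # Rank the existing classes by similarity to the partial class.
    neighbours = sorted(
        ((jaccard(partial, used), name) for name, used in repository.items()),
        reverse=True,
    )[:k]

    # Score each component not yet used by summing the similarity
    # of every neighbouring class that uses it.
    scores = defaultdict(float)
    for similarity, name in neighbours:
        for component in repository[name] - partial:
            scores[component] += similarity

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [component for component, _ in ranked[:top_n]]


# Hypothetical repository contents, invented for this example.
REPOSITORY = {
    "LoginDialog": {"JDialog", "JButton", "JTextField", "JLabel", "ActionListener"},
    "SettingsPanel": {"JPanel", "JButton", "JCheckBox", "JLabel", "ActionListener"},
    "ImageViewer": {"JFrame", "JScrollPane", "ImageIcon", "JLabel"},
}

partial_class = {"JDialog", "JButton", "JLabel"}
suggestions = recommend(partial_class, REPOSITORY)
print(suggestions)
```

In this toy setting the two nearest classes both use ActionListener, so it is ranked first. A real system would extract the component sets from repository source code rather than from a hand-written table.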
While Agile Processes reduce the potential for reuse as outlined above, they provide added possibilities as well. Programmers are encouraged to reduce commenting but to improve the comprehensibility of the code itself. Because of this, names of classes, methods and fields tend to be longer and more expressive. This has led UCD researchers to start examining the meanings of these names in an effort to improve reuse possibilities.
WordNet is an on-line ontology of the English language that organises the lexicon into sets of synonyms and defines various relationships between these synonym sets, including inheritance and composition relationships. Using these relationships, components can be suggested to a programmer based on the names already used in the program. As a simple example, if a programmer introduces a class named Student, the system can suggest Person as a suitable superclass, based on the fact that Person is a superclass of Student in the WordNet ontology. If the programmer subsequently introduces a Lecturer class, the system can predict that a university application is being developed and either provide a nascent set of stub classes that model a university's structure or suggest classes to reuse from a repository. This approach to software reuse is predicated on the user-defined names in current software being English words. Our study of one large application from SourceForge demonstrated that over 60% of class and method names are fully defined in WordNet and a further 25% are partially defined. This lends strong credence to our notion of lexically driven software component retrieval.
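The Student and Person example can be sketched in a few lines. This is only an illustration under stated assumptions: `TOY_HYPERNYMS` is a small hand-written table standing in for WordNet and does not reproduce WordNet's actual chains, and the function names and the set of `KNOWN_CLASSES` are hypothetical.

```python
# Hand-written stand-in for an ontology: each word maps to its parent concept.
# These entries are assumptions for illustration, not data taken from WordNet.
TOY_HYPERNYMS = {
    "student": "person",
    "lecturer": "person",
    "person": "organism",
    "organism": "entity",
}

# Hypothetical classes already available in the program or repository.
KNOWN_CLASSES = {"Person", "Entity"}


def hypernym_chain(word, hypernyms):
    """Return the ancestors of `word`, nearest first, following parent links."""
    chain = []
    seen = {word}  # guards against cycles in a malformed table
    current = hypernyms.get(word)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = hypernyms.get(current)
    return chain


def suggest_superclass(class_name, known_classes, hypernyms=TOY_HYPERNYMS):
    """Suggest the nearest ancestor concept that matches a known class name."""
    for ancestor in hypernym_chain(class_name.lower(), hypernyms):
        candidate = ancestor.capitalize()
        if candidate in known_classes:
            return candidate
    return None  # the name is not in the table, or no ancestor is a known class


print(suggest_superclass("Student", KNOWN_CLASSES))
print(suggest_superclass("Lecturer", KNOWN_CLASSES))
```

Walking the chain nearest-first means the most specific matching ancestor is preferred, so Student is matched to Person rather than to the more general Entity.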
The goal of both research strands is similar: to propose components for reuse based on the code written by the developer so far, without requiring any further input from the programmer and making only minimal and realistic assumptions about a repository of existing code. Even in this constrained set of circumstances, we have demonstrated the potential for automated support for software reuse. We are currently experimenting further with these ideas and plan to develop a prototype tool in the form of an Eclipse plug-in.
Mel Ó Cinnéide
University College Dublin, Ireland