Knowledge-based Production of Synthetic Multimodal Documents

by Melina Alexa, John Bateman, Renate Henschel, and Elke Teich

Current attempts to solve the problem of information overload that increasingly faces users of information systems fail to provide an adequate foundation for information presentation appropriate to users' information needs. Extending the multi-media capabilities of information systems, as well as the breadth of information theoretically available, leaves this fundamental issue unaddressed. The use of linguistically motivated principles for the automatic production and presentation of information is now enabling the construction of a new generation of generic techniques for information access.

Complex and plentiful information requires powerful techniques for its presentation. Some of the most powerful, flexible and user-friendly techniques known are those of natural language. This includes not only the side of language that all can see and hear (text and speech) but also the underlying coherent structures that make bodies of text (including hypertexts) and interactions hang together.

The multilingual language generation department KOMET of the 'Publishing Division' of the GMD-Institute for Integrated Publication and Information Systems is exploring the use of language technology for the automatic generation of synthetic documents as an interface metaphor for interaction with information systems. Based on solid experience from inter-departmental collaborations within the 'Publishing Division' on the construction of large-scale information systems, we see the following as two of the most important foundations for future information systems design:

Linguistically motivated principles enable an information system to design its information presentation very flexibly.

Information is made available to a user through several filtering and structuring stages. The first filter is provided by direct manipulation or information retrieval concerning the information base itself. This permits a user to focus on particular objects or sets of objects maintained in the knowledge base. The remaining filters are all under direct control of the information presentation system and include: selection of further supportive information from the knowledge base depending on desired text type and user knowledge, the organization of that knowledge into a rhetorically motivated structure, the division of this structure into substructures, and the presentation of these substructures to the user as interrelated texts.

Work in progress is being carried out in close cooperation with the automatic visualization and automatic page layout groups of the 'Publishing Division'.

Some examples of current presentation capabilities, which demonstrate the communicatively effective combination of graphical and text modes in our present prototype, can be seen on the web at:

These examples show screen dumps from a prototype artist biography 'information provider' capable of presenting information drawn from an object network containing over half a million objects.
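
As a rough illustration of the filtering and structuring stages described above, the following Python sketch chains them together on a toy knowledge base. It is not the KOMET implementation, which is based on systemic networks. Every name here (KNOWLEDGE_BASE, TEXT_TYPE_ATTRIBUTES, RhetoricalNode, focus, select_content, organize, divide, present, produce_document) and all of the data are hypothetical, chosen only to make the order of the stages concrete.

```python
from dataclasses import dataclass, field

# Hypothetical toy knowledge base: object id -> attribute/value pairs.
KNOWLEDGE_BASE = {
    "artist:1": {"name": "A. Example", "born": "1900",
                 "profession": "painter", "school": "school:1"},
    "school:1": {"name": "Example School", "founded": "1890"},
}

# Hypothetical mapping from a text type to the attributes it mentions.
TEXT_TYPE_ATTRIBUTES = {"biography": ["name", "born", "profession", "school"]}


@dataclass
class RhetoricalNode:
    """A nucleus (main facts) with satellite nodes that elaborate on it."""
    relation: str
    facts: list = field(default_factory=list)
    children: list = field(default_factory=list)


def focus(kb, query):
    """First filter: retrieval picks the objects the user is interested in."""
    return [oid for oid, attrs in kb.items()
            if query.lower() in attrs.get("name", "").lower()]


def select_content(kb, oid, text_type, known_to_user):
    """Select supporting facts by text type, skipping what the user knows."""
    return [(oid, attr, kb[oid][attr])
            for attr in TEXT_TYPE_ATTRIBUTES[text_type]
            if attr in kb[oid] and (oid, attr) not in known_to_user]


def organize(facts):
    """Arrange facts into a (very simplified) rhetorical structure."""
    nucleus = [f for f in facts if f[1] == "name"]
    satellites = [f for f in facts if f[1] != "name"]
    return RhetoricalNode("elaboration", nucleus,
                          [RhetoricalNode("detail", [f]) for f in satellites])


def divide(root, max_children=2):
    """Split the structure into substructures of bounded size."""
    kids = root.children
    chunks = [kids[i:i + max_children]
              for i in range(0, len(kids), max_children)] or [[]]
    return [RhetoricalNode(root.relation, root.facts, chunk) for chunk in chunks]


def present(substructures):
    """Render each substructure as a text that links to the next one."""
    texts = []
    for i, node in enumerate(substructures):
        lines = [f"{attr}: {value}" for _, attr, value in node.facts]
        lines += [f"  {attr}: {value}"
                  for child in node.children for _, attr, value in child.facts]
        if i + 1 < len(substructures):
            lines.append(f"[see also text {i + 2}]")
        texts.append("\n".join(lines))
    return texts


def produce_document(kb, query, text_type, known_to_user=frozenset()):
    """Run all stages for the first object matching the query."""
    focused = focus(kb, query)
    if not focused:
        return []
    facts = select_content(kb, focused[0], text_type, known_to_user)
    return present(divide(organize(facts)))


if __name__ == "__main__":
    for page in produce_document(KNOWLEDGE_BASE, "A. Example", "biography"):
        print(page, end="\n\n")
```

Keeping each stage as a separate function mirrors the separation described in the text: the user controls only the first filter, while selection, organization, division and presentation are decided by the presentation system and can vary with text type and user knowledge.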

The supporting technology under development by KOMET is based on a highly stratified, communicative-functional model of the linguistic system that provides generic techniques for the linguistic expression of information, for accessing domain knowledge, knowledge bases, etc., and for organizing the presentation of that information rhetorically. Interfacing with domain knowledge is handled by means of an extensive linguistically and multilingually motivated ontology called the Generalized Upper Model. The generation technology as a whole is furthermore text type based, enabling rapid extension to new text types as required by different applications.

A uniform representational medium (systemic networks as defined by systemic-functional linguistics) is used throughout the different levels of information maintained. There are also export capabilities to several state-of-the-art computational typed feature formalisms, which the KOMET system itself does not use for reasons of efficiency. Example pages such as those that can be seen at the web address above are produced in real time.

The current system also includes an extensively multilingual development environment for the construction and maintenance of large-scale linguistic resources; this environment, freely available to the research community, radically speeds up new language resource development. The KOMET text generation system now produces short texts in English, German, and Dutch. Other language resources are currently under development (usually in international cooperation) for French, Italian and Greek, among others. The basic tactical generator is being used in a number of international generation projects for several of these languages. The generation capabilities of KOMET have also recently been extended to include spoken language generation (currently for German).

Because linguistic resources at all levels are developed on a strictly empirical basis (e.g., large-scale grammatical resources that cover the constructions that are actually necessary for a given text type, broad characterizations of text types that are driven by analyses of existing text type phenomena, etc.), particularly strong tools have also been created for supporting semi-automatic text analysis. These are now being used both for empirical text analysis and domain knowledge acquisition.

Current directions of extension include the further division of rhetorically motivated structures into interlinked substructures that can form the basis for hypernodes in extended synthetic hyperdocuments. Hyperlinks are guaranteed to be intelligible and coherent by virtue of the supporting rhetorical organization. Hypernodes may themselves contain significant substructure, which is presented to the user as a page of information reminiscent of a magazine or encyclopaedia, i.e., combining different texts, graphics, pictures, etc. There are many synergies to be observed between the general problems of text generation, graphics generation, and layout; shared processing techniques are therefore also being investigated across the different modes of information presentation. More information on KOMET is available on the web at:

Please contact:
Melina Alexa - GMD
Tel: +49 6151 869 809

