ERCIM News No.46, July 2001
by Fredrik Olsson, Preben Hansen, Kristofer Franzen and Jussi Karlgren
In order to develop systems and methods for improved information access for different users in different situations, a new research theme at SICS focuses on combining knowledge of human behaviour, language technology and linguistic principles.
Texts are complex and underspecified structures that leave room for interpretation on the part of the reader. This vagueness may seem undesirable, and it is arguably more efficient in terms of information transmission to structure information properly at the time of production, e.g. by employing controlled language strategies or database solutions of various kinds. But this approach removes not only some of the major drawbacks of text but also some of its major assets as an information transmission and storage device. It is all but impossible to predict what information needs and structures the text will eventually be used to fulfil; it is presumptuous to impose the author's structures on all future readers; and the portability of the information from one informational context to another will suffer. Text is crucially vague and indeterminate, and it is worth keeping it that way.
Instead, we need systems that deliver tailored information just-in-time. Our research theme puts great emphasis on tailoring information access, which in turn necessitates understanding information needs.
Systems for accessing information take a rather simplistic view of notions such as text, user, information need, seeking process and relevance. Most of the research in the field today concentrates on tuning algorithms to standard problems, but each of these central concepts deserves more attention. Our research theme works on methods for the next generation of information access systems, which we believe will be less general and more closely bound to usage and domain.
Information Access and Refinement: the SICS Approach
Information access and refinement is a new research theme at the Swedish Institute of Computer Science (SICS). Information refinement is a growing field, especially as regards information extraction and summarisation technology. Likewise, the study of interaction with, and usage of, information access systems has attracted considerable research interest. Some research has also addressed finding supra-sentential structure in text. What makes our research theme unique is that we consider both the beginning and the end of the chain, and that we strive for a model that is primarily verifiable through the study of human behaviour and based on language technology and linguistic principles. Engineering and system construction aspects take second place. This is our way of avoiding local minima: linguistics may appear impractical and behavioural science may appear detached, but the conjunction of the two will, in the long run, provide us with solutions to practical problems.
Central Concepts to the Theme
Information access is about providing people with tools and methods that give reliable and simple access to the information they need, ideally with awareness of the task and context of the access situation. What ties together the notions of user, information need and the contents of a text is relevance: a text can be said to be relevant if it satisfies some user's information need in some situation.
By information refinement we mean the process in which a text is further processed to find and compile the pieces of content that are relevant from a certain perspective. Example techniques are information extraction, information retrieval, automatic summarisation and report generation, all of which consider the contents of a text from the point of view of a user with a particular information need.
Steps to be taken
The commercial systems of today assume that texts are simply bags of words, that all users are alike, that the need for information is static, and that dialogues in information seeking systems are simplistic one-shot transactions in which the user is happy to exchange a few words for several thousand possibly relevant links. Commercial systems also assume that relevance does not depend on users, situations, perspective, temporal aspects or other extratextual factors. These assumptions make a useful starting point from which to consider improvements.
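To make the bag-of-words assumption concrete, here is a minimal Python sketch. It is not drawn from any particular commercial system; the function names `bag_of_words` and `overlap_score` are hypothetical, and the tokenisation is deliberately naive. It shows two of the limitations listed above: word order is discarded, so sentences with opposite meanings become identical, and the relevance score takes only the query and the text as input, with no place for the user, the situation or time.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Hypothetical helper: lower-case and split on whitespace.
    # Punctuation and word order are ignored; only token counts remain.
    return Counter(text.lower().split())


def overlap_score(query: str, text: str) -> int:
    # Hypothetical relevance score: number of shared tokens (with multiplicity).
    # Note that nothing about the user, the situation or time enters here.
    shared = bag_of_words(query) & bag_of_words(text)
    return sum(shared.values())


a = bag_of_words("Dog bites man")
b = bag_of_words("Man bites dog")

# Word order is discarded, so the two sentences are indistinguishable.
print(a == b)

# Both texts receive the same score for the same query, whoever is asking.
print(overlap_score("dog bites", "Dog bites man"))
print(overlap_score("dog bites", "Man bites dog"))
```

Both scores come out equal, which is exactly the user- and situation-independent notion of relevance that the theme sets out to move beyond.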
To be able to refine information and to improve today's systems, we need to consider at least three things:
It is only by taking into consideration these three items simultaneously and integrating parts of them that we will be able to come up with something interesting.
We are currently involved in several projects, national as well as international, in which the ideas presented in this text are tested in combination. We aim to develop an open architecture for information refinement, and to define methods for improving information access so that it fits users with specialised information needs arising in different situations.
Fredrik Olsson SICS
Tel: +46 8 633 15 32