Cross-Language Evaluation Forum - CLEF 2005

by Carol Peters

The results of the sixth campaign of the Cross-Language Evaluation Forum were presented at a two-and-a-half day workshop held in Vienna, Austria, 21-23 September, immediately following the ninth European Conference on Digital Libraries. The workshop was attended by well over 100 researchers and system developers from academia and industry.

The main objectives of the Cross-Language Evaluation Forum (CLEF) are to stimulate the development of mono- and multilingual information retrieval systems for European languages and to contribute to the building of a research community in the multidisciplinary area of multilingual information access. These objectives are realised through the organisation of annual evaluation campaigns and workshops. Each campaign offers a series of evaluation tracks designed to test different aspects of mono- and cross-language system performance. The tracks cover different kinds of text retrieval across languages and retrieval on different kinds of media (i.e. not just plain text but also collections containing images and speech). In addition, attention is given to system usability and user satisfaction, with tasks designed to measure the effectiveness of interactive systems.

The scope of CLEF has gradually expanded over the years. In CLEF 2005 eight tracks were offered to evaluate the performance of systems for:

mono-, bi- and multilingual document retrieval on news collections (Ad-hoc)
mono- and cross-language retrieval on structured scientific data (domain-specific)
interactive cross-language retrieval (iCLEF)
multiple language question answering (QA@CLEF)
cross-language retrieval on image collections (ImageCLEF)
cross-language speech retrieval (CL-SR)
multilingual web retrieval (WebCLEF)
cross-language geographic retrieval (GeoCLEF).

To cover all these activities, the CLEF test collection has been expanded considerably: the main multilingual comparable corpus now contains over two million news documents in twelve European languages, with Hungarian and Bulgarian added this year. Sub-collections from this corpus were used by the Ad-hoc, QA, iCLEF and GeoCLEF tracks. The collection used to test domain-specific system performance consists of the GIRT-4 database of English and German social science documents and the Russian Social Science Corpus. ImageCLEF used a number of different collections: an archive of historic photographs provided by St Andrews University, Scotland, and several sets of medical images with French, English and German case notes and annotations made available by University Hospitals, Geneva, and by Aachen University of Technology. The cross-language speech retrieval track (CL-SR) used speech transcriptions from the Malach collection of spontaneous conversational speech derived from the Shoah archives. Finally, WebCLEF used the EuroGOV corpus, a multilingual collection of about two million webpages crawled from European governmental sites.

[Photo: CLEF steering committee.]

Participation in this year's campaign was considerably up on the previous year, with 74 groups submitting results for one or more of the tracks, as opposed to 54 groups in CLEF 2004. Of these, 43 were from Europe, 19 from North America, ten from Asia and one each from South America and Australia. The introduction of the Speech, Image and Question Answering tracks in previous years, and of the GeoCLEF track this year, means that the growing CLEF community is increasingly multidisciplinary, with expertise in areas as diverse as natural language processing, speech recognition and analysis, geographic information systems, image processing and medical informatics.

The campaign culminated in the workshop held in Vienna, Austria, 21-23 September. In addition to presentations by participants in the campaign, Noriko Kando from the National Institute of Informatics, Tokyo, gave an invited talk on the activities of the NTCIR evaluation initiative for Asian languages. Breakout sessions gave participants a chance to discuss ideas and results in detail. The final session was dedicated to proposals for activities for CLEF 2006.

The presentations given at the CLEF Workshops and detailed reports on the experiments of CLEF 2005 and previous years can be found on the CLEF website at http://www.clef-campaign.org/. The preliminary agenda for CLEF 2006 will be available from mid-November. CLEF is an activity of the DELOS Network of Excellence for Digital Libraries.

Link:
http://www.clef-campaign.org

Please contact:
Carol Peters, ISTI-CNR, Italy, CLEF Coordinator
Tel: +39 050 3152897
E-mail: carol.peters@isti.cnr.it