ERCIM News No.26 - July 1996

The European Language Resources Association

by Khalid Choukri

The European Language Resources Association (ELRA) was established as a non-profit organization in Luxembourg in February 1995. The overall goal of ELRA is to provide a centralised organization for the validation and distribution of speech, text, and terminology resources and tools, and to promote their use within the European telematics R&TD community.

Language Resources (LRs) are universally acknowledged to be critical for the development of robust, broad-coverage, and cost-effective applications in all sectors of telematics, in particular those for written and spoken language. The cost of developing such resources is prohibitive, and due to the lack of sufficient co-ordination, existing LRs cannot be easily adapted for multiple users, thereby hindering the rapid deployment of new applications.

Market Situation

The LR area can be considered as three quite distinct fields, each of which is covered by one of the three 'colleges' of ELRA: terminology, written resources, and spoken resources. There is a great deal of terminology work going on in all the main languages of Europe, both at a general level and in every major sector of industrial and commercial activity. But the work is to a large extent uncoordinated, and very little effort has been made to turn it into commercial products using common standards, a situation that ELRA intends to rectify.

In the written field, the collection of corpora has become important in recent years, and is beginning to be a commercial activity; much remains to be done to organise this activity systematically and to cover all languages and user domains. The production of written lexica is very expensive and although there are many toy systems, there are few commercial activities outside those of the major publishing houses; an important source of material is the work of the established national language centres.

Spoken resources have become a fully commercial product in the last few years as the speech processing field has reached technical and commercial maturity. The major telecommunications firms have moved in and it is here that the market for ELRA's distribution activities is, in the short run, at its most mature.

In all three fields, the LR project activities of the EU Language Engineering (LE) programmes are producing new products and standards which must be a prime target for ELRA distribution activity.

To achieve its objectives, ELRA has established a distribution unit (European Language Resources Distribution Agency - ELDA) as the infrastructure within ELRA for identifying, collecting, classifying, validating, distributing, and exploiting LRs. ELDA manages and oversees these activities. Additional activities include developing evaluation guidelines, serving as a broker between producers and users of LRs, and functioning as a central clearinghouse for information.

ELRA has appointed several panels of experts to advise the ELRA Board on crucial aspects of its activities. Each panel consists of a core of ELRA members, selected to represent the expertise of the three colleges (speech, written, terminology), and is chaired by a convenor.

ELRA/ELDA has started addressing the fundamental organisational, technical, and economic problems which constitute the crucial barriers to the development of the market for LRs.

The services provided by ELRA could range from the simple cataloguing and dissemination of information, through promotion and brokerage, to assisting producers with the documentation, validation, and normalization of their LRs, including their physical distribution.


Because the field is relatively immature, one of the first priorities is to establish standards that facilitate reuse, performance, and interworking, together with standards for quality control of the resources. The work of the Expert Advisory Group on Language Engineering Standards (EAGLES) and of other LE projects (SPEECHDAT, PAROLE, INTERVAL) will be used as the basis for establishing these standards, but the role of ELRA is to ensure that the standards are applied, not least in quality control of resources.

Ownership rights are also a major problem in the field, with the associated problems of copyright and copying prevention. The project will analyse various possible solutions, suggest codes of conduct, and stipulate contracts which regulate the status of the LRs distributed.

Results of the work of the association can be measured by the number of members, by the number of LRs handled, and by the number and value of the LRs collected, validated, and disseminated. In a more qualitative, but perhaps in the long run more important sense, the success of ELRA will be judged by how it succeeds in raising the profile of LRs and LE throughout the EU. Results will also come from the stimulation it provides to the creation of LRs, and in particular in those fields where some social or other non-commercial incentive is provided for the creation and dissemination of LRs.

Please contact:
Khalid Choukri - ELRA
Tel: +33 1 45 86 53 00

