CLEF: Joining up Healthcare and Biomedical Research

by Aniko Zagon, Adel Taweel and Alan Rector

CLEF (Clinical e-Science Framework) is a Medical Research Council-sponsored project in the UK e-Science programme. CLEF is now in its third year and is about to enter its second phase of development, in which it will begin translating research into applications. CLEF aims to establish methodologies and a technical infrastructure for the next generation of integrated clinical and bioscience research.

The context for CLEF's work is the current worldwide trend towards a single electronic healthcare record for each patient, which will eventually enable coherent medical care. In the UK, this vision is being turned into reality by the government's ten-year National Programme for IT (NPfIT), with a budget of close to £9 billion.

Information on the long-term course of patients' illnesses and treatments is needed both to improve clinical care and to enable post-genomic research. CLEF's task is to develop a safe, generic, high-quality, interoperable and pseudonymised information repository, derived from Electronic Healthcare Record (EHR) systems, that healthcare and bioscience researchers can query just like any other GRID resource without endangering operational security or data ethics.
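Pseudonymisation replaces direct identifiers with stable tokens, so that records belonging to the same patient can still be linked without revealing who the patient is. The article does not say how CLEF derives its pseudonyms; the sketch below assumes a keyed hash (HMAC-SHA256) whose key is held by a trusted party and never by researchers. The field names, the key and the `pseudonymise` helper are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by a trusted pseudonymisation service, never by researchers.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical set of fields treated as direct identifiers.
IDENTIFYING_FIELDS = {"nhs_number", "name", "address"}


def pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable token from an identifier; without the key it cannot be recomputed."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Drop direct identifiers and add a pseudonym that still links the patient's records."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["patient_pseudonym"] = pseudonym(record["nhs_number"], key)
    return cleaned


if __name__ == "__main__":
    a = pseudonymise({"nhs_number": "123", "name": "A. Patient", "diagnosis": "asthma"})
    b = pseudonymise({"nhs_number": "123", "address": "1 High St", "test": "FEV1"})
    # Same patient -> same pseudonym, so records can be linked without the identifier.
    print(a["patient_pseudonym"] == b["patient_pseudonym"])  # True
    print("name" in a)  # False
```

A keyed hash, rather than a plain hash, is used here because identifiers such as record numbers come from a small space and an unkeyed hash could be reversed simply by hashing every possible value.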

Barriers to Improved Clinical Information
CLEF is developing methods for managing and using pseudonymised repositories of long-term patient histories, which can be linked to genetic and genomic information or used to support customised patient care. It will enable ethical and user-friendly access to this information in support of clinical care and biomedical research.

CLEF's primary contribution will be removing key barriers to managing healthcare data repositories:

Compliance. Ensuring compliance with privacy, consent and security requirements at all times and at all levels: policy, organisational structure and technical implementation.
Comprehensive and intelligent information capture. Most clinical information is recorded as free text rather than as structured, systematic records. To address this problem, CLEF needs to design a 'context library' for data that can be safely derived from existing notes, making the data more useful to researchers.
Information integration. In their raw form, clinical records consist of hundreds of test results, medication records and appointment notes. To make these data useful for clinical and bioscience researchers, a coherent 'chronicle' of events must be inferred from the records, summarising the key events in a single patient's treatment. This is a complex problem, because clinical treatment records may vary considerably from one patient to another.
Analysis and presentation of information for clinical and other scientific researchers. The CLEF repository will be used by clinicians, medical researchers and a variety of other scientists, who will not be IT specialists. The questions put to the CLEF repository will arise in a range of contexts and may require information from many sources across the GRID framework. They are therefore difficult to anticipate, and queries that work as expected can only be developed in collaboration with specialist groups of users acting as beta testers.
Knowledge resources. All of the above tasks require contextual data mining and recognition of the implied meaning of information. Since the number of knowledge resources is mushrooming, coordinating and managing them at the level of information integration is another complex task.
Standards. Cooperation requires standards, which are only just emerging. Coordinating information gathering is also a serious security and ethical issue when medical data are involved: it requires data protection and system management guidelines that are not only established but also effectively enforced. Contributing to this international work is an integral part of CLEF.

Technologies in CLEF Solutions
Information extraction from multiple texts. A typical lifetime record consists of 100-200 text documents and an even larger number of laboratory, pharmacy and other structured data items. To improve the precision of information extraction, all available documents are used and cross-referenced during extraction.
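One simple way to picture how cross-referencing raises precision is support counting: a fact extracted from a single note may be a mis-reading, but a fact asserted independently in several documents is more likely to be true. The sketch below is not CLEF's extraction pipeline; it assumes a tiny hypothetical vocabulary (`TERMS`), a crude negation check and a hypothetical `min_support` threshold.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary; a real system would use a full clinical terminology.
TERMS = {"breast cancer", "lung cancer", "tamoxifen", "mastectomy"}

# Hypothetical negation cues immediately preceding a term, e.g. "no evidence of lung cancer".
NEGATION = re.compile(r"\b(no|not|without|no evidence of)\s+$")


def extract(document: str) -> set:
    """Return vocabulary terms asserted (not negated) in one document."""
    text = document.lower()
    found = set()
    for term in TERMS:
        for match in re.finditer(re.escape(term), text):
            preceding = text[max(0, match.start() - 20):match.start()]
            if not NEGATION.search(preceding):
                found.add(term)
    return found


def cross_referenced_facts(documents: list, min_support: int = 2) -> dict:
    """Keep only terms asserted in at least `min_support` separate documents."""
    counts = Counter(term for doc in documents for term in extract(doc))
    return {term: n for term, n in counts.items() if n >= min_support}
```

Raising `min_support` trades recall for precision; a real extractor would also weigh document types and dates rather than treating every mention equally.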

At present, CLEF is using records of deceased patients, to which data confidentiality issues do not apply, but whose quality and complexity are the same as those of any other medical records. This allows technological and regulatory/standardization processes to develop in parallel.

Information integration into standard healthcare record formats. CLEF draws on the work of OpenEHR, which uses the new CEN standard for information interchange. CLEF's repository is built from standard 'archetypes' (reusable elements that facilitate interoperability and can evolve over time), which have also been adopted by HL7, the major standardization body in healthcare informatics.
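To make the idea of an archetype concrete: an archetype is a reusable, versioned definition of a clinical concept that constrains what a conforming data entry may contain. The sketch below is not OpenEHR's Archetype Definition Language; it is a hypothetical Python model in which an archetype is a set of element constraints (name, unit, range) and a version number, used to validate entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementConstraint:
    """Constraint on one data element (hypothetical, loosely modelled on archetype ideas)."""
    name: str
    unit: str
    minimum: float
    maximum: float
    required: bool = True


@dataclass(frozen=True)
class Archetype:
    """A reusable, versioned definition of a clinical concept."""
    archetype_id: str
    version: int  # archetypes can evolve; a new version may add or relax constraints
    elements: tuple

    def validate(self, entry: dict) -> list:
        """Return a list of problems; an empty list means the entry conforms."""
        problems = []
        for c in self.elements:
            if c.name not in entry:
                if c.required:
                    problems.append(f"missing {c.name}")
                continue
            value, unit = entry[c.name]
            if unit != c.unit:
                problems.append(f"{c.name}: expected unit {c.unit}, got {unit}")
            elif not (c.minimum <= value <= c.maximum):
                problems.append(f"{c.name}: {value} outside {c.minimum}-{c.maximum}")
        return problems


# Hypothetical blood-pressure archetype; the identifier is illustrative only.
BLOOD_PRESSURE = Archetype(
    archetype_id="example.blood_pressure",
    version=1,
    elements=(
        ElementConstraint("systolic", "mm[Hg]", 0, 300),
        ElementConstraint("diastolic", "mm[Hg]", 0, 200),
    ),
)
```

Because two systems that share the same archetype accept and reject the same entries, data captured in one can be interpreted in the other, which is the interoperability benefit the paragraph describes.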

'Chronicalisation'. The CLEF chronicle is an attempt to form a coherent view, based on the best available inference, of the course and choice of treatments for any single patient. A chronicle built from 'index events' and their occurrences is an extremely important source of information for those who study disease and treatment ontologies for research purposes. This is an extremely difficult transformation, so the 'chronicle' will come into focus gradually as understanding improves.
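A small part of the problem can be illustrated with a deliberately simplified heuristic: the same event (a diagnosis, an operation) is usually reported in several records, and those reports should collapse into one chronicle entry that keeps its supporting evidence. The sketch below is not CLEF's inference method; it assumes hypothetical event codes and a hypothetical `window_days` rule that merges reports of the same event type lying close together in time.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Report:
    """One mention of an event in a single source record."""
    event_type: str  # e.g. "diagnosis:breast cancer" (hypothetical coding)
    when: date
    source: str      # identifier of the document or data item


@dataclass
class ChronicleEntry:
    """One inferred event in the patient's chronicle, with its supporting evidence."""
    event_type: str
    first_seen: date
    last_seen: date
    sources: list = field(default_factory=list)


def build_chronicle(reports: list, window_days: int = 30) -> list:
    """Merge reports of the same event type that fall within `window_days` of each other."""
    chronicle = []
    open_entries = {}
    for rep in sorted(reports, key=lambda r: r.when):
        entry = open_entries.get(rep.event_type)
        if entry is not None and (rep.when - entry.last_seen).days <= window_days:
            # Same underlying event reported again: extend its evidence.
            entry.last_seen = rep.when
            entry.sources.append(rep.source)
        else:
            # A new occurrence of this event type starts a new chronicle entry.
            entry = ChronicleEntry(rep.event_type, rep.when, rep.when, [rep.source])
            open_entries[rep.event_type] = entry
            chronicle.append(entry)
    return chronicle
```

A fixed time window is crude (a recurrence two weeks later would be wrongly merged), which is one reason the article describes the real transformation as extremely difficult.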

Query formulation, WYSIWYM and language generation. For the CLEF repository to be useful to scientists and clinicians, it must contain data that are easily understandable to the majority of envisaged users. The interface to the repository of health records and chronicles is being designed around WYSIWYM ('What you see is what you meant') techniques, supplemented by various visual or graphical presentations. The next stage of the project will include user studies to ensure that the interface meets users' priorities.
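In WYSIWYM-style interfaces, users do not type queries; they edit an underlying semantic model by filling in 'anchors' in automatically generated feedback text, and the text is regenerated after every edit. The sketch below illustrates only that loop; the `PatientQuery` model, its slots and its wording are hypothetical and not CLEF's interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientQuery:
    """Hypothetical semantic model of a query; the user edits this, never raw query text."""
    diagnosis: Optional[str] = None
    treatment: Optional[str] = None
    min_age: Optional[int] = None

    def feedback_text(self) -> str:
        """Generate English feedback; unfilled slots appear as [bracketed] anchors."""
        who = "patients"
        if self.min_age is not None:
            who += f" aged {self.min_age} or over"
        diagnosis = self.diagnosis or "[some diagnosis]"
        treatment = self.treatment or "[some treatment]"
        return f"How many {who} with {diagnosis} received {treatment}?"


# Each user action fills one anchor (e.g. by choosing from a menu) and the text is regenerated.
query = PatientQuery()
print(query.feedback_text())  # How many patients with [some diagnosis] received [some treatment]?
query.diagnosis = "breast cancer"
print(query.feedback_text())  # How many patients with breast cancer received [some treatment]?
```

The design point is that the model, not the text, is what gets executed against the repository, so a non-specialist can only ever build a well-formed query.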

The overall approach in CLEF is based on 'ontology-anchored knowledge bases'. Some of the required information exists in established resources such as the UMLS (Unified Medical Language System); however, much of it needs to be compiled as the CLEF repository develops. CLEF works with both myGrid and the new COODE project to develop usable and accessible knowledge resources and tools.
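Anchoring knowledge in an ontology means that local terms from different source systems are mapped to shared concepts arranged in an is-a hierarchy, so that a question about a general concept also finds records coded with more specific ones. The sketch below shows only this subsumption step; the concept names and local synonyms are hypothetical and are not UMLS codes.

```python
# Hypothetical is-a hierarchy (child -> parents); concept names are illustrative, not UMLS codes.
IS_A = {
    "ductal carcinoma of breast": {"breast cancer"},
    "lobular carcinoma of breast": {"breast cancer"},
    "breast cancer": {"cancer"},
    "small cell lung cancer": {"lung cancer"},
    "lung cancer": {"cancer"},
}

# Hypothetical mapping from local terms used in source systems to ontology concepts.
SYNONYMS = {
    "IDC breast": "ductal carcinoma of breast",
    "ca breast": "breast cancer",
    "SCLC": "small cell lung cancer",
}


def ancestors(concept: str) -> set:
    """All concepts that subsume `concept` (transitive closure over IS_A)."""
    result, stack = set(), [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def matches(local_term: str, query_concept: str) -> bool:
    """True if a locally coded term falls under the queried concept."""
    concept = SYNONYMS.get(local_term, local_term)
    return concept == query_concept or query_concept in ancestors(concept)
```

For example, `matches("IDC breast", "breast cancer")` is true even though the source record never uses the words "breast cancer".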

Metadata in the Repository. The CLEF repository requires at least four types of metadata:

1. resource information
2. provenance information
3. usage and workflow information
4. annotations on certainty and evidence.

While types 1-3 are analogous to metadata within myGrid (see next article) and related projects, the fourth is more specific to CLEF.
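The four metadata types listed above can be pictured as fields attached to every item in the repository. The sketch below is a hypothetical record layout, not CLEF's schema; the `Certainty` scale and the `usable_for_research` filter are illustrative assumptions showing why certainty and evidence annotations matter when selecting data for a study.

```python
from dataclasses import dataclass, field
from enum import Enum


class Certainty(Enum):
    """Hypothetical certainty scale for an assertion in the repository."""
    CONFIRMED = "confirmed"
    PROBABLE = "probable"
    POSSIBLE = "possible"
    NEGATED = "negated"


@dataclass
class Annotated:
    """A repository item carrying the four kinds of metadata (field names hypothetical)."""
    value: str
    # 1. resource information: what kind of item this is and where it is held
    resource: dict = field(default_factory=dict)
    # 2. provenance information: which source records and processes produced it
    provenance: list = field(default_factory=list)
    # 3. usage and workflow information: which workflows have used it
    usage: list = field(default_factory=list)
    # 4. annotations on certainty and evidence (the type more specific to CLEF)
    certainty: Certainty = Certainty.POSSIBLE
    evidence: list = field(default_factory=list)


def usable_for_research(item: Annotated, min_evidence: int = 2) -> bool:
    """Hypothetical policy: accept only confirmed items with enough supporting evidence."""
    return item.certainty is Certainty.CONFIRMED and len(item.evidence) >= min_evidence
```

Types 1-3 describe the item as a data resource; type 4 describes how far the clinical assertion itself can be trusted, which generic e-science metadata does not normally capture.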

CLEF is anchored in five prominent UK universities: UCL and the Universities of Manchester, Sheffield, Brighton and Cambridge. Since CLEF is working alongside an ambitious government project (NPfIT), it needs to develop a close and interactive relationship with the key industry initiatives to ensure that its research focus is closely aligned with clinical system developments. CLEF also aims to carry out research and development work for NPfIT, using its technological expertise both in addressing healthcare informatics problems and in interpreting clinical problems in the context of informatics.

From CLEF to CLEF Services
The aim of CLEF Services, which will be launched in January 2005, is to expand the clinical base for CLEF's developmental work. This will enable CLEF to begin working with live data sets and to step up its testing of user-friendliness and of the faithfulness of data extraction and interpretation in the face of complex queries.

CLEF Services will also extend CLEF's ethical and security work with new partners, such as the Cathie Marsh Centre at the University of Manchester, and build closer links with myGrid, the GRID infrastructure and the NHS Care Record Service.

Please contact:
Adel Taweel, The University of Manchester, UK
Tel: +44 161 275 0659
E-mail: a.taweel@manchester.ac.uk