The INFOGENMED Project: A Biomedical Informatics Approach to Integrate Heterogeneous Biological and Clinical Information
by Ankica Babic, Victor Maojo, Fernando Martín-Sanchez, Miguel Santos and Antonio Sousa
Since the end of the 1990s, a growing number of researchers in Medical Informatics, Bioinformatics, Medicine and Biology have come to realize the advantages of linking genomic and medical data, knowledge, and methods to address new issues related to genomic medicine.
A conference held in Brussels in 2001, supported by the European Commission, was the starting point for a series of European efforts to launch new initiatives in Biomedical Informatics. Since that event, many different approaches have been proposed with the aim of linking genomic and clinical data to support biomedical research and practice. One of these efforts is the INFOGENMED project, carried out by the University of Aveiro and STAB VIDA (both Portugal), Linköping University (Sweden), and the Institute of Health Carlos III and Universidad Politecnica de Madrid (both Spain). The project ran from 2002 to 2004 and has been completed.
The INFOGENMED project was designed to provide unified access to multiple, heterogeneous biological and medical databases over the Internet. The database integration module, named OntoFusion, is an ontology-based system for biomedical database integration. It is based on two processes: mapping and unification. Mapping links a physical database schema to a virtual schema. Virtual schemas are created either from an existing ontology, such as UMLS or Gene Ontology, or by building a new domain ontology. In the current version, databases are mapped to virtual schemas at a conceptual level. Unification integrates ontologies and databases: the virtual schemas are unified, providing integrated access to the actual physical data. To our knowledge, OntoFusion is the first database integration system that uses a high-level ontology description language to represent the virtual schemas. The system incorporates tools to edit ontologies and to build the virtual schemas, as well as a graphical ontology navigator.
OntoFusion is also capable of redesigning database schemas. With its mapping tool, users can modify physical database schemas. For instance, two different physical schemas can be mapped to a common virtual schema with two concepts. Unification is then automatic.
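The mapping and unification steps described above can be illustrated with a minimal Python sketch. It is not OntoFusion's implementation: the `VirtualSchema` class, the `map_database` and `unify` functions, and the toy database, table and column names are all hypothetical, chosen only to show two different physical schemas being mapped to a common virtual schema with two concepts and then merged.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# All names below are illustrative assumptions, not part of OntoFusion.
Location = Tuple[str, str, str]   # (database, table, column)
ConceptAttr = Tuple[str, str]     # (concept, attribute)


@dataclass
class VirtualSchema:
    """Concepts with their attributes, plus links back to the physical data."""
    concepts: Dict[str, Set[str]] = field(default_factory=dict)
    links: Dict[ConceptAttr, List[Location]] = field(default_factory=dict)


def map_database(db_name: str,
                 mapping: Dict[Tuple[str, str], ConceptAttr]) -> VirtualSchema:
    """Mapping: link each physical (table, column) to a (concept, attribute)."""
    schema = VirtualSchema()
    for (table, column), (concept, attribute) in mapping.items():
        schema.concepts.setdefault(concept, set()).add(attribute)
        schema.links.setdefault((concept, attribute), []).append(
            (db_name, table, column))
    return schema


def unify(*schemas: VirtualSchema) -> VirtualSchema:
    """Unification: merge virtual schemas that share concept names."""
    unified = VirtualSchema()
    for schema in schemas:
        for concept, attributes in schema.concepts.items():
            unified.concepts.setdefault(concept, set()).update(attributes)
        for key, locations in schema.links.items():
            unified.links.setdefault(key, []).extend(locations)
    return unified


# Two toy databases with different physical layouts (hypothetical names).
db_a = map_database("db_a", {
    ("genes", "symbol"): ("Gene", "name"),
    ("disorders", "title"): ("Disease", "name"),
})
db_b = map_database("db_b", {
    ("gene_tbl", "gene_name"): ("Gene", "name"),
    ("gene_tbl", "illness"): ("Disease", "name"),
})

unified = unify(db_a, db_b)
for key, locations in sorted(unified.links.items()):
    print(key, "->", locations)
```

In this sketch, a query against the concept attribute `("Gene", "name")` can be routed to both physical columns through `unified.links`, which is the sense in which unification follows automatically once both schemas share the same concepts.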
The system can integrate both private and public databases. There are currently over 500 biological databases (DBs) publicly available. These databases are the result of many biological research projects that have produced an enormous amount of data about genes, proteins and genetic diseases. Often, different public DBs include related types of data. In other cases, different organizations store their own information (e.g., gene polymorphisms and mutations), but this disparate information is not integrated. Many of these databases do not offer a direct connection, and queries must be made through Web forms. We have used the system to integrate a large number of public biomedical databases, such as OMIM, PubMed, Enzyme, Prosite and Prosite documentation, PDB, SNP, InterPro and others.
Another important result of the project has been the development of an assistant that helps health practitioners navigate seamlessly through local and remote Internet resources related to genetic diseases, from phenotype to genotype. A navigation protocol (a workflow for accessing public databases available on the Web) was created by skilled users experienced in retrieving both medical and genomic information associated with rare diseases. Based on this protocol, a Web-based portal (DiseaseCard) was developed that optimizes the execution of the information-gathering tasks specified in the protocol.
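A navigation protocol of this kind can be thought of as an ordered list of lookup steps, each using what the previous steps found. The sketch below is only an illustration under that assumption and does not reflect how DiseaseCard is built: the step names, the `run_protocol` function, and the in-memory tables with placeholder identifiers stand in for real queries to public databases.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Step = Tuple[str, Callable[[Context], Context]]

# Placeholder tables standing in for remote databases (invented identifiers).
DISEASE_CATALOGUE = {"example disease": "DISEASE-001"}
DISEASE_TO_GENE = {"DISEASE-001": "GENE-A"}
GENE_TO_PROTEIN = {"GENE-A": "PROT-A"}


def find_disease(ctx: Context) -> Context:
    """Phenotype side: resolve a disease name to a catalogue entry."""
    return {"disease_id": DISEASE_CATALOGUE[ctx["disease_name"]]}


def find_gene(ctx: Context) -> Context:
    """Move from the disease entry to an associated gene."""
    return {"gene": DISEASE_TO_GENE[ctx["disease_id"]]}


def find_protein(ctx: Context) -> Context:
    """Genotype side: move from the gene to a protein entry."""
    return {"protein": GENE_TO_PROTEIN[ctx["gene"]]}


PROTOCOL: List[Step] = [
    ("disease lookup", find_disease),
    ("gene lookup", find_gene),
    ("protein lookup", find_protein),
]


def run_protocol(steps: List[Step], start: Context) -> Tuple[Context, List[str]]:
    """Run each step in order, accumulating results in a shared context."""
    context = dict(start)
    trail = []
    for name, step in steps:
        context.update(step(context))
        trail.append(name)
    return context, trail


context, trail = run_protocol(PROTOCOL, {"disease_name": "example disease"})
print(trail)
print(context)
```

Encoding the protocol as data rather than hard-wiring the sequence keeps the expert-defined order explicit and makes it easy to add or reorder steps.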
One of the main future challenges for the system is to design specific applications that are useful to both biological and medical researchers and practitioners. Whereas genomic researchers, for instance, are already regular users of public Web-based databases, clinicians will need access to these kinds of new information resources in order to fulfill the expectations of genomic medicine.
Antonio Sousa, University of Aveiro, Portugal
Tel: +351 234 370 500
Polytechnical University of Madrid, Spain
Tel: +34 91 336 7447
Institute of Health Carlos III, Madrid, Spain
Tel: +34 91 822 3219