The Astro-Wise System: A Federated Information Accumulator for Astronomy

by Edwin A. Valentijn and Gijs Verdoes Kleijn

The progress of astronomy is about to hit a wall in terms of the processing, mining and interpretation of huge datasets. The Astro-Wise consortium has designed and implemented a fully scalable and distributed information system to overcome this problem for wide-field imaging. The same principles can be applied to other sciences.

Much of modern research involves the accumulation of huge amounts of digitized data. The analysis of this data by distributed communities represents a significant challenge to project management and ICT implementation, and is relevant to fields as diverse as biology, physics, astronomy, economics and cultural heritage projects. Furthermore, the projects are often global efforts requiring collaborators in many places to share, validate and combine processed data and derived results. It is therefore necessary to develop more efficient data lineage, mining and analysis systems to allow researchers to search intelligently through previously unmanageable volumes of data.

The Astro-Wise consortium has developed an information system to meet these challenges for wide-field imaging in astronomy. The consortium is a partnership between OmegaCEN-NOVA/Kapteyn Institute (Groningen, The Netherlands; coordinator), Osservatorio Astronomico di Capodimonte (Naples, Italy), Terapix at IAP (Paris, France), ESO, and Universitäts-Sternwarte & Max-Planck Institut für Extraterrestrische Physik (Munich, Germany).

Large data projects in high-energy physics, space missions and astronomy typically push data through various platforms in an irreversible way (e.g. a TIER node setting). In such a situation, the end user has little or no influence on what happens upstream. This ‘classical’ paradigm is characterized by fixed ‘releases’ of homogeneous, well-documented data products. In contrast, the Astro-Wise system allows the end user to trace a data product, following all its dependencies back to the raw observational data and, if necessary, to re-derive the result with better calibration data and/or improved methods.

This improvement is achieved by:

The database with all metadata and catalogues provides the infrastructure to develop tools for a variety of purposes. These include rapid trend analysis of data, complex queries and fast hunting for ‘needles in the haystack’ of terabyte-sized catalogues. Thus, the system provides the user with fully integrated, transparent access to all stages of the data processing and thereby allows the data to be reprocessed and the system to be improved and expanded.

For a given project/instrument, the system initially starts in a naive, ‘quick look’ mode, which gradually improves as various researchers add refined information to the system under the supervision of project leaders. Approved calibration modifications automatically become public, beyond the project boundaries. A mechanism for quality control is implemented which allows for changes due to one of:

The core of the system exploits three properties of its database environment. First, we apply the principle of inheritance using object-oriented programming (Python): all Astro-Wise objects inherit key properties for database access, such as persistence of attributes. Second, the links (associations or references) between instances of objects in the database are fully maintained, so that for each piece of information it is possible to trace the pieces of information that were used to obtain it. Third, each processing step, together with the inputs used for it, is kept within the system. The database grows constantly through the addition of new information or improvements made to existing information.
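
The three properties above can be illustrated with a minimal Python sketch. It is not the Astro-Wise code base: the class names (PersistentObject, RawFrame, BiasFrame, ReducedFrame), the in-memory dictionary standing in for the database, and the rederive helper are all hypothetical and chosen only for illustration.

```python
import itertools


class PersistentObject:
    """Hypothetical base class (not the real Astro-Wise API).

    Subclasses inherit registration in a store and dependency tracking.
    """

    _store = {}                 # in-memory stand-in for the database
    _ids = itertools.count(1)   # simple unique-id generator

    def __init__(self, **dependencies):
        self.object_id = next(PersistentObject._ids)
        self.dependencies = dependencies        # links to the inputs used
        PersistentObject._store[self.object_id] = self

    def lineage(self):
        """Return every object this one was derived from, depth first."""
        found = []
        for dep in self.dependencies.values():
            found.append(dep)
            found.extend(dep.lineage())
        return found


class RawFrame(PersistentObject):
    """Raw observational data: the end of every dependency chain."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename


class BiasFrame(PersistentObject):
    """A calibration product derived from a raw frame."""

    def __init__(self, raw):
        super().__init__(raw=raw)


class ReducedFrame(PersistentObject):
    """A science product derived from a raw frame and a calibration."""

    def __init__(self, raw, bias):
        super().__init__(raw=raw, bias=bias)


def rederive(reduced, better_bias):
    """Create a new result from the same raw data with a better calibration.

    The old result is not overwritten; it stays in the store.
    """
    return ReducedFrame(reduced.dependencies["raw"], better_bias)


if __name__ == "__main__":
    science = RawFrame("science_0001.fits")
    bias = BiasFrame(RawFrame("bias_0001.fits"))
    reduced = ReducedFrame(science, bias)
    for obj in reduced.lineage():
        print(obj.object_id, type(obj).__name__)
```

In this sketch, re-deriving a result adds a new object rather than replacing the old one, which mirrors the statement that the database only grows; a real system would persist objects in a database rather than a dictionary.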

Figure 1
A 256-megapixel test image from the OmegaCAM instrument, which consists of 32 eight-megapixel CCDs.

All system components are distributed over Europe, enabling research groups to collaborate on shared projects. Knowledge added by one group is immediately accessible to others via a Web portal, which includes data viewing, quality labelling and compute services (see links). Currently, researchers use the Astro-Wise system with 10 Tbyte of astronomical images. Hundreds of Tbytes of data will start entering the system when the OmegaCAM panoramic camera starts operations in Chile. This camera is dedicated to various large surveys using the Astro-Wise system.

Astro-Wise coordinator OmegaCEN-NOVA is collaborating with the LOFAR consortium and CWI to explore the use of the Astro-Wise system for LOFAR, the next-generation Low Frequency Array of radio telescopes, which is being built in the Netherlands and Germany. Astro-Wise can also be applied to other fields of science. The object-oriented use of the database allows for classes of objects dealing with arbitrary forms of digitized observational data. Scans of cultural heritage, DNA sequences, data from high-energy particle collisions or financial markets can be processed using principles similar to those applied to images of the sky.


