Data Mining in Children's Hypnograms
by András Lukács and László Lukács
The Informatics Laboratory of SZTAKI, the Eötvös Loránd University, Budapest, and the Madarász Street Children's Hospital, Budapest, are collaborating in infant sleep research. They use state-of-the-art data mining techniques to make diagnostic work at the hospital's Sleep Laboratory more efficient and to conduct basic research in physiology.
Our project aims primarily at developing a diagnostic tool for breathing anomalies in infants and children during sleep. The tool is intended for clinical use, while also addressing the needs of basic research in physiology. At present, expensive and time-consuming polysomnographic examinations are carried out to find out whether children are at risk of breathing disorders during sleep. We intend to develop diagnostic software that can expose the signs of danger even if few or no apnoeas are found in the recording, thus making diagnosis less expensive and more accurate.
Adult sleep disorders have been studied far more thoroughly than those of children, although the two clearly differ in some respects. The main breathing problem of adults is snoring, mainly caused by overweight or hypotonicity, whereas children's main problem is apnoea, which may be caused by developmental disturbances or by diseases of the gullet or the nervous system. The polysomnographic data analysed in our project are collected in the Sleep Laboratory of the Madarász Street Children's Hospital. The Sleep Laboratory has been working for seven years and all routinely collected diagnostic data have been archived. The dataset contains recordings of both healthy and diseased children, together with the original diagnoses.
Until now, the overnight recordings have been searched manually for apnoeas, and the diagnosis has been based on the number and type of apnoeas found. Because this approach takes into account only a very small part of the information collected, better visualization and automatic feature extraction tools are needed to improve diagnostics and to incorporate all available information in decision making. Establishing reference standards for children's sleep using data mining techniques will make it easier to pinpoint problematic phenomena in the sleep recordings, and to do so more accurately.
The analysis is carried out at the Data Mining and Web Search Group, Informatics Laboratory of SZTAKI, in co-operation with the Department of Physiology and Neurobiology at Eötvös Loránd University. The project started three months ago and is in its initial phase. We are building a unified database from the collected data and are creating the algorithmic tools needed to extract features from the huge amount of raw data. The input data consist of continuous polysomnographic records, 6-8 hours long, on 10-15 channels. We analyse these records breath by breath and calculate descriptors for each breath cycle. The descriptors include the time elapsed since the beginning of sleep, the length and shape of the breath cycles measured in different channels, the distribution of the lengths of the heartbeats, muscle tone, the intensity and frequency of eye movements, and the frequency distribution of the EEG. The framework is flexible: any further descriptors that are suggested can be included in the model.
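As a rough illustration of breath-by-breath descriptor extraction, the following Python sketch splits a single respiratory channel into breath cycles and computes three descriptors for each cycle. It is not the project's code: the function name `breath_descriptors`, the sampling rate, the synthetic trace, and the segmentation rule (upward zero crossings of the mean-centred signal) are all assumptions made for the example.

```python
import math

def breath_descriptors(signal, fs):
    """Split a respiration trace into breath cycles and describe each one.

    signal: samples from one respiratory channel (hypothetical input)
    fs:     sampling rate in Hz (assumed)
    Cycles are delimited by upward zero crossings of the mean-centred
    signal. This is an assumed rule, not the one used in the project.
    """
    mean = sum(signal) / len(signal)
    centred = [s - mean for s in signal]
    # Indices where the centred signal crosses zero going upward.
    onsets = [i for i in range(1, len(centred))
              if centred[i - 1] < 0 <= centred[i]]
    cycles = []
    for start, end in zip(onsets, onsets[1:]):
        segment = centred[start:end]
        cycles.append({
            "onset_s": start / fs,                    # time since start of record
            "length_s": (end - start) / fs,           # duration of the cycle
            "amplitude": max(segment) - min(segment)  # crude shape descriptor
        })
    return cycles

# Synthetic trace: 60 s sampled at 10 Hz, one breath every 4 s.
fs = 10
trace = [math.sin(2 * math.pi * (t / fs) / 4.0 - 0.1) for t in range(60 * fs)]
cycles = breath_descriptors(trace, fs)
print(len(cycles), cycles[0])
```

In a real recording the same per-cycle time windows could be reused to summarise the other channels (heartbeat intervals, muscle tone, eye movements, EEG spectrum), so that every breath cycle yields one row of descriptors.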
We use several data mining techniques to extract knowledge from this derived data set. In the individual records we use clustering algorithms to find sleep phases, and sequence mining to explore typical changes. We look for patterns emerging from the sequence of breaths on two scales. First, a large-scale analysis covers the whole duration of sleep and roughly identifies the sleep phases (see Figure); then a detailed analysis finds specific changes within the individual phases. Differences among the sleep periods are an important source of information, as both cardiovascular problems and apnoeas occur most often close to the end of sleep. The local analysis is done at the level of dozens of breaths close to the transitions between sleep phases, in order to reveal the order of the rapid events around these changes.
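The two-scale idea can be sketched in a few lines of Python. The example below is only a stand-in under stated assumptions: it uses plain k-means (the article does not say which clustering algorithm is used), a hypothetical two-descriptor representation of each breath (cycle length and amplitude), and synthetic data. Listing the positions where the cluster label changes is a crude placeholder for the sequence mining step, not an implementation of it.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two descriptor tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centres.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each breath to its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                  for p in points]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centres

def transitions(labels):
    """Indices of breaths at which the cluster label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

def breath(i, length, amplitude):
    """Synthetic (length_s, amplitude) pair with small deterministic jitter."""
    return (length + 0.05 * ((i % 5) - 2), amplitude + 0.02 * ((i % 3) - 1))

# Synthetic night: slow deep breathing, then fast shallow, then slow again.
points = ([breath(i, 4.0, 1.0) for i in range(30)]
          + [breath(i, 2.5, 0.5) for i in range(30, 60)]
          + [breath(i, 4.0, 1.0) for i in range(60, 90)])

labels, centres = kmeans(points, k=2)
print(transitions(labels))
```

With real descriptors the features would normally be standardised before clustering, since they are measured in different units. The breaths within a few dozen positions of each reported transition are the natural input for the detailed, local analysis.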
Our preliminary results show that data mining techniques are suitable tools for revealing the internal structure of these multi-channel recordings and for finding the characteristic features identified by physiologists and physicians; i.e. they can be used to extract useful information from this huge array of individual data. Promising findings suggest that pathological changes in a record can also be found and identified by software based on data mining techniques. Using these patterns, we can automatically find anomalies in the sleep records even if no apnoeas are present. Such a tool can easily be introduced into clinical practice. Our aim is to offer the expert a comprehensive set of information derived from the recording, combined in the most suitable way.
András Lukács, SZTAKI, Hungary
Tel: +36-1-279 6169
Eötvös Loránd University, Hungary