Descartes and Kepler for Spatial Data Mining
by Natalia Andrienko, Gennady Andrienko, Alexandr Savinov and Dietrich Wettschereck
Geographic Information Systems (GIS) and Knowledge Discovery in Databases (KDD) have so far been developed as two separate technologies. Recently, as organizations have accumulated huge databases with a high percentage of geographically referenced data, they have begun to realize the potential of the information hidden there. Applying data mining technologies to geographic information systems is therefore becoming highly relevant. Current results of integrating the geographic information (analysis) system Descartes and the data mining tool Kepler are very promising.
Our system Descartes provides unique features for (i) intelligent mapping support and (ii) a full spectrum of functions for interactive visual analysis of spatially referenced data (see "Descartes system: Interactive Intelligent Cartography in Internet" by Gennady Andrienko and Natalia Andrienko, ERCIM News, July 1998). Descartes automates the generation of maps presenting user-selected data, and it supports various interactive manipulations of map displays that can help reveal important features of the spatial distribution of data. It also supports data transformations that are effective for visual analysis, as well as the dynamic calculation of derived variables by means of logical queries and arithmetic operations over existing variables.
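To illustrate what "derived variables" means here, the following is a minimal sketch, not Descartes code. It assumes a hypothetical table with one row per geographic object and hypothetical columns `population`, `area_km2` and `unemployed`; the values are invented for illustration. One arithmetic operation yields a numeric variable (density), and one logical query yields a boolean variable that could then be mapped like any other attribute.

```python
# Hypothetical table: one row per geographic object (e.g. a district).
# Column names and values are illustrative only, not real data.
districts = {
    "A": {"population": 120_000, "area_km2": 60.0, "unemployed": 9_000},
    "B": {"population": 45_000, "area_km2": 90.0, "unemployed": 2_000},
    "C": {"population": 80_000, "area_km2": 20.0, "unemployed": 7_200},
}

def derive_arithmetic(table, name, fn):
    """Add a derived numeric variable computed from existing variables."""
    for row in table.values():
        row[name] = fn(row)

def derive_query(table, name, predicate):
    """Add a derived boolean variable from a logical query over existing variables."""
    for row in table.values():
        row[name] = bool(predicate(row))

# Arithmetic operations over existing variables.
derive_arithmetic(districts, "density", lambda r: r["population"] / r["area_km2"])
derive_arithmetic(districts, "unemp_rate", lambda r: r["unemployed"] / r["population"])

# Logical query combining two derived variables (thresholds are arbitrary).
derive_query(districts, "dense_and_high_unemp",
             lambda r: r["density"] > 1000 and r["unemp_rate"] > 0.05)

for key, row in districts.items():
    print(key, round(row["density"], 1), row["dense_and_high_unemp"])
```

Because each derived variable is stored as an ordinary column, it can be fed back into further queries or displayed on a map without special handling.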
Our Kepler data mining system provides an easy-to-use, flexible, and powerful platform incorporating a number of data mining methods. It is an open platform, supplying a universal plug-in interface for adding new methods. Kepler supports the whole data mining process, including tools for data input and format transformation, database access, querying, management of (intermediate) results, and graphical presentation of various kinds of data mining results (trees, rules, and groups). To a great extent, both systems are designed to serve the same goal, namely helping users gain knowledge about their data, but they provide complementary instruments with a high potential for synergy.
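The idea of a plug-in interface can be sketched as follows. This is not Kepler's actual interface, which is not described here. The names `MiningMethod`, `register`, `PLUGINS`, `apply_method` and the toy `ThresholdGroups` method are all hypothetical. The point is only that the platform calls methods through a common contract and looks them up by name, so new methods can be added without changing the platform code.

```python
from abc import ABC, abstractmethod

# Hypothetical plug-in contract; not Kepler's real API.
class MiningMethod(ABC):
    name: str = "unnamed"

    @abstractmethod
    def run(self, records, target):
        """Return a mining result (e.g. rules or groups) for the given records."""

# Registry mapping method names to plug-in classes.
PLUGINS = {}

def register(cls):
    """Class decorator: add a method to the registry under its name."""
    PLUGINS[cls.name] = cls
    return cls

@register
class ThresholdGroups(MiningMethod):
    """Toy method: split records into two groups around the mean of `target`."""
    name = "threshold-groups"

    def run(self, records, target):
        mean = sum(r[target] for r in records) / len(records)
        return {
            "above": [r["id"] for r in records if r[target] > mean],
            "at_or_below": [r["id"] for r in records if r[target] <= mean],
        }

def apply_method(method_name, records, target):
    """The platform looks up a plug-in by name without knowing its class."""
    return PLUGINS[method_name]().run(records, target)

records = [{"id": "A", "x": 1.0}, {"id": "B", "x": 5.0}, {"id": "C", "x": 3.0}]
print(apply_method("threshold-groups", records, "x"))
```

Here the mean of `x` is 3.0, so record B falls in the "above" group and A and C in the other one.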
Integrating the Tools - New Generation of Spatial Data Analysis
To further support the analysis of spatially referenced data, we have implemented a first link between Kepler and Descartes, integrating traditional data mining instruments with interactive cartographic visualization tools. The basic idea is that an analyst can view both source data and the results of data mining processes as maps and statistical graphics that convey spatial information in a natural way. The analyst can thus detect spatial relationships and patterns much more easily.
Conceptually the integrated system combines three kinds of links:
- from geography to mathematics: when visually exploring and manipulating a map, the user may detect some spatial phenomenon and then try to find an explanation or justification for it by applying data mining methods
- from mathematics to geography: data mining methods produce results that are then visually presented and analyzed on maps
- dialogue between mathematics and geography (linked displays): graphics representing results of data mining in the usual (non-cartographic) form are viewed in parallel with maps, and dynamic highlighting visually connects corresponding elements in both types of displays.
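Dynamic highlighting between linked displays is commonly built on an observer-style design. The sketch below shows that pattern under the assumption that the map and the data mining display share common object identifiers. The classes `HighlightManager` and `View` are hypothetical illustrations, not the actual Descartes/Kepler client code.

```python
class HighlightManager:
    """Shared broker: every registered view is told which object IDs are highlighted."""

    def __init__(self):
        self._views = []
        self.highlighted = set()

    def register(self, view):
        self._views.append(view)

    def highlight(self, object_ids, source):
        # Record the current selection and notify every view except the originator.
        self.highlighted = set(object_ids)
        for view in self._views:
            if view is not source:
                view.on_highlight(self.highlighted)

class View:
    """A display (map, tree, rule list, ...) that shows some set of highlighted objects."""

    def __init__(self, name, manager):
        self.name = name
        self.shown = set()
        self.manager = manager
        manager.register(self)

    def user_points_at(self, object_ids):
        # Highlight locally, then let the manager propagate to the other views.
        self.shown = set(object_ids)
        self.manager.highlight(object_ids, source=self)

    def on_highlight(self, object_ids):
        self.shown = set(object_ids)

manager = HighlightManager()
map_view = View("map", manager)
tree_view = View("decision tree", manager)

# Pointing at a tree node covering objects A and C highlights them on the map too.
tree_view.user_points_at({"A", "C"})
print(sorted(map_view.shown))
```

Routing all highlighting through one broker keeps the displays independent of each other: a new kind of display only needs to register with the manager to take part in the linking.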
The integrated system has a client-server architecture. The server is implemented in C++ (Descartes) and Prolog (Kepler), the client in Java. The system is available for Windows and Unix platforms. Product versions of Descartes and Kepler are available from Dialogis Software & Services GmbH.
Descartes examples to try out: http://allanon.gmd.de/and/java/iris/
Information about Kepler and Dialogis: http://www.dialogis.de/
Homepage of the research group: http://ais.gmd.de/KD/
and Alexandr Savinov - GMD
Tel: +49 2241 14 2486/2629
E-mail: firstname.lastname@example.org, email@example.com