ERCIM News No.48, January 2002
Spatial Data Mining Platform based on Enterprise Java Beans
by Michael May and Alexandr Savinov
The rapidly expanding market for data mining and Geographic Information Systems (GIS) technologies is driven by pressure from the public sector, environmental agencies and industry to provide innovative solutions to a wide range of problems. The main objective of the SPIN! project is to offer new possibilities for the analysis of geo-referenced data. The SPIN! Spatial Data Mining System integrates state-of-the-art Geographic Information Systems and data mining functionality in an open, highly extensible, internet-enabled architecture based on Enterprise Java Beans.
So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions and approaches to visualisation and data analysis. In particular, most contemporary GIS offer only very basic spatial analysis functionality. Many are confined to descriptive statistical displays, such as histograms or pie charts.
Data mining, the partially automated search for hidden patterns in large databases, offers great potential benefits for applied GIS-based decision-making. Recently, the task of integrating these two technologies has become critical, especially as public and private sector organisations holding very large databases of thematic and geographically referenced data begin to realise the potential of the information hidden there. Among those organisations are:
In response to this demand, a promising prototype was developed that demonstrates the potential of combining data mining and GIS. This initial prototype encouraged the formation of the SPIN! project, which is funded by the European Commission under IST-10536-SPIN! The coordinator is the Fraunhofer AIS, and the partners are Univ. Bari; GeoForschungszentrum Potsdam; Univ. Leeds; Univ. Manchester, Manchester Metropolitan Univ.; Professional GeoSystems, Amsterdam; and Russian Academy of Sciences, Moscow. The overall objective of the SPIN! project is to develop a web-based spatial data mining system by integrating state-of-the-art Geographic Information Systems (GIS) and data mining functionality in a closely coupled, open and extensible system architecture. The new-generation SPIN! system therefore pays special attention to scalability, security, multi-user access, robustness, platform independence and adherence to standards.
The general SPIN! architecture is shown in the Figure. It is an n-tier Client/Server architecture based on Enterprise Java Beans for the server-side components. It has the following major sub-systems:
The client is a GUI Java application or applet. Clients access the server via RMI (or via HTTP/Servlets), so the system can work on both intranets and the Internet.
The application server is an Enterprise Java Bean container. It manages the client workspace, analysis and visualisation tasks, data access and persistence.
User data are stored in the primary data storage, which is a relational database system (it may run on the same machine as the application server).
There may be one or more optional secondary databases for analysis data and/or workspaces. In addition, data can be loaded from other sources: databases, ASCII files in the file system or Excel files.
Analysis tasks can run on one or more compute servers (which may be the same machine as the application server). The client creates one remote object for each analysis task to be run, so that data is transferred directly from the database to the algorithm. After the analysis is finished, the result is transferred to the client for visualisation.
A connector machine, a Java Virtual Machine running on the application server, is used for accessing non-Java analysis tasks. These may run on additional compute servers.
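The idea of one remote object per analysis task, with the data flowing from the data store to the algorithm and only the result returning to the client, can be illustrated with a minimal Java sketch. It is not the SPIN! code: the names AnalysisTask, AnalysisResult and MeanTask are hypothetical, the in-memory Supplier stands in for a database query, and the task is invoked locally here rather than being exported through RMI or deployed in an EJB container.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class AnalysisTaskSketch {

    /** Serializable result: the only thing that travels back to the client. */
    public static final class AnalysisResult implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String taskName;
        public final double value;
        public final int rowsRead;

        public AnalysisResult(String taskName, double value, int rowsRead) {
            this.taskName = taskName;
            this.value = value;
            this.rowsRead = rowsRead;
        }
    }

    /** Hypothetical remote interface a client would hold one instance of per task. */
    public interface AnalysisTask extends Remote {
        AnalysisResult run() throws RemoteException;
    }

    /** Server-side task that fetches its own input, so rows never pass through the client. */
    public static final class MeanTask implements AnalysisTask {
        private final Supplier<List<double[]>> dataSource; // assumption: stands in for a JDBC query
        private final int column;

        public MeanTask(Supplier<List<double[]>> dataSource, int column) {
            this.dataSource = dataSource;
            this.column = column;
        }

        @Override
        public AnalysisResult run() {
            List<double[]> rows = dataSource.get();
            double sum = 0.0;
            for (double[] row : rows) {
                sum += row[column];
            }
            double mean = rows.isEmpty() ? Double.NaN : sum / rows.size();
            return new AnalysisResult("mean", mean, rows.size());
        }
    }

    /** Runs the task on a small in-memory table (illustrative data only). */
    public static AnalysisResult runDemo() {
        Supplier<List<double[]>> source = () -> Arrays.asList(
                new double[] {1.0, 10.0},
                new double[] {2.0, 20.0},
                new double[] {3.0, 30.0});
        MeanTask task = new MeanTask(source, 1);
        return task.run();
    }

    public static void main(String[] args) {
        AnalysisResult r = runDemo();
        System.out.println(r.taskName + " over " + r.rowsRead + " rows = " + r.value);
    }
}
```

In a real deployment the task object would live on a compute server and the client would hold only a stub for it; the sketch keeps everything in one JVM so that it runs without a registry or container.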
The first prototype of the SPIN! spatial mining system has been implemented using the Java 2 platform on both the client and the server sides. The EJB components run on a Borland Application Server. One analysis algorithm EJB component uses the Java Native Interface (JNI) to call procedures in a dynamically linked library. We used Oracle 8.1.7 as the database, which is accessed by means of JDBC drivers. The choice of EJB technology has allowed us to meet the requirements for web-based dissemination of census data, e.g. security, scalability and platform independence, in a principled manner. The system is tightly integrated with a relational database and can serve as a data access and transformation tool for spatial and non-spatial data.
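As a rough illustration of how a Java component can call into a dynamically linked library through JNI, the sketch below declares a native method and loads a library at class initialisation. The library name spin_native_algo and the method nativeMean are hypothetical, since the article does not name them, and the pure-Java fallback is an addition made here so that the example runs on a machine without the library.

```java
public class NativeAlgorithmSketch {

    // Hypothetical library name; the article does not say which library is called.
    private static final String LIB_NAME = "spin_native_algo";

    // True only if the native library could be loaded in this JVM.
    private static final boolean NATIVE_AVAILABLE = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary(LIB_NAME);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // library missing or not permitted: use the Java fallback
        }
    }

    // Implemented in native code when the library is present (hypothetical signature).
    private static native double nativeMean(double[] values);

    /** Pure-Java fallback with the same contract as the native routine. */
    public static double javaMean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Dispatches to the native routine if loaded, otherwise to the fallback. */
    public static double mean(double[] values) {
        return NATIVE_AVAILABLE ? nativeMean(values) : javaMean(values);
    }

    public static void main(String[] args) {
        System.out.println("native library loaded: " + NATIVE_AVAILABLE);
        System.out.println("mean = " + mean(new double[] {1.0, 2.0, 3.0}));
    }
}
```

Keeping the native call behind a single Java method is one way to let the rest of a component stay independent of whether the algorithm is implemented in Java or in native code.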