Automatic Fusion and Interpretation of 2D and 3D Sensor Data
by Dmitry Chetverikov
The Geometric Modelling and Computer Vision (GMCV) Laboratory of SZTAKI is working on several projects related to the automatic fusion and high-level interpretation of 2D and 3D sensor data for building rich models of real-world objects and scenes.
The Geometric Modelling and Computer Vision Laboratory conducts research in computer vision, image processing, pattern recognition, reverse engineering, the computer representation and graphic visualisation of objects, and computational and algorithmic issues of complex curves, surfaces and volumetric objects. The computer vision research is carried out by the Image and Pattern Analysis (IPAN) group of the GMCV lab. Its main areas of interest are shape, texture and motion analysis, as well as stereo vision.
A major research goal of the GMCV lab is to build photorealistic 3D models based on multimodal sensor data. We process and combine sensor data from various physical origins, such as camera, laser scanner and CT images, to obtain rich and geometrically correct models of real-world objects and scenes.
In our view, a photorealistic model has three major components, each with its own requirements: geometry (precision, continuity, high-level description), appearance (texture, realistic surface models, presentation at varying levels of detail) and dynamics (motion and deformable shapes).
To achieve this, we develop, implement and test algorithmic tools for the fusion of 2D and 3D data, feature detection in images and shapes, segmentation of images, curves, point sets and surfaces, classification and matching, surface and curve fitting, and morphing. In our research, we pay particular attention to three critical issues: bridging the gap between geometric modelling and computer vision, achieving robustness against noise and outliers, and creating efficient and flexible data structures. (A characteristic example of the gap between GM and CV is reverse engineering, where vision still cannot provide 3D data precise enough for accurate geometric modelling.)
Our current projects are illustrated by a few typical results. Accurate and robust 3D data registration is an important step in reverse engineering and 3D medical image processing. In reverse engineering, the data sets consist of partially overlapping measurements of an object, usually produced by a 3D laser scanner. Figure 1 shows the registration result for a surface represented by more than 100,000 points, while Figure 2 illustrates high-level interpretation of measured data as the final outcome of a long chain of reverse engineering algorithms. In a large-scale national medical project, we have obtained, registered and segmented CT data of various modalities to build a new model of the human knee for knee surgery and prosthesis development.
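The core of such a registration step can be sketched as a minimal point-to-point ICP loop. This is a simplified illustration only, not the lab's actual robust algorithm, which must additionally cope with outliers and partial overlap; the function names are hypothetical.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst
    (SVD / Kabsch method), given known point-to-point correspondences."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=30):
    """Naive ICP: alternate brute-force nearest-neighbour matching with
    rigid fitting, then report the total src -> dst transform."""
    cur = src.copy()
    for _ in range(iters):
        nn = np.linalg.norm(cur[:, None] - dst[None], axis=2).argmin(axis=1)
        R, t = best_rigid_transform(cur, dst[nn])
        cur = cur @ R.T + t
    return best_rigid_transform(src, cur)
```

Real scan data demands far more: robust rejection of wrong matches, spatial indexing for the nearest-neighbour search, and a good initial alignment.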
Figure 1: Robust automatic registration (fusion) of partial measurements.
Figure 2: Reverse engineering is high-level interpretation of measured 3D data.
Figure 3: Finding and matching high-level structural features in a wide-baseline stereo pair.
Figure 4: Surface reconstruction from the wide-baseline pair.
A related project run by IPAN is devoted to scene reconstruction from multiple points of view. Special attention is paid to the so-called wide-baseline stereo, where two representations of a scene differ significantly due to divergent viewing angles. Corresponding features of the images therefore have significantly different positions and are subject to affine distortion. This basic problem is often complicated further by occlusions. For these reasons, a critical step of the reconstruction process entails establishing the initial (sparse) correspondences and building the epipolar geometry.
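As an illustration of this step, the epipolar geometry can be estimated from sparse correspondences with the classical normalised 8-point algorithm; the NumPy sketch below (hypothetical helper names, no outlier handling) recovers a fundamental matrix F satisfying x2' F x1 = 0 for each correspondence (x1, x2).

```python
import numpy as np

def fundamental_8pt(x1, x2):
    """Normalised 8-point algorithm: estimate the fundamental matrix F
    (x2^T F x1 = 0) from >= 8 noise-free point correspondences."""
    def normalise(x):
        # translate to the centroid, scale mean distance to sqrt(2)
        c = x.mean(axis=0)
        s = np.sqrt(2) / np.linalg.norm(x - c, axis=1).mean()
        T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
        return np.column_stack([x, np.ones(len(x))]) @ T.T, T
    p1, T1 = normalise(x1)
    p2, T2 = normalise(x2)
    # each correspondence gives one linear constraint on the 9 entries of F
    A = np.stack([np.outer(b, a).ravel() for a, b in zip(p1, p2)])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # enforce the rank-2 constraint on F
    U, s, Vt = np.linalg.svd(F)
    F = U @ np.diag([s[0], s[1], 0]) @ Vt
    return T2.T @ F @ T1                  # undo the normalisation
```

With real, noisy wide-baseline matches this linear estimate would be wrapped in a robust scheme such as RANSAC.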
In our approach, we automatically detect and identify (match) the corresponding high-level structural features of a wide-baseline stereo pair, as illustrated in Figure 3. The structural features considered are the dominant, compact periodic structures, known as periodic distinguished regions, or PDRs.
The initial correspondence between the PDRs is not sufficiently precise to build an accurate epipolar geometry. However, it can be used to obtain a rough affine alignment of the images, in which corresponding local features, such as Harris corners, are much closer to each other than in the initial pair. Once this has been done, a conventional, close-range feature matching procedure is used to find a sufficient number of precise correspondences.
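To illustrate the kind of local feature involved, a basic Harris corner response can be computed from the structure tensor of the image gradients. The sketch below uses simple box smoothing rather than the Gaussian weighting of production detectors, and is an assumption-laden stand-in, not the matching procedure used in the project.

```python
import numpy as np

def harris(img, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    locally smoothed structure tensor of the image gradients."""
    Iy, Ix = np.gradient(img.astype(float))

    def box(a, r=2):
        # crude (2r+1)x(2r+1) box filter via shifted sums of a zero-padded copy
        out = np.zeros_like(a)
        p = np.pad(a, r)
        h, w = a.shape
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += p[r + dy:r + dy + h, r + dx:r + dx + w]
        return out

    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
```

Thresholding this response and taking local maxima yields the corner points that the close-range matcher pairs up.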
Based on the epipolar geometry, we rectify the two images and apply a recently developed dense matching procedure based on affine region growing. The procedure accounts for the affine distortion of local features that is typical of wide-baseline stereo images. Figure 4 shows an example of surface reconstruction from a wide-baseline pair. Due to occlusion, some regions are missing, which is typical of this task. Comparison with conventional dense matching techniques shows a visible improvement in reconstruction quality.
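The affine region-growing matcher itself is not reproduced here. As a conceptual stand-in, dense matching on a rectified pair can be illustrated by basic SAD block matching along the (now horizontal) epipolar scanlines; the helper below is a hypothetical brute-force sketch, far simpler than the actual procedure.

```python
import numpy as np

def block_match(left, right, max_disp=16, r=3):
    """Dense disparity for a rectified pair: for each left-image pixel,
    scan horizontally in the right image for the block with the smallest
    sum of absolute differences (SAD)."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    pl = np.pad(left.astype(float), r)    # zero-pad so border blocks exist
    pr = np.pad(right.astype(float), r)
    for y in range(h):
        for x in range(w):
            bl = pl[y:y + 2 * r + 1, x:x + 2 * r + 1]
            best, bd = np.inf, 0
            for d in range(min(max_disp, x) + 1):
                br = pr[y:y + 2 * r + 1, x - d:x - d + 2 * r + 1]
                sad = np.abs(bl - br).sum()
                if sad < best:
                    best, bd = sad, d
            disp[y, x] = bd
    return disp
```

A fixed square window like this fails exactly where wide-baseline pairs are hardest, under strong affine distortion, which is what motivates the affine region-growing approach described above.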
Building photorealistic 3D models based on multimodal sensor data is a challenging research area. Despite the significant progress made in the last few years, many critical issues remain open. Solving the basic problems described here will open the way to seamless integration of computer vision, geometric modelling and computer graphics, and to the creation of next-generation, high-level photorealistic models. We hope our results will contribute to achieving this ambitious goal.
Dmitry Chetverikov, SZTAKI
Tel: +36 1 209 6510