Using a monochrome surveillance CCD camera, we extract the hands from the high-contrast images based on their grey levels. The second component is hand tracking, which is a significant problem because of hand-hand occlusion. When one hand covers the other partially or completely, both hands must be re-acquired correctly at the end of the occlusion period.
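As a rough illustration of grey-level extraction, the sketch below thresholds an image and keeps the two largest connected bright regions as hand candidates. It is a minimal sketch, not the system's actual extraction code: the names `extract_hand_blobs` and `bounding_box`, the threshold value, and the assumption that the hands are brighter than the background are all hypothetical.

```python
from collections import deque


def extract_hand_blobs(image, threshold=128, max_blobs=2):
    """Return up to max_blobs largest bright regions as lists of (row, col).

    image is a list of rows of grey levels (0-255). The threshold and the
    "hands are brighter than the background" rule are assumptions.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Breadth-first flood fill over 4-connected bright pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    # Largest regions first; small bright specks fall to the end of the list.
    blobs.sort(key=len, reverse=True)
    return blobs[:max_blobs]


def bounding_box(blob):
    """Return (top, left, bottom, right) of a blob, inclusive."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    return min(ys), min(xs), max(ys), max(xs)
```

During occlusion the two hands merge into a single region, so a scheme like this returns only one large blob, which is why tracking needs more than extraction alone.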
Studies in neuroscience show that the two hands are temporally and spatially coordinated in bimanual movements. In addition, the components of one hand are temporally coordinated too. These coordinations form the basis of our algorithm to track the hands in bimanual movements.
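One simple way to make temporal coordination measurable is to compare the speed profiles of the two hands over time. The sketch below is only an illustration of that idea, not the tracking algorithm described here: the function names and the use of Pearson correlation as a coordination measure are assumptions.

```python
import math


def speeds(track):
    """Frame-to-frame speed (pixels per frame) for a list of (x, y) positions."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def temporal_coordination(left_track, right_track):
    """Correlation between the speed profiles of the two hands."""
    return pearson(speeds(left_track), speeds(right_track))
```

A value near 1 means the two hands speed up and slow down together, even when they move in opposite directions.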
We have taken a general view of the tracking problem in order to cover many of the challenging problems in this area. For example, from a pure pattern recognition point of view, a movement can be understood differently when it is seen from different camera view directions. By defining a general set of movement models that are independent of view angle, we have developed the tracking algorithm so that it covers almost every camera view direction. It is trained in just one direction and can be used in other directions. This makes the algorithm independent of the position of the visual system.
Using the temporal coordinations both between limbs (the two hands) and within a limb (a hand and its fingers), the algorithm tracks the hands independently of the hand shapes, even in movements where the shapes change. This is especially important for processing speed. Processing and understanding hand shapes is usually time-consuming, so the tracking algorithm, as one component of an integrated real-time recognition system, must be fast enough to leave room for the other components.
The view-direction and hand-shape independence naturally lends itself to extending the concept of tracking towards mobile vision environments (e.g. active vision in robotics). We have developed a model that makes the algorithm independent of the actual positions and velocities. Consequently, it can be used in applications where the visual system (the camera) moves or turns. For example, if the camera is installed on a humanoid robot, the algorithm tracks the hands of a subject while the robot walks.
The third component of the system is the recogniser. As a hierarchical cognitive system, it analyses the hand shapes at the bottom level, learns the individual partial movement of each hand at the intermediate level, and combines them at the top level to recognise the whole movement (see Figure 2). Statistical and spatio-temporal pattern recognition methods such as Principal Component Analysis and Hidden Markov Models form the bottom and intermediate levels of the system. A Bayesian inference network at the top level perceives the movements as a combination of a set of recognised partial hand movements.
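The upper two levels of such a hierarchy can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the recogniser itself: it scores a discrete observation sequence with the standard Hidden Markov Model forward algorithm, then combines per-hand likelihoods with Bayes' rule. The function names, the discrete observation symbols, and the simplification that the two hands are conditionally independent given the whole movement are all assumptions; a full Bayesian network could model richer dependencies.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


def movement_posterior(left_like, right_like, prior):
    """Posterior over whole movements from per-hand partial-movement likelihoods.

    Assumes the hands are conditionally independent given the movement,
    which is a simplification made for this sketch.
    """
    joint = {m: prior[m] * left_like[m] * right_like[m] for m in prior}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("all movements have zero likelihood")
    return {m: v / total for m, v in joint.items()}
```

In use, each candidate movement would have one trained model per hand; `forward_likelihood` would supply the entries of `left_like` and `right_like`, and the movement with the highest posterior would be reported.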
The recogniser has been developed so that it learns single movements and recognises both single and concatenated periodic bimanual movements. The concatenated periodic bimanual movements are used particularly in Virtual Reality simulators for interacting with virtual environments. A virtual spacecraft controlled by bimanual gestures is an example.
In all parts of this research we have looked at the problems from a general point of view and developed general solutions. The tracking algorithm can be employed in a wide range of applications including recognition, Virtual Reality, and surveillance/security systems. The recogniser can be used to recognise both single and concatenated periodic bimanual movements.
Our plan for the future is to make the recognition component independent of the camera view direction. This will result in a system that can recognise movements from view directions that it has not been trained on. Results of the ongoing research in this area will open significant doors towards the general learning and understanding of human movements.
Alistair Sutherland, Dublin City University