ERCIM News No. 55, October 2003
SPECIAL THEME: Machine Perception

Learning and Understanding Bimanual Movements

by Atid Shamaie and Alistair Sutherland

Scientists at Dublin City University have researched a subset of human movements called bimanual movements. At different stages of this research they have approached the problems from novel points of view. They believe that many machine learning problems can incorporate neuroscientific and perceptual aspects of human movements in order to learn and recognise human behaviours.

Learning and recognising human movements has received great attention from researchers around the world in recent years. A broad range of applications, from medicine to surveillance and security, can benefit from this technology. Learning hand movements and recognising gestures are significant components of such technologies.

Bimanual movements form a large subset of hand movements in which both hands move simultaneously in order to perform a task or convey a meaning. Clapping, opening a bottle, typing on a keyboard and drumming are some common bimanual movements. Sign languages also use bimanual movements to form sets of gestures for communication.

Because both hands are involved, understanding bimanual movements requires not only computer vision and pattern recognition techniques but also neuroscientific studies as a background for perceiving the movements.

A cognitive system for learning and understanding bimanual movements entails three fundamental components (see Figure 1): low-level image processing to deal with sensory data, intelligent hand tracking to distinguish the left hand from the right hand, and machine learning for understanding the movements.

Figure 1: A bimanual movement recognition system.

Figure 2: The recognition system.

The first component extracts the hands from high-contrast images captured by a monochrome surveillance CCD camera, based on the grey levels of the hands. The second component is hand tracking, which is a significant problem due to the presence of hand-hand occlusion. When one hand covers the other partially or completely, both hands must be re-acquired correctly at the end of the occlusion period.
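
The extraction step can be illustrated with a minimal sketch. It assumes that the hands are brighter than the background and that a fixed grey-level threshold separates them; the threshold value, the function names and the toy frame are all hypothetical and are not taken from the system described here. Bright pixels are grouped into 4-connected blobs, so two separate hands give two blobs, while hands that occlude each other would merge into one.

```python
from collections import deque

def extract_blobs(image, threshold=128, min_pixels=2):
    """Group pixels with grey level >= threshold into 4-connected blobs.

    `threshold` and `min_pixels` are illustrative values (assumptions).
    Each blob is returned as a list of (row, col) pixels.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Breadth-first flood fill from this bright, unvisited pixel.
            queue, blob = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Discard blobs smaller than min_pixels as likely noise.
            if len(blob) >= min_pixels:
                blobs.append(blob)
    return blobs

def bounding_box(blob):
    """Return (top, left, bottom, right) of a blob, inclusive."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    return min(ys), min(xs), max(ys), max(xs)

# Toy 5x8 frame: two bright regions (the hands) on a dark background.
frame = [
    [10, 10, 10, 10, 10, 10, 10, 10],
    [10, 200, 210, 10, 10, 10, 220, 10],
    [10, 205, 215, 10, 10, 225, 230, 10],
    [10, 10, 10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10, 10, 10],
]
blobs = extract_blobs(frame)
boxes = [bounding_box(b) for b in blobs]
print(len(blobs), boxes)
```

A count of one blob where two were visible a frame earlier is one plausible cue that an occlusion period has started.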

Studies in neuroscience show that the two hands are temporally and spatially coordinated in bimanual movements. In addition, the components of a single hand are temporally coordinated with one another. This coordination forms the basis of our algorithm for tracking the hands in bimanual movements.
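
To make the idea of temporal coordination concrete, the sketch below compares the speed profiles of two hand tracks. It is an illustration only: the use of Pearson correlation as a coordination measure, the function names and the toy tracks are assumptions, and this is not the tracking algorithm itself. Two hands that speed up and slow down together give a correlation close to 1, even when they move in opposite directions at different speeds.

```python
import math

def speeds(track):
    """Frame-to-frame speed for a list of (x, y) positions."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Hypothetical tracks: the left hand moves right, the right hand moves left,
# and both accelerate and decelerate over the same frames.
left = [(0, 0), (1, 0), (3, 0), (6, 0), (8, 0), (9, 0)]
right = [(20, 0), (18, 0), (14, 0), (8, 0), (4, 0), (2, 0)]

coordination = pearson(speeds(left), speeds(right))
print(round(coordination, 3))
```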

We have taken a general view of the tracking problem to cover many challenging problems in this area. For example, from a pure pattern recognition point of view, a movement can be understood differently when it is seen from different camera view directions. By defining a general set of movement models independent of view angle, we have developed the tracking algorithm so that it covers almost every camera view direction. It is trained from just one view direction and can be used from others. This makes the algorithm independent of the position of the visual system.

Using the temporal coordination both between limbs (the two hands) and within a limb (a hand and its fingers), the algorithm tracks the hands independently of the hand shapes, even in movements where the shapes change. This is especially important from the point of view of processing speed. Since processing and understanding hand shapes is usually a time-consuming process, the tracking algorithm, as a component of an integrated real-time recognition system, must be fast enough to leave enough room for the other components.
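
The re-acquisition problem can be sketched with a much simpler, swapped-in heuristic that also ignores hand shape: predict where each hand should be after the occlusion under a constant-velocity assumption, then give the labels to the two reappearing blobs in whichever way best matches the predictions. This is not the coordination-based algorithm described above; all names and numbers are hypothetical.

```python
def predict(position, velocity, frames):
    """Constant-velocity prediction of a position after `frames` frames."""
    return (position[0] + velocity[0] * frames,
            position[1] + velocity[1] * frames)

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def reacquire(left_state, right_state, occluded_frames, blobs):
    """Label the two blob centroids seen when the occlusion ends.

    Each state is (last_position, last_velocity) before the occlusion.
    Returns {"left": centroid, "right": centroid}.
    """
    pred_left = predict(*left_state, occluded_frames)
    pred_right = predict(*right_state, occluded_frames)
    a, b = blobs
    straight = sq_dist(pred_left, a) + sq_dist(pred_right, b)
    swapped = sq_dist(pred_left, b) + sq_dist(pred_right, a)
    if straight <= swapped:
        return {"left": a, "right": b}
    return {"left": b, "right": a}

# Hypothetical case: the hands cross each other during a 4-frame occlusion.
left_state = ((10, 5), (2, 0))     # moving right at 2 pixels per frame
right_state = ((20, 5), (-2, 0))   # moving left at 2 pixels per frame
labels = reacquire(left_state, right_state, 4, [(11, 5), (19, 5)])
print(labels)
```

A heuristic of this kind gives the wrong labels when the hands pause and reverse during the occlusion, as in clapping, because constant-velocity prediction always assumes that they pass each other.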

The view-direction and hand-shape independence naturally lend themselves to extending the concept of tracking towards mobile vision environments (e.g. active vision in robotics). We have developed a model to make the algorithm independent of the actual positions and velocities. Consequently, it can be used in applications where the visual system (the camera) moves or turns. For example, assuming that the camera is installed on a humanoid robot, the algorithm tracks the hands of a subject while the robot walks.
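
One simple way to see how a feature can be made independent of absolute position is to describe one hand relative to the other. The sketch below is an assumption made for illustration, not the model developed in this work: it treats camera motion as a pure per-frame shift of the image, which moves both hands by the same amount and therefore cancels out in the relative track. Camera rotation or zoom would not cancel in this way.

```python
def relative_track(left, right):
    """Position of the right hand relative to the left hand, per frame."""
    return [(rx - lx, ry - ly) for (lx, ly), (rx, ry) in zip(left, right)]

def shift(track, offsets):
    """Apply a per-frame image shift (a stand-in for camera motion)."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(track, offsets)]

# Hypothetical tracks seen by a static camera: the hands approach each other.
left = [(10, 50), (12, 50), (14, 50), (16, 50)]
right = [(40, 50), (38, 50), (36, 50), (34, 50)]

# The same movement seen while the camera drifts (assumed offsets).
camera_offsets = [(0, 0), (5, -1), (10, -2), (15, -3)]

static_view = relative_track(left, right)
moving_view = relative_track(shift(left, camera_offsets),
                             shift(right, camera_offsets))
print(static_view == moving_view)
```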

The third component of the system is the recogniser. As a hierarchical cognitive system, it analyses the hand shapes at the bottom level, learns the individual partial movement of each hand at the intermediate level, and combines them at the top level to recognise the whole movement (see Figure 2). Statistical and spatio-temporal pattern recognition methods such as Principal Component Analysis and Hidden Markov Models form the bottom and intermediate levels of the system. A Bayesian inference network at the top level perceives the movements as a combination of a set of recognised partial hand movements.
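
The intermediate and top levels can be sketched as follows, under strong simplifying assumptions. Hand shapes are assumed to have already been reduced to discrete symbols (for instance by quantising PCA projections, a step not shown); each partial movement is a small discrete Hidden Markov Model scored with the forward algorithm; and the top level is reduced to Bayes' rule with the two hands treated as conditionally independent given the movement. The models, probabilities and movement names are invented for illustration and are not the trained models of the system.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) for discrete emissions, by the forward algorithm."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

# Two hypothetical left-to-right HMMs over the shape symbols {0, 1}.
TRANS = [[0.5, 0.5], [0.0, 1.0]]
PARTIAL = {
    "up": {"start": [1.0, 0.0], "trans": TRANS,
           "emit": [[0.9, 0.1], [0.1, 0.9]]},    # symbol 0 first, then 1
    "down": {"start": [1.0, 0.0], "trans": TRANS,
             "emit": [[0.1, 0.9], [0.9, 0.1]]},  # symbol 1 first, then 0
}

# Hypothetical bimanual movements: (left partial, right partial, prior).
MOVEMENTS = {
    "lift": ("up", "up", 0.5),
    "seesaw": ("up", "down", 0.5),
}

def recognise(obs_left, obs_right):
    """Posterior over bimanual movements given both symbol sequences."""
    joint = {}
    for name, (left_part, right_part, prior) in MOVEMENTS.items():
        p_left = forward_likelihood(obs_left, **PARTIAL[left_part])
        p_right = forward_likelihood(obs_right, **PARTIAL[right_part])
        # Assumes the hands are independent given the movement.
        joint[name] = prior * p_left * p_right
    total = sum(joint.values())
    return {name: p / total for name, p in joint.items()}

posterior = recognise([0, 0, 1, 1], [1, 1, 0, 0])
best = max(posterior, key=posterior.get)
print(best)
```

In a full Bayesian network the dependence between the two hands would be modelled explicitly rather than assumed away, which is where the coordination between the hands could enter.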

The recogniser has been developed so that it learns single movements and recognises both single and concatenated periodic bimanual movements. Concatenated periodic bimanual movements are used particularly in Virtual Reality simulators for interacting with virtual environments. A virtual spacecraft controlled by bimanual gestures is one example.

In all parts of this research we have looked at the problems from a general point of view and developed general solutions. The tracking algorithm can be employed in a wide range of applications including recognition, Virtual Reality, and surveillance/security systems. The recogniser can be used in recognising both single and concatenated periodic bimanual movements.

Our plan for the future is to make the recognition component independent of the camera view direction. This will result in a system that can recognise movements from view directions for which it has not been trained. Results of the ongoing research in this area will open significant doors towards the general learning and understanding of human movements.

Link:
http://www.computing.dcu.ie/~ashamaie/mvg/

Please contact:
Atid Shamaie, Dublin City University
Tel: +353 1700 8449
E-mail: atid@computer.org

Alistair Sutherland, Dublin City University
Tel: +353 1700 5511
E-mail: alistair@computing.dcu.ie