ERCIM News No.33 - April 1998

Sir Arthur - Learning to walk on six Legs

by Frank Kirchner

The objective of the Sir Arthur project is to demonstrate the application of Reinforcement Learning to a multi-degree of freedom walking machine. It is shown how learning is used at the elementary level for basic swing and stance motion, required to achieve the simple walking of one leg. Following this, a more complex, higher level behaviour is learned involving the coordination of each leg to achieve an overall forward motion. Finally, these behaviours are applied to a goal orientated objective where Sir Arthur is required to find areas of maximum light intensity.

16 motors are used to provide swing and stance for each of the six legs and two degrees of freedom for each of the two joints. This allows Sir Arthur to lift its head and look around to re-orientate relative to a light source. The forward movement on six-leg machines is achieved by swinging three legs to the front while at the same time performing a stance movement on the others.

Sir Arthur is build in three segments: Each segment comprises two legs. The segments are connected by two actively controllable degree-of-freedom joints. The joints enable Sir Arthur to raise and pivot the front and backsegments, thereby allowing for 3D movements. On each segment, there is a microprocessor to perform the basic control tasks. An additional microcontroller, the master controller is installed in the middle segment to perform higher level processing tasks. The master and the slave controller use a bidirectional 8 bit parallel link for communication. The master sends commands to the slaves to perform elementary swing and stance behaviours while it receives status information and global sensor data.

Reinforcement Learning was implemented to learn a trajectory in the state space that is formed by the possible positions of the legs. The reinforcement to the robot is computed from the difference between a reference trajectory and the currently performed movement. Using this reinforcement, the system learns to perform the correct sequence of actions to swing and stance one leg.

More difficult to learn is the coordination of the individual legs to achieve a more complex behaviour such as walking forward. In this case, Reinforcement Learning was also used to accomplish this task. A technique called Experience Replay was used to speed up the learning. Experience Replay stores successful sequences and uses them as a bias during later learning stages.

After long sequences of unsuccessful trials, first signs of coordinated, though still wrong, behaviour can be observed. As coordination increases, the robot performs more and more forward steps until the correct coordination scheme is finally found.

The most interesting applications for autonomous mobile robots are those which require a high kinematic complexity of the robot. Examples are the exploration of remote planets, the inspection of rough terrain, destroyed or contaminated areas. However, the complex kinematics of the robot require methods to control the system which are not offered by classic control theory. Methods provided by Artificial Intelligence are not only useful to solve such problems; currently, they are the only ones which can be applied successfully. Information on Sir Arthur on the Web at:

Please contact:

Frank Kirchner - GMD
Tel: +49 2241 14 2402

return to the contents page