Self-Optimization in a Next-Generation Urban Traffic Control Environment

by Raymond Cunningham, Jim Dowling, Anthony Harrington, Vinny Reynolds, René Meier and Vinny Cahill

The Urban Traffic Control Next Generation (UTC-NG) project tackles the problem of global optimization in next-generation UTC systems by using sensor data to drive a fully decentralized optimization algorithm.

Current Urban Traffic Control (UTC) systems typically take a hierarchical and/or centralized approach to the global optimization of traffic flow in an urban environment, often with limited success. By exploiting the increasing amounts of available sensor data (eg from inductive loops, traffic cameras and on-board GPS systems), designers of next-generation UTC systems have a unique opportunity to address the problem of global optimization of traffic flows.

Recent advances in sensor technology have made online vehicle and traffic flow detection possible. This in turn has enabled adaptive traffic control systems, capable of online generation and implementation of signal-timing parameters. Adaptive control systems are widely deployed throughout the world.

However, the adaptive traffic-control systems that are currently deployed are hampered by the lack of an explicit coordination model for traffic-light collaboration, and are typically reliant on previously specified models of the environment that require domain expertise to construct. These models are typically used as an input to both sensor data interpretation and strategy evaluation, and may often be too generic to adequately reflect highly dynamic local conditions.
These systems have a limited rate of adaptivity and are designed to respond to gradual rather than rapid changes in traffic conditions. They employ centralized or hierarchical data processing and control algorithms that do not reflect the localized nature of fluctuations in traffic flow.

Collaborating Traffic Lights
An alternative to the approach pursued by existing UTC systems is to allow the controller/agent of the set of traffic lights at a junction to act autonomously, deciding on the appropriate phase for the junction. The actions available to such an agent are similar to those available to a traffic manager in a centralized/hierarchical UTC system (ie remaining in the current phase or changing to another available phase).

In a similar manner to existing centralized/hierarchical UTC systems, the agent would monitor the level of congestion at the junction under its control based on available sensor data and use this information to decide which action to take. Over time, the agent learns the appropriate action to take given the current level of congestion. However, if the agent at a junction simply optimizes its behaviour using only local congestion information at that junction, this may result in locally optimal performance but also in suboptimal overall system performance.
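The learning loop described above can be illustrated with a minimal sketch. The article does not name the learning rule used by a junction agent, so tabular Q-learning is assumed here purely for illustration; the class name, the two action labels, the queue-length thresholds and the parameter values are all hypothetical.

```python
import random
from collections import defaultdict

# The two actions named in the text: remain in the current phase or change phase.
ACTIONS = ("stay", "switch")


def congestion_level(queue_length, thresholds=(5, 15)):
    """Discretize a queue length into 0 (low), 1 (medium) or 2 (high).

    The thresholds are hypothetical; a real deployment would derive
    them from the available sensor data.
    """
    return sum(queue_length > t for t in thresholds)


class JunctionAgent:
    """Single-junction learner using tabular Q-learning (an assumption)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def choose(self, state):
        # Explore occasionally, otherwise take the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update towards reward + discounted best next value.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In this sketch the reward would be some measure of reduced congestion at the junction. Because the state contains only local information, an agent of this kind is exactly the locally optimizing learner that the paragraph above warns about.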

In order to achieve optimal system-wide performance, the agents at traffic light junctions in the UTC system should communicate their current status to the agents at neighbouring upstream and downstream junctions. Those neighbours can then use this information when choosing the appropriate action to take. By operating in this completely decentralized way, the UTC system becomes self-managing and can, less obviously, be designed to optimize the global flow of vehicles through the system. The technique used to achieve this decentralized optimization through coordination/collaboration is called Collaborative Reinforcement Learning (CRL).
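As a rough illustration of this status exchange, the sketch below has each junction send its congestion level to its upstream and downstream neighbours, and then build a decision state from its own level plus the worst level reported on each side. The message content, the `Junction` class and the way the reports are combined are assumptions made for the example, not details taken from the project.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Junction:
    """Hypothetical junction agent that shares its status with neighbours."""
    name: str
    congestion: int = 0                            # local level: 0 (low) .. 2 (high)
    upstream: list = field(default_factory=list)   # junctions feeding traffic in
    downstream: list = field(default_factory=list) # junctions receiving traffic
    inbox: dict = field(default_factory=dict)      # neighbour name -> last status

    def broadcast(self):
        # Send the current local status to every direct neighbour.
        for neighbour in self.upstream + self.downstream:
            neighbour.inbox[self.name] = self.congestion

    def decision_state(self):
        # Combine local congestion with the worst reported level on each side.
        up = max((self.inbox.get(n.name, 0) for n in self.upstream), default=0)
        down = max((self.inbox.get(n.name, 0) for n in self.downstream), default=0)
        return (self.congestion, up, down)


def link(up, down):
    """Connect two junctions so that traffic flows from `up` to `down`."""
    up.downstream.append(down)
    down.upstream.append(up)


a, b, c = Junction("A"), Junction("B"), Junction("C")
link(a, b)
link(b, c)
a.congestion, c.congestion = 2, 1
for junction in (a, b, c):
    junction.broadcast()
print(b.decision_state())  # local level, worst upstream, worst downstream
```

No junction in this sketch needs a central controller: each one talks only to its direct neighbours, which is the property that makes the scheme fully decentralized.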

Distributed Optimization
CRL is a decentralized approach to establishing and maintaining system-wide properties in distributed systems. CRL extends Reinforcement Learning (RL) by allowing individual agents to interact with neighbouring agents by exchanging information related to the particular system-wide optimization problem being solved. The goal of CRL is to enable agents to produce collective behaviour that establishes and maintains the desired system-wide property.
Optimizing the global flow of traffic in a UTC system can be considered as a single system-wide problem. This can be decomposed into a collection of discrete optimization problems, one at each traffic light junction in the UTC system.
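The idea of agents solving local problems while exchanging optimization-related information can be sketched as follows. This is a simplified illustration and not the CRL update rule itself: it assumes that each agent advertises its current value estimate to its neighbours and that a receiving agent adds a weighted average of those advertisements to its local reward. The class name, the `neighbour_weight` parameter and the blending formula are hypothetical.

```python
from collections import defaultdict

ACTIONS = ("stay", "switch")


class CollaborativeAgent:
    """Local learner whose reward is shaped by neighbours' advertised values."""

    def __init__(self, name, alpha=0.5, gamma=0.9, neighbour_weight=0.5):
        self.name = name
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.w = neighbour_weight     # influence of neighbours (assumed)
        self.neighbours = []
        self.advertised = {}          # neighbour name -> last advertised value

    def value(self, state):
        # Best value the agent currently expects from this state.
        return max(self.q[(state, a)] for a in ACTIONS)

    def advertise(self, state):
        # Share the current value estimate with every direct neighbour.
        v = self.value(state)
        for neighbour in self.neighbours:
            neighbour.advertised[self.name] = v

    def update(self, state, action, local_reward, next_state):
        # Blend the local reward with the mean of the neighbours' advertisements.
        if self.advertised:
            neighbour_term = sum(self.advertised.values()) / len(self.advertised)
        else:
            neighbour_term = 0.0
        reward = local_reward + self.w * neighbour_term
        target = reward + self.gamma * self.value(next_state)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

Each agent still solves only its own subproblem, but the advertised values let the outcome at one junction influence the learning at the next, which is how a system-wide property could emerge without central control.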

Since each traffic light agent has a subproblem that is unique to that agent, a traffic light agent cannot delegate the solution of this problem to one of its neighbours. Rather, the agent must attempt to solve its problem locally by devising an appropriate policy. However, as the solution to the problem depends on local traffic conditions that vary over time, the traffic light must continually attempt to estimate/learn the optimal policy for the junction under its control.

UTC-CRL is taking an experimental approach to validating the appropriateness of CRL in a large-scale UTC setting. In particular, one objective is to verify that a consensus can emerge between collaborating traffic light agents, and that this consensus allows optimal traffic flow in the large-scale setting. The envisioned setting for this work corresponds to the Dublin city area, which consists of 248 traffic light junctions, over 750 non-traffic light junctions, and over 3000 links between these junctions.

The UTC-NG project is supported by the TRIP project, a multi-disciplinary research centre funded under the Programme for Research in Third-Level Institutions (PRTLI), administered by the Higher Education Authority.

Please contact:
Raymond Cunningham, Trinity College Dublin / IUA, Ireland
Tel: +353 1 608 2666