Reinforcement Learning for Traffic Signal Control

This website aims to offer comprehensive datasets, a simulator, relevant papers, a tutorial, and surveys to anyone who wishes to start investigating the problem or to evaluate a new algorithm.

Tutorial

Deep Reinforcement Learning for Traffic Signal Control

IEEE ITSC 2020
[Slides] [Supplementary code]
In this tutorial, we first introduce the formulation of traffic light control problems under RL, and then classify and discuss current RL control methods from different aspects: agent formulation, policy learning approach, and coordination strategies. In the third section, we provide hands-on experience with the fast development of different RL methods for traffic signal control. We then discuss some future research directions.


Key Paper List

Overview Slides
Research Problems: All papers/ How to design reward?/ How to learn faster?/ How to build a real simulator?/ How to transfer from one environment to another?
Control Scenarios: Single intersection/ Multiple intersections

Prompt to Transfer: Sim-to-Real Transfer for Traffic Signal Control with Prompt Learning

AAAI'24
Highlight: Improving sim-to-real transfer performance with Large Language Models
[code]

In this work, we leverage LLMs to understand and profile the system dynamics through a prompt-based grounded action transformation. Given a cloze prompt template, the pre-trained LLM fills in the answer based on the accessible context. Its inference ability is thereby used to understand how weather conditions, traffic states, and road types influence traffic dynamics. With this awareness, the policy’s actions are taken and grounded in realistic dynamics, which helps the agent learn a more realistic policy.


Uncertainty-aware Grounded Action Transformation towards Sim-to-Real Transfer for Traffic Signal Control

CDC'23
Highlight: Improving sim-to-real transfer performance with uncertainty quantification
[code]

In this paper, we propose a simulation-to-real-world (sim-to-real) transfer approach called UGAT, which transfers a policy trained in a simulated environment to a real-world environment by dynamically transforming actions in the simulation with uncertainty to mitigate the domain gap in transition dynamics.
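
The idea of transforming actions under uncertainty can be illustrated with a toy one-dimensional example. This is only a sketch under stated assumptions, not the UGAT implementation: the linear dynamics, the `forward_real` and `inverse_sim` functions, and the `threshold` parameter are hypothetical stand-ins for learned models.

```python
def sim_step(state, action):
    """Hypothetical simulator dynamics: the action takes full effect."""
    return state + action


def forward_real(state, action):
    """Hypothetical forward model of the real world.

    Returns the predicted next state and an uncertainty estimate.
    Here the real world damps the action, and uncertainty grows
    with the action magnitude (both are assumptions).
    """
    damping = 0.8
    uncertainty = 0.05 * abs(action)
    return state + damping * action, uncertainty


def inverse_sim(state, next_state):
    """Inverse of sim_step: the simulator action that reaches next_state."""
    return next_state - state


def grounded_action(state, action, threshold=0.2):
    """Transform an action so the simulator mimics real-world dynamics.

    If the forward model is too uncertain, fall back to the original action.
    """
    predicted_next, uncertainty = forward_real(state, action)
    if uncertainty > threshold:
        return action
    return inverse_sim(state, predicted_next)
```

In this toy setting, applying `grounded_action(0.0, 1.0)` in the simulator reproduces the next state predicted for the real world, while a large action such as `10.0` is left unchanged because its uncertainty exceeds the threshold.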


Toward A Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control

AAAI'20
Highlight: A combination of PressLight and FRAP
[demo] [poster] [code]

In this paper, we tackle the problem of multi-intersection traffic signal control, especially for large-scale networks, based on RL techniques and transportation theories. This problem is difficult because of challenges such as scalability, signal coordination, and data feasibility. To address these challenges, we (1) design our RL agents using the concept of ‘pressure’ to achieve signal coordination at the region level; (2) show that implicit coordination can be achieved by individual control agents with a well-crafted reward design, thus reducing the dimensionality; and (3) conduct extensive experiments on multiple scenarios, including a real-world scenario with 2510 traffic lights in Manhattan, New York City.


CoLight: Learning Network-level Cooperation for Traffic Signal Control

CIKM'19
Highlight: Attention-based coordination
[code] [poster]

To enable cooperation of traffic signals, in this paper, we propose a model, CoLight, which uses graph attentional networks to facilitate communication. Specifically, for a target intersection in a network, CoLight can not only incorporate the temporal and spatial influences of neighboring intersections on the target intersection, but also build up index-free modeling of neighboring intersections. To the best of our knowledge, we are the first to use graph attentional networks in the setting of reinforcement learning for traffic signal control and to conduct experiments on a large-scale road network with hundreds of traffic signals.
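
The neighbor-attention idea can be sketched in a few lines. This is a minimal single-head, dot-product illustration and not CoLight's architecture: learned projections and multiple heads are omitted, and all function and variable names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(target, neighbors):
    """Aggregate neighbor embeddings, weighted by similarity to the target.

    target: embedding of the target intersection (list of floats)
    neighbors: embeddings of neighboring intersections (list of lists)
    Returns (attention weights, aggregated embedding).
    """
    # Importance score of each neighbor: dot product with the target.
    scores = [sum(t * n for t, n in zip(target, nb)) for nb in neighbors]
    weights = softmax(scores)
    # Weighted sum of neighbor embeddings, one dimension at a time.
    aggregated = [
        sum(w * nb[d] for w, nb in zip(weights, neighbors))
        for d in range(len(target))
    ]
    return weights, aggregated
```

Because the weights depend only on the embeddings, reordering the neighbors leaves the aggregated embedding unchanged, which is one way to read the "index-free" property mentioned above.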


PressLight: Learning Max Pressure Control to Coordinate Traffic Signals in Arterial Network

KDD'19
Highlight: Pressure-based coordination
[code] [demo] [poster]

To avoid the heuristic design of RL elements, we propose to connect RL with recent studies in transportation research. Our method is inspired by the state-of-the-art method max pressure (MP) in the transportation field. The reward design of our method is well supported by the theory in MP, which can be proved to be maximizing the throughput of the traffic network, i.e., minimizing the overall network travel time. We also show that our concise state representation can fully support the optimization of the proposed reward function. Through comprehensive experiments, we demonstrate that our method outperforms both conventional transportation approaches and existing learning-based methods.
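
The pressure-based reward can be illustrated as follows. This sketch assumes a simplified definition from the max pressure literature, in which the pressure of an intersection is the absolute value of the summed differences between vehicle counts on incoming and outgoing lanes, and the reward is the negative pressure. The exact normalization used in the paper may differ, and the lane names are hypothetical.

```python
def intersection_pressure(movements, vehicle_count):
    """Pressure of one intersection.

    movements: list of (incoming_lane, outgoing_lane) pairs
    vehicle_count: dict mapping lane id to number of vehicles on that lane
    """
    total = sum(vehicle_count[i] - vehicle_count[o] for i, o in movements)
    return abs(total)


def pressure_reward(movements, vehicle_count):
    """Reward for the agent: lower pressure is better."""
    return -intersection_pressure(movements, vehicle_count)


# Example with two traffic movements (hypothetical lane ids).
movements = [("in_N", "out_S"), ("in_E", "out_W")]
counts = {"in_N": 5, "out_S": 2, "in_E": 1, "out_W": 3}
reward = pressure_reward(movements, counts)  # (5 - 2) + (1 - 3) = 1, so reward is -1
```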


Learning Phase Competition for Traffic Signal Control

CIKM'19
Highlight: Modeling the competition between pairs of phases
[code] [poster]

In this paper, we propose a novel design called FRAP, which is based on the intuitive principle of phase competition in traffic signal control: when two traffic signals conflict, priority should be given to the one with the larger traffic movement (i.e., higher demand). Through phase competition modeling, our model achieves invariance to symmetrical cases such as flipping and rotation in traffic flow. By conducting comprehensive experiments, we demonstrate that our model finds better solutions than existing RL methods in the complicated all-phase selection problem, converges much faster during training, and achieves superior generalizability across different road structures and traffic conditions.


MetaLight: Value-based Meta-reinforcement Learning for Online Universal Traffic Signal Control

AAAI'20
Highlight: Meta learning for universal traffic signal control
[code] [poster]

In this paper, we propose a novel framework, named MetaLight, to speed up the learning process in new scenarios by leveraging the knowledge learned from existing scenarios. MetaLight is a value-based meta-reinforcement learning workflow based on the representative gradient-based meta-learning algorithm (MAML), which periodically alternates between individual-level adaptation and global-level adaptation. Moreover, MetaLight improves the state-of-the-art reinforcement learning model FRAP in traffic signal control by optimizing its model structure and updating paradigm.


Learning Traffic Signal Control from Demonstrations

CIKM'19
Highlight: Learning from expert demonstrations
[code] [poster]

To avoid the prominent exploration problem in RL-based traffic signal control methods, we make an analogy between agents and humans. Agents can learn from demonstrations generated by traditional traffic signal control methods, in a similar way to how people master a skill from expert knowledge. Therefore, we propose DemoLight, which for the first time leverages demonstrations collected from classic methods to accelerate learning. Based on the state-of-the-art deep RL method Advantage Actor-Critic (A2C), training with demonstrations is carried out for both the actor and the critic, followed by reinforcement learning for further improvement.


IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control

KDD'18
Highlight: First attempt at RL signal control; the basis of all the methods
[demo] [poster]

In this paper, we propose an effective deep reinforcement learning model for traffic light control and interpret the learned policies. We test our method on a large-scale real traffic dataset obtained from surveillance cameras. We also show some interesting case studies of policies learned from the real data.


CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario

WWW'19 Demo
Highlight: Simulator
[code] [demo]

CityFlow is a simulator that supports flexible definitions of road networks and traffic flows based on synthetic and real-world data. It also provides a user-friendly interface for reinforcement learning. Most importantly, CityFlow is more than twenty times faster than SUMO and is capable of supporting city-wide traffic simulation with an interactive renderer for monitoring. Besides traffic signal control, CityFlow could serve as the base for other transportation studies and can create new possibilities to test machine learning methods in the intelligent transportation domain.


CityFlowER: An Efficient and Realistic Traffic Simulator with Embedded Machine Learning Models

ECML-PKDD'23 Demo
Highlight: Simulator with loadable machine learning models
[code] [demo]

CityFlowER is an extension of the existing CityFlow simulator, designed for efficient and realistic city-wide traffic simulation. CityFlowER pre-embeds ML models within the simulator, eliminating the need for external API interactions and enabling faster data computation.


Learning to Simulate Vehicle Trajectories from Demonstrations

ICDE'20
Highlight: Learning real-world vehicle behavior for a better simulator

Considering the complexity and nonlinearity of real-world traffic, this paper is the first to treat traffic simulation as a learning problem, and proposes learning to simulate vehicle trajectories.


Learning to Simulate with Sparse Trajectory Data

ECML-PKDD'20 [Best Applied Data Science Paper Award]
Highlight: Learning to simulate under sparse data

In most real-world cases, the real-world trajectories of agents are sparse, which makes simulation challenging. In this paper, we present a novel framework ImIn-GAIL to address the problem of learning to simulate the driving behavior from sparse real-world data. The proposed architecture incorporates data interpolation with the behavior learning process of imitation learning.


Open Datasets

We provide several traffic datasets. Each includes a road network file (roadnet.json) and a traffic flow file (flow.json), whose formats are defined in Roadnet File Format and Flow File Format, respectively.
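
A dataset's two files are typically passed to the simulator through a small JSON configuration. The sketch below writes such a file using only the standard library. The key names follow the CityFlow configuration format as we understand it and should be checked against the CityFlow documentation; the function name, the dataset directory, and the log file names are hypothetical.

```python
import json
from pathlib import Path


def write_cityflow_config(dataset_dir, out_path, seed=0):
    """Write a simulator config that points at roadnet.json and flow.json.

    Key names are assumed from the CityFlow config format; verify before use.
    """
    config = {
        "interval": 1.0,                      # seconds of simulated time per step
        "seed": seed,                         # random seed
        "dir": str(dataset_dir).rstrip("/") + "/",  # prefix for the file names below
        "roadnetFile": "roadnet.json",
        "flowFile": "flow.json",
        "rlTrafficLight": True,               # let an external agent set the phases
        "saveReplay": False,                  # skip replay logs for faster runs
        "roadnetLogFile": "replay_roadnet.json",
        "replayLogFile": "replay.txt",
    }
    Path(out_path).write_text(json.dumps(config, indent=2))
    return config
```

The resulting file could then be loaded by the simulator, for example with `cityflow.Engine("config.json", thread_num=1)` in the Python interface.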

*All results are average travel times (in seconds) measured under the CityFlow simulator.

# | Dataset name | Number of intersections | Time span (seconds) | Referred result* | Referred method
1 | hangzhou_1x1_bc-tyc_18041607_1h | 1 | 3600 | 221.03 | SOTL
2 | hangzhou_1x1_bc-tyc_18041608_1h | 1 | 3600 | 334.72 | SOTL
3 | hangzhou_1x1_bc-tyc_18041610_1h | 1 | 3600 | 213.20 | SOTL
4 | hangzhou_1x1_kn-hz_18041607_1h | 1 | 3600 | 72.48 | SOTL
5 | hangzhou_1x1_kn-hz_18041608_1h | 1 | 3600 | 64.10 | SOTL
6 | hangzhou_1x1_qc-yn_18041607_1h | 1 | 3600 | 117.24 | SOTL
7 | hangzhou_1x1_qc-yn_18041608_1h | 1 | 3600 | 131.99 | SOTL
8 | hangzhou_1x1_sb-sx_18041607_1h | 1 | 3600 | 173.85 | SOTL
9 | hangzhou_1x1_sb-sx_18041608_1h | 1 | 3600 | 290.00 | SOTL
10 | hangzhou_1x1_tms-xy_18041607_1h | 1 | 3600 | 214.77 | SOTL
11 | hangzhou_1x1_tms-xy_18041608_1h | 1 | 3600 | 325.32 | SOTL
12 | syn_1x1_uniform_200_1h | 1 | 3600 | 61.44 | SOTL
13 | syn_1x1_uniform_400_1h | 1 | 3600 | 133.40 | SOTL
14 | syn_1x1_uniform_600_1h | 1 | 3600 | 189.11 | SOTL
15 | jinan_3x4_hongqi_16XXXXXX_1h | 12 | 3600 | - | -
16 | hangzhou_4x4_gudang_18010207_1h | 16 | 3600 | 240.97 | MaxPressure
17 | syn_1x3_gaussian_500_1h | 3 | 3600 | 422.95 | MaxPressure
18 | syn_2x2_gaussian_500_1h | 4 | 3600 | 477.71 | MaxPressure
19 | syn_3x3_gaussian_500_1h | 9 | 3600 | 631.75 | MaxPressure
20 | syn_4x4_gaussian_500_1h | 16 | 3600 | 689.68 | MaxPressure
21 | Manhattan_1 | 2510 | 3600 | - | -
22 | Manhattan_2 | 2510 | 3600 | - | -
23 | Manhattan_3 | 2510 | 3600 | - | -
24 | LA_1x4 | 4 | 3600 | - | -
25 | Atlanta_1x5 | 5 | 3600 | - | -
26 | Manhattan_16x3 | 48 | 3600 | - | -
27 | Manhattan_28x7 | 196 | 3600 | - | -

Dataset descriptions:

Datasets 1-11: These datasets are based on camera data in Hangzhou. Because there are no records of turning vehicles, the turning ratios of each dataset are fixed at 10% turning left, 60% going straight, and 30% turning right. Right-turning vehicles are discarded since they are not under the control of traffic lights. Each roadnet has one left-turn lane and one straight lane in each direction.

Datasets 12-14: These datasets are generated artificially. Vehicles enter the road network uniformly at a fixed rate of 200, 400, or 600 vehicles per hour.

Dataset 15: The road network contains 12 intersections in a 3x4 grid. Each intersection has four incoming approaches and four outgoing approaches, and each approach has three lanes (left-turn, through, and right-turn). The traffic flow data is based on camera data in Jinan. Necessary simplifications were made because of the low quality of the real-world data.

Dataset 16: The road network contains 16 intersections in a 4x4 grid. Each intersection has four incoming approaches and four outgoing approaches, and each approach has three lanes (left-turn, through, and right-turn). The traffic flow data is based on camera data in Hangzhou. Necessary simplifications were made because of the low quality of the real-world data.
• Traffic volume: The traffic volume is derived from camera data in Hangzhou.
• Turning ratio: 10% (turning left), 60% (going straight), and 30% (turning right). This is synthesized from the statistics of taxi GPS data.

Datasets 17-20: The road networks are synthetic grids of 1x3, 2x2, 3x3, and 4x4 intersections. Each intersection has four incoming approaches and four outgoing approaches, and each approach has three lanes (left-turn, through, and right-turn).
• Traffic volume: All vehicles enter and leave the network from the rim edges. For each entering edge, the number of vehicles generated is sampled from a Gaussian distribution with a mean of 500 vehicles/hour/lane.
• Turning ratio: 10% (turning left), 60% (going straight), and 30% (turning right).

Datasets 21-23: The road network contains 2510 intersections in Manhattan, New York. The road network is converted from the SUMO default road net into the CityFlow format.
• Traffic volume: Vehicles can enter and leave the network at any node. For each entering edge, the number of vehicles generated is sampled from taxi trajectory data.
• Turning ratio: 10% (turning left), 60% (going straight), and 30% (turning right).

Dataset 24: The road network contains 4 intersections in LA.

Dataset 25: The road network contains 5 intersections in Atlanta.

Dataset 26: The road network contains 48 intersections in Manhattan.

Dataset 27: The road network contains 196 intersections in Manhattan.
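
For reference, the average travel time used for the referred results above can be computed from per-vehicle entry and exit times. The sketch below assumes a simple definition: the mean time spent in the network, with vehicles that have not left by the end counted up to the end of the time span. How CityFlow treats unfinished vehicles should be confirmed against its documentation, and the data layout here is hypothetical.

```python
def average_travel_time(trips, time_span=3600):
    """Mean travel time in seconds.

    trips: list of (enter_time, leave_time) pairs; leave_time is None
           for a vehicle still in the network when the simulation ends.
    time_span: length of the simulated period in seconds.
    """
    if not trips:
        return 0.0
    total = 0.0
    for enter, leave in trips:
        # Unfinished trips are truncated at the end of the time span.
        end = time_span if leave is None else min(leave, time_span)
        total += end - enter
    return total / len(trips)


# Three hypothetical vehicles: two finished trips and one unfinished trip.
example = [(0, 100), (50, 250), (3500, None)]
att = average_travel_time(example)  # (100 + 200 + 100) / 3
```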

If you use the datasets in your paper, please cite the following papers:

@article{wei2019survey,
  title={A Survey on Traffic Signal Control Methods},
  author={Wei, Hua and Zheng, Guanjie and Gayah, Vikash and Li, Zhenhui},
  journal={arXiv preprint arXiv:1904.08117},
  year={2019}
}

@inproceedings{wei2019colight,
  title={Colight: Learning network-level cooperation for traffic signal control},
  author={Wei, Hua and Xu, Nan and Zhang, Huichu and Zheng, Guanjie and Zang, Xinshi and Chen, Chacha and Zhang, Weinan and Zhu, Yanmin and Xu, Kai and Li, Zhenhui},
  booktitle={Proceedings of the 28th ACM International Conference on Information and Knowledge Management},
  pages={1913--1922},
  year={2019}
}

@inproceedings{zheng2019frap,
  title={Learning phase competition for traffic signal control},
  author={Zheng, Guanjie and Xiong, Yuanhao and Zang, Xinshi and Feng, Jie and Wei, Hua and Zhang, Huichu and Li, Yong and Xu, Kai and Li, Zhenhui},
  booktitle={Proceedings of the 28th ACM International Conference on Information and Knowledge Management},
  pages={1963--1972},
  year={2019}
}

Survey

A Survey on traffic signal control

Recent Advances in Reinforcement Learning for Traffic Signal Control: A Survey of Models and Evaluation

...

Team

Faculty Members