K-TIMARL: Scalable and Rule-Compliant Multi-Agent Reinforcement Learning for Large-Scale Port Vessel Traffic Management

Yiming Zhao¹, Yuqi Shi², Duanfeng Han¹, Xiao Peng¹, Wangyuan Zhao¹, Fenglei Han¹*

¹Harbin Engineering University | ²The Chinese University of Hong Kong, Shenzhen

Abstract

Increasing maritime traffic has intensified congestion and safety risks in major ports, requiring scalable and rule-compliant autonomous decision-making for multi-vessel operations. Existing centralized coordination methods suffer from excessive communication overhead, while decentralized approaches often struggle to maintain coordination quality and stable compliance with the International Regulations for Preventing Collisions at Sea (COLREGs) and the International Association of Marine Aids to Navigation and Lighthouse Authorities (IALA) buoyage rules. This paper proposes K-TIMARL, a topology-integrated multi-agent reinforcement learning framework for large-scale port vessel traffic management.

Based on a ξ-dependent networked MDP, K-TIMARL restricts coordination to k-hop neighborhoods and integrates per-vessel dynamics models with branched rollouts to improve sample efficiency and reduce communication cost. A graph-attention-based IALA buoy embedding and a hybrid reward function are further introduced to promote channel-aware and COLREGs-compliant maneuvers. The framework is validated in a high-fidelity full-scale Unity3D digital twin of Yantai Port under wind-wave disturbances and heterogeneous vessel dynamics.
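To make the k-hop restriction concrete, the following is a minimal sketch and not the authors' implementation. It assumes the interaction topology is an undirected graph that links vessels within a fixed sensing radius, and it uses breadth-first search to find the vessels a given agent would coordinate with. The names `build_interaction_graph`, `k_hop_neighbors`, and `radius`, and the sample positions, are hypothetical.

```python
from collections import deque


def build_interaction_graph(positions, radius):
    """Link every pair of vessels whose Euclidean distance is at most `radius`.

    Assumption: a distance threshold defines the interaction topology.
    """
    n = len(positions)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            if dx * dx + dy * dy <= radius * radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def k_hop_neighbors(adj, source, k):
    """Return all vessels reachable from `source` in at most k hops (BFS)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the k-hop horizon
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(source)
    return seen


# Hypothetical layout: four vessels in a line and one far away.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 10.0)]
adj = build_interaction_graph(positions, radius=1.5)
one_hop = k_hop_neighbors(adj, source=0, k=1)
two_hop = k_hop_neighbors(adj, source=0, k=2)
```

In this toy layout, vessel 0 exchanges information only with vessel 1 for k = 1 and with vessels 1 and 2 for k = 2. The distant vessel 4 is never contacted, which is the intuition behind the reduced communication cost.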

In the 20-vessel high-density scenario, K-TIMARL achieves an 88.6% success rate, a 99.3% COLREGs compliance rate, and only 0.05 collision violations per episode, while reducing communication cost by 85.1% compared with centralized baselines.


Fig. 1. Comparison of communication architectures and the K-TIMARL framework.

Methodology

K-TIMARL Framework Overview

Fig. 2. Illustration of the topological perception and localized modeling framework. (A) Physical interaction topology; (B) Localized modeling architecture; (C) Typical port topologies.

Key Components

Experiments in Yantai Port Digital Twin

Yantai Port Environment

Fig. 5. High-fidelity digital twin of Yantai Port in Unity3D, featuring heterogeneous vessels and IALA buoys.

5-Vessel Trajectory

Fig. 8. Trajectory visualization of the 5-vessel low-density scenario.

20-Vessel High Density

Fig. 24. Step-by-step trajectory visualization of the 20-vessel high-density scenario.

Performance Comparison (Case III: 20 Vessels)

Algorithm       | Success Rate (%) | Collision Rate (per episode) | Comm. Cost (MB)
PPO             | 76.45            | 0.22                         | 12.45
MAPPO           | 81.20            | 0.14                         | 12.42
K-TIMARL (Ours) | 88.60            | 0.05                         | 1.85

Table 4. In the 20-vessel high-density scenario, K-TIMARL achieves the highest success rate, the lowest collision rate, and the lowest communication cost.
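As a quick consistency check, the 85.1% communication reduction quoted in the abstract can be recomputed from the communication costs in Table 4. The sketch below assumes PPO and MAPPO are the baselines the abstract refers to; the helper name `reduction_percent` is illustrative.

```python
# Communication cost per algorithm in MB, copied from Table 4.
comm_cost_mb = {"PPO": 12.45, "MAPPO": 12.42, "K-TIMARL": 1.85}


def reduction_percent(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Reduction of K-TIMARL relative to each baseline, rounded to one decimal place.
reductions = {
    name: round(reduction_percent(cost, comm_cost_mb["K-TIMARL"]), 1)
    for name, cost in comm_cost_mb.items()
    if name != "K-TIMARL"
}
print(reductions)
```

Both baselines give a reduction that rounds to 85.1%, which matches the figure reported in the abstract.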

Citation

@article{zhao2026k,
  title={K-TIMARL: Scalable and Rule-Compliant Multi-Agent Reinforcement Learning for Large-Scale Port Vessel Traffic Management},
  author={Zhao, Yiming and Shi, Yuqi and Han, Duanfeng and Peng, Xiao and Zhao, Wangyuan and Han, Fenglei},
  journal={Journal Name (If applicable)},
  year={2026}
}