Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies

Computer Science > Artificial Intelligence [Submitted on 20 Feb 2026]

Abstract: Online Multi-Agent Reinforcement Learning (MARL) is a prominent framework for efficient agent coordination. Crucially, enhancing policy expressiveness is pivotal for achieving superior performance. Diffusion-based generative models are well-positioned to meet this demand, having demonstrated remarkable expressiveness and multimodal representation in image generation and offline settings. Yet their potential in online MARL remains largely under-explored. A major obstacle is that the intractable likelihoods of diffusion models impede entropy-based exploration and coordination. To tackle this challenge, we propose one of the first Online off-policy MARL frameworks using Diffusion policies (OMAD) to orchestrate coordination.

Article Summaries:

Summary

A new online multi-agent reinforcement learning framework, OMAD, introduces diffusion-based policies for efficient agent coordination. By relaxing the policy objective to maximize a scaled joint entropy, the method sidesteps the intractable likelihoods that hinder entropy-based exploration with diffusion models in online settings. Within a centralized training with decentralized execution (CTDE) setup, OMAD uses a joint distributional value function to guide the updates of each agent's decentralized diffusion policy, which the authors credit for stable coordination. Experiments on Multi-Agent Particle Environments and MuJoCo benchmarks show that OMAD outperforms existing approaches across ten diverse tasks, with a 2.5× to 5× improvement in sample efficiency.
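To make "decentralized diffusion policy" concrete, the sketch below shows how each agent could draw an action from its own observation by running a standard DDPM-style reverse denoising chain. It is an illustration under stated assumptions, not OMAD's implementation: the linear noise schedule, the `toy_noise_predictor` stand-in for a learned network, and all function and parameter names are hypothetical, and the training side (the joint distributional value function and the relaxed entropy objective) is omitted entirely.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.1):
    """Linear DDPM noise schedule (an illustrative choice, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_noise_predictor(obs, noisy_action, step_frac, weights):
    """Hypothetical stand-in for a learned noise network eps(obs, a_k, k)."""
    features = np.concatenate([obs, noisy_action, [step_frac]])
    return np.tanh(weights @ features)


def sample_action(obs, weights, action_dim, num_steps, rng):
    """Run the reverse chain from Gaussian noise down to an action."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    action = rng.standard_normal(action_dim)
    for k in reversed(range(num_steps)):
        eps = toy_noise_predictor(obs, action, k / num_steps, weights)
        # Standard DDPM posterior mean given the predicted noise.
        mean = (action - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps) / np.sqrt(alphas[k])
        # No noise is added on the final denoising step.
        noise = rng.standard_normal(action_dim) if k > 0 else np.zeros(action_dim)
        action = mean + np.sqrt(betas[k]) * noise
    # Assumes actions are bounded in [-1, 1].
    return np.clip(action, -1.0, 1.0)


def act_decentralized(observations, agent_weights, action_dim, num_steps=10, seed=0):
    """Each agent samples from its own policy using only its local observation."""
    rng = np.random.default_rng(seed)
    return [
        sample_action(obs, w, action_dim, num_steps, rng)
        for obs, w in zip(observations, agent_weights)
    ]


if __name__ == "__main__":
    setup_rng = np.random.default_rng(1)
    observations = [setup_rng.standard_normal(4) for _ in range(3)]
    # One weight matrix per agent: action_dim x (obs_dim + action_dim + 1).
    agent_weights = [0.1 * setup_rng.standard_normal((2, 7)) for _ in range(3)]
    for i, a in enumerate(act_decentralized(observations, agent_weights, action_dim=2)):
        print(f"agent {i}: action = {a}")
```

The action produced this way is the end point of a multi-step stochastic chain, so its log-probability has no simple closed form. That is the intractable-likelihood obstacle the abstract refers to, and it is why a policy objective that does not require evaluating the likelihood is useful here.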

Sources:

https://arxiv.org/abs/2602.18291