DPBench: Large Language Models Struggle with Simultaneous Coordination

• Computer Science > Artificial Intelligence [Submitted on 2 Feb 2026] Title:DPBench: Large Language Models Struggle with Simultaneous Coordination View PDF HTML (experimental)Abstract:Large language models are increasingly deployed in multi-agent systems, yet we lack benchmarks that test whether they can coordinate under resource contention. • We introduce DPBench, a benchmark based on the Dining Philosophers problem that evaluates LLM coordination across eight conditions that vary decision timing, group size, and communication. • Our experiments with GPT-5.2, Claude Opus 4.5, and Grok 4.1 reveal a striking asymmetry: LLMs coordinate effectively in sequential settings but fail when decisions must be made simultaneously, with deadlock rates exceeding 95% under some conditions. • We trace this failure to convergent reasoning, where agents independently arrive at identical strategies that, when executed simultaneously, guarantee deadlock. • Contrary to expectations, enabling communication does not resolve this problem and can even increase deadlock rates. • Our findings suggest that multi-agent LLM systems requiring concurrent resource access may need external coordination mechanisms rather than relying on emergent coordination.

Article Summaries:

Computer Science > Artificial Intelligence [Submitted on 2 Feb 2026] Title:DPBench: Large Language Models Struggle with Simultaneous Coordination View PDF HTML (experimental)Abstract:Large language models are increasingly deployed in multi-agent systems, yet we lack benchmarks that test whether they can coordinate under resource contention. We introduce DPBench, a benchmark based on the Dining Philosophers problem that evaluates LLM coordination across eight conditions that vary decision timing, group size, and communication. Our experiments with GPT-5.2, Claude Opus 4.5, and Grok 4.1 reveal a

Sources:

https://arxiv.org/abs/2602.13255