Spilled Energy in Large Language Models

• Reinterprets the LLM softmax layer as an energy-based model (EBM), enabling energy tracking during decoding.
• Introduces two training-free metrics computed from logits: spilled energy and marginalized energy.
• Energy spills correlate with factual errors, biases, and hallucinations across LLMs.
• Localizes the exact answer token without probe classifiers or ablations.
• Tested on nine benchmarks across LLaMA, Mistral, Gemma, and Qwen3 models, showing robust detection.
• Works on both pretrained and instruction-tuned models, with no extra training overhead.

Article Summaries:

Researchers reinterpret the softmax layer of large language models (LLMs) as an energy-based model (EBM), which makes it possible to track "energy spills" during decoding. These spills, defined as discrepancies in energy values between consecutive generation steps, correlate with factual errors, biases, and hallucinations. The study introduces two training-free metrics derived directly from the output logits: spilled energy and marginalized energy. Evaluated on nine benchmarks across LLaMA, Mistral, Gemma, and Qwen3 models, the approach achieves competitive hallucination detection and cross-task generalization without any additional training overhead.
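To make the energy view concrete, here is a minimal sketch of the standard EBM reading of a softmax: a token's energy is its negative logit, and the marginalized (free) energy of a step is the negative log-sum-exp of that step's logits. The summary does not give the paper's exact formulas, so the function names and the particular "spill" quantity below (the gap between the chosen token's energy at one step and the marginalized energy at the next step) are illustrative assumptions, not the authors' definition.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def token_energy(logits, token_id):
    """Energy of one token under the EBM reading of softmax: E = -logit."""
    return -logits[token_id]

def marginal_energy(logits):
    """Free energy of a decoding step: the negative log-partition function."""
    return -logsumexp(logits)

def log_prob(logits, token_id):
    """Softmax log-probability rewritten as an energy difference."""
    return marginal_energy(logits) - token_energy(logits, token_id)

def illustrative_spill(step_logits, token_ids):
    """ASSUMPTION: one plausible reading of an 'energy discrepancy between
    consecutive steps'; the paper's actual metric may be defined differently.
    step_logits[t] is the logit vector at step t, token_ids[t] the token chosen there."""
    spills = []
    for t in range(len(token_ids) - 1):
        e_token = token_energy(step_logits[t], token_ids[t])
        e_next = marginal_energy(step_logits[t + 1])
        spills.append(e_next - e_token)
    return spills

# Toy example with a 3-token vocabulary and 3 decoding steps (made-up numbers).
toy_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [4.0, -2.0, 0.0]]
toy_tokens = [0, 2, 0]
print(illustrative_spill(toy_logits, toy_tokens))
```

Everything here is computed from logits alone, which is what makes metrics of this kind training-free: no probe classifier or extra forward pass is needed beyond ordinary decoding.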

Sources:

https://arxiv.org/abs/2602.18671