The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts

Computer Science > Computation and Language [Submitted on 21 Jan 2026]

Abstract: In “Compress or Route?” (Johnson, 2026), we found that code generation tolerates aggressive prompt compression (r >= 0.6) while chain-of-thought reasoning degrades gradually. That study was limited to HumanEval (164 problems), left the “perplexity paradox” mechanism unvalidated, and provided no adaptive algorithm. This paper addresses all three gaps. First, we validate across six code benchmarks (HumanEval, MBPP, HumanEval+, MultiPL-E) and four reasoning benchmarks (GSM8K, MATH, ARC-Challenge, MMLU-STEM), confirming the compression threshold generalizes across languages and difficulties. Second, we conduct the first per-token perplexity analysis (n=723 tokens), revealing a “perplexity paradox”: code syntax tokens are preserved (high perplexity) while numerical values in math problems are pruned despite being task-critical (low perplexity). Signature injection recovers +34 percentage points in pass rate (5.3% to 39.3%; Cohen’s h=0.890).
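The effect size quoted for signature injection can be checked from the two pass rates alone. The sketch below uses the standard definition of Cohen's h (the difference between arcsine-transformed proportions); the function name `cohens_h` is ours, and the inputs are the rounded percentages from the abstract, so the result should only be expected to agree with the reported value to about two decimal places.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    phi1 = 2.0 * math.asin(math.sqrt(p1))
    phi2 = 2.0 * math.asin(math.sqrt(p2))
    return phi1 - phi2


# Rounded pass rates quoted in the abstract.
baseline = 0.053  # before signature injection
injected = 0.393  # after signature injection

gain_pp = 100.0 * (injected - baseline)
h = cohens_h(injected, baseline)

print(f"gain = {gain_pp:.1f} percentage points")
print(f"Cohen's h = {h:.3f}")
```

By Cohen's conventional thresholds, an h of 0.8 or more counts as a large effect, so the reported value sits just above that boundary.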

Article Summaries:

A recent study on prompt compression for large language models (LLMs) shows that code generation tolerates aggressive compression while chain-of-thought reasoning degrades gradually. The researchers confirmed that the compression threshold generalizes across six code benchmarks (HumanEval, MBPP, HumanEval+, MultiPL-E) and four reasoning benchmarks (GSM8K, MATH, ARC-Challenge, MMLU-STEM). A per-token perplexity analysis revealed the "perplexity paradox": code syntax tokens are high-perplexity and are therefore preserved, whereas numeric values in math problems are low-perplexity and are therefore pruned, even though they are task-critical. Signature injection raised code pass rates from 5.3% to 39.3% (+34 percentage points). The proposed Task-Aware Adaptive Compression (TAAC) algorithm cuts cost by 22% while keeping 96% of quality, outperforming fixed-ratio compression by 7%.
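To make the paradox concrete, here is a minimal sketch of perplexity-ranked pruning. It is a generic illustration, not the paper's compressor: the function `prune_by_surprisal` is hypothetical, the log-probabilities are made-up values chosen to mimic the reported pattern, and it assumes the ratio r denotes the fraction of tokens kept. Ranking by surprisal (negative log-probability) is equivalent to ranking by per-token perplexity, since perplexity is the exponential of surprisal.

```python
import math


def prune_by_surprisal(tokens, logprobs, keep_ratio):
    """Keep the keep_ratio fraction of tokens with the highest surprisal.

    Surviving tokens are returned in their original order.
    """
    if len(tokens) != len(logprobs):
        raise ValueError("tokens and logprobs must have the same length")
    n_keep = max(1, math.ceil(keep_ratio * len(tokens)))
    # Surprisal is -log p(token | context); higher means less predictable.
    surprisal = [-lp for lp in logprobs]
    ranked = sorted(range(len(tokens)), key=lambda i: surprisal[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept]


# Hypothetical word-level tokens and made-up log-probabilities in which
# the two numbers are the most predictable (lowest-surprisal) tokens.
tokens = ["Tom", "has", "12", "apples", "and", "eats", "5"]
logprobs = [-3.2, -1.1, -0.4, -2.5, -0.9, -2.0, -0.3]

compressed = prune_by_surprisal(tokens, logprobs, keep_ratio=0.6)
print(compressed)
```

With these illustrative values the two numbers are the first tokens dropped, which leaves the problem unsolvable even though most of the prompt survives. A real compressor would take the log-probabilities from a language model rather than a hand-written list.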

Sources:

https://arxiv.org/abs/2602.15843