The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research
• Reproducibility crisis shows paper‑centric reviews limit research rigor. • AI agents producing research outputs amplify evaluation challenges. • Introduces execution‑grounded eva