Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

• Uses trace rewriting to deter unauthorized knowledge distillation from large language models.
• Introduces anti-distillation techniques that degrade training usefulness while keeping answers correct.
• Embeds API watermark signatures into student models for verifiable ownership.
• Employs instruction-based rewriting and gradient methods to modify reasoning traces.
• Instruction-based approach improves teacher performance and strongly blocks distillation.
• Achieves reliable watermark detection with zero false alarms, preserving semantic coherence.
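
The instruction-based rewriting idea in the points above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the instruction text, the `rewriter` callable (a stand-in for any model call), and the substring check used as a correctness guard are all assumptions made for illustration. The sketch only shows the general shape of the idea, namely rewriting a reasoning trace while refusing any rewrite that loses the final answer.

```python
from typing import Callable

# Hypothetical instruction; the prompt actually used in the paper is not given here.
REWRITE_INSTRUCTION = (
    "Rewrite the reasoning below so that it still reaches the same final answer "
    "but is less useful as training data for another model."
)


def build_rewrite_prompt(trace: str, answer: str) -> str:
    """Assemble the prompt sent to the (assumed) rewriter model."""
    return f"{REWRITE_INSTRUCTION}\n\nReasoning:\n{trace}\n\nFinal answer: {answer}"


def rewrite_trace(trace: str, answer: str, rewriter: Callable[[str], str]) -> str:
    """Return a rewritten trace, falling back to the original if the answer is lost."""
    candidate = rewriter(build_rewrite_prompt(trace, answer))
    # Correctness guard (an assumption): keep the rewrite only if it still
    # states the final answer verbatim.
    if answer.strip() and answer.strip() in candidate:
        return candidate
    return trace


if __name__ == "__main__":
    # Stub rewriter standing in for a real model call.
    stub = lambda prompt: "Condensed reasoning. Final answer: 42"
    print(rewrite_trace("2 * 21 = 42", "42", stub))
```

Passing the rewriter as a plain callable keeps the sketch independent of any particular model API; a real system would likely need a stronger answer-equivalence check than substring matching.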

Article Summaries:

arXiv, Computer Science > Artificial Intelligence. Submitted on 16 Feb 2026.

Title: Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

Abstract: Knowledge distillation is a widely adopted technique for transferring capabilities from LLMs to smaller, more efficient student models. However, unauthorized use of knowledge distillation takes unfair advantage of the considerable effort and cost put into developing frontier models. We investigate methods for modifying teacher-generated reasoning traces to achieve two objectives that deter unauthori

Sources:

https://arxiv.org/abs/2602.15143