• FineRef introduces fine‑grained error reflection for citation mismatch and irrelevance in long‑form LLM generation.
• Two‑stage training: supervised fine‑tuning with attempt‑reflect‑correct pattern and online bootstrapping.
• Process‑level reinforcement learning with multi‑dimensional rewards boosts reflection accuracy, answer quality, and correction gain.
• FineRef’s 7B model outperforms GPT‑4 by 18% Citation F1 and 4% EM Recall on the ALCE benchmark.
• Demonstrates strong generalization and robustness across domain transfer and noisy retrieval scenarios.
• Provides a scalable framework for trustworthy, citation‑aware long‑form content generation.

Article Summaries:

  • FineRef is a new framework for improving long‑form text generation with citations. It tackles two common citation errors, mismatch and irrelevance, by training models to identify and correct them on a per‑citation basis. The approach uses a two‑stage training pipeline: first, supervised fine‑tuning teaches an “attempt‑reflect‑correct” pattern with lightweight reflection models; second, process‑level reinforcement learning refines this behavior with a multi‑dimensional reward that balances reflection accuracy, answer quality, and correction gain. On the ALCE benchmark, a 7B FineRef model outperforms GPT‑4 by up to 18% in Citation F1 and 4% in EM Recall, and it remains robust under noisy retrieval and domain transfer.
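
The multi‑dimensional reward described above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not FineRef's actual reward: the summary names only the three components, so the weighted‑sum form, the default weights, the [0, 1] score ranges, and the names `StepScores` and `combined_reward` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepScores:
    """Hypothetical per-sample scores, each assumed to lie in [0, 1]."""
    reflection_accuracy: float  # share of citation errors the model labelled correctly
    answer_quality: float       # quality score of the final answer
    quality_before: float       # citation quality of the first attempt
    quality_after: float        # citation quality after the correction step


def combined_reward(s: StepScores,
                    w_reflect: float = 1.0,
                    w_answer: float = 1.0,
                    w_gain: float = 1.0) -> float:
    """Weighted sum of the three components (assumed form, not from the source)."""
    # Correction gain is positive when the correction improved the citations
    # and negative when it made them worse.
    correction_gain = s.quality_after - s.quality_before
    return (w_reflect * s.reflection_accuracy
            + w_answer * s.answer_quality
            + w_gain * correction_gain)


helpful = StepScores(0.5, 0.5, quality_before=0.25, quality_after=0.75)
harmful = StepScores(1.0, 0.5, quality_before=0.75, quality_after=0.25)
print(combined_reward(helpful), combined_reward(harmful))
```

In this sketch a correction that degrades the citations lowers the reward even when the reflection labels were accurate, which is one plausible reading of why correction gain is rewarded separately from reflection accuracy.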
