A large-scale randomized study of large language model feedback in peer review

Abstract:

Peer review is stressed by rapidly rising submission volumes, leading to deteriorating review quality and increased author dissatisfaction. Here, to address these issues, we developed Review Feedback Agent, a system leveraging multiple large language models to improve review clarity, specificity and actionability by providing automated feedback on vague comments, content misunderstandings and unprofessional remarks to reviewers. We show, through a randomized controlled study at ICLR 2025 with over 20,000 reviews, that 27% of reviewers who received automated feedback updated their reviews, incorporating over 12,000 suggestions. This suggests that many reviewers found the artificial intelligence-generated feedback sufficiently helpful to merit updating their reviews. Blinded evaluation confirmed that revised reviews receiving feedback were more informative. The intervention led to substantially longer reviews (80 additional words among updaters) and increased engagement during rebuttals, with 6% longer author responses and 5.5% longer reviewer replies.

Article Summaries:

A randomized study at ICLR 2025 tested an automated “Review Feedback Agent” that uses multiple large language models to give reviewers feedback on vague comments, content misunderstandings and unprofessional remarks. In a study covering more than 20,000 reviews, 27% of reviewers who received the AI feedback revised their reviews, incorporating over 12,000 suggestions. Blinded evaluation found the revised reviews to be more informative, and the intervention produced longer reviews (about 80 extra words among reviewers who updated) along with modest increases in the length of author responses and reviewer replies. The open-source agent demonstrates that well-designed LLM-generated feedback can improve review clarity, specificity, and author-reviewer engagement.
A randomized controlled trial at ICLR 2025 tested “Review Feedback Agent,” a system that uses multiple large language models to give reviewers automated feedback on vague comments, content misunderstandings and unprofessional remarks. In a study covering more than 20,000 reviews, 27% of reviewers who received the AI-generated feedback revised their reviews, incorporating over 12,000 suggestions. Blinded assessments found the revised reviews to be more informative. The intervention also produced longer reviews (about 80 extra words among reviewers who updated) and more engagement during rebuttals, with author responses 6% longer and reviewer replies 5.5% longer. The study demonstrates that well-designed LLM-based feedback can enhance peer-review quality and engagement.

Sources:

https://www.nature.com/articles/s42256-026-01188-x (Latest source article published: 2026-02-25 06:37 UTC)