Developing a Multi-Agent System to Generate Next Generation Science Assessments with Evidence-Centered Design

• Multi-Agent System (MAS) integrates Evidence-Centered Design (ECD) to automate NGSS-aligned science assessment creation.
• MAS ensembles multiple large language models, each with distinct expertise, to replicate expert-driven item generation workflows.
• AI-generated items match human-crafted ones in NGSS three-dimensional alignment and cognitive demand distribution.
• AI strengths: higher inclusivity; weaknesses: clarity, conciseness, multimodal design, evidence collectability, student interest.
• Findings support scalable, standards-compliant assessment design while highlighting areas needing human oversight.
• Future work: refine clarity, multimodality, and evidence collection to fully match expert quality.

Article Summaries:

A study proposes a multi-agent system (MAS) that integrates Evidence-Centered Design (ECD) to automatically generate assessment items aligned with the Next Generation Science Standards (NGSS). The MAS combines several large language models, each embodying different expertise, to replicate the multi-stage workflow normally performed by human experts. The researchers compared AI-generated items with human-developed ones across multiple design dimensions. The two sets were comparable in alignment with the NGSS three-dimensional standards and in cognitive demand; AI items excelled in inclusivity but fell short on clarity, conciseness, and multimodal design. Both AI and human items struggled with evidence collectability and alignment with student interests, indicating that while the MAS can scale assessment creation, expert oversight remains essential.
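
The general idea of chaining role-specific language-model agents into a multi-stage workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the agent roles, instructions, the `Agent` and `run_pipeline` names, and the `echo_llm` stub are all hypothetical, and the standard code `MS-PS1-2` is used only as an example input.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a language-model call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class Agent:
    role: str          # illustrative role name, not taken from the study
    instructions: str  # what this agent is asked to contribute
    llm: LLM

    def run(self, context: str) -> str:
        prompt = f"[{self.role}] {self.instructions}\n\n{context}"
        return self.llm(prompt)


def run_pipeline(agents: List[Agent], standard: str) -> Dict[str, str]:
    """Pass a growing context through each agent in order."""
    outputs: Dict[str, str] = {}
    context = f"Target standard: {standard}"
    for agent in agents:
        result = agent.run(context)
        outputs[agent.role] = result
        # Later agents see everything earlier agents produced.
        context += f"\n\n{agent.role} output:\n{result}"
    return outputs


def echo_llm(prompt: str) -> str:
    # Offline stub so the sketch runs; swap in a real model client.
    return f"draft for {prompt.splitlines()[0]}"


# Assumed staging, loosely following an evidence-first design order.
agents = [
    Agent("domain analyst", "Unpack the three NGSS dimensions.", echo_llm),
    Agent("evidence modeler", "State what student evidence is needed.", echo_llm),
    Agent("task designer", "Write an item that elicits that evidence.", echo_llm),
    Agent("reviewer", "Check clarity, inclusivity, and alignment.", echo_llm),
]

results = run_pipeline(agents, "MS-PS1-2")
for role, text in results.items():
    print(f"{role}: {text}")
```

Keeping each stage as a separate agent with its own instructions makes it straightforward to insert a human review step between stages, which is where the summary suggests expert oversight is still needed.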

Sources:

https://arxiv.org/abs/2602.18451