• INSURE‑Dial is the first public benchmark for compliance‑aware voice agents in insurance calls.
• The corpus contains 50 real AI‑initiated calls (about 71 turns each) and 1,000 synthetic calls.
• Calls are annotated with phase‑structured JSON covering IVR navigation, patient identification, coverage status, medication checks, and agent identification.
• Two evaluation tasks are defined: Phase Boundary Detection and Compliance Verification, both using span‑based logic.
• Baselines perform well per phase, but end‑to‑end accuracy suffers from span‑boundary errors.
• Exact segmentation of real‑world calls remains low, highlighting the gap between fluency and audit‑grade evidence.
• The dataset is motivated by the $1 trillion annual administrative cost in U.S. healthcare.
• Resources are available on Hugging Face and DagsHub, along with code for reproducibility.
Article Summaries:
- INSURE‑Dial is the first public benchmark for building compliance‑aware voice agents that audit insurance‑benefit verification calls. The dataset contains 50 de‑identified, AI‑initiated calls with live representatives (≈71 turns each) and 1,000 synthetic calls that replicate the same workflow. Every call is annotated with a phase‑structured JSON schema covering IVR navigation, patient identification, coverage status, medication checks, and agent identification, and each phase is labeled for Information and Procedural compliance. The benchmark defines two tasks: (1) Phase Boundary Detection, and (2) Compliance Verification. Although small, low‑latency baselines score well on individual phases, full‑call exact segmentation accuracy remains low, highlighting a gap between conversational fluency and audit‑grade evidence.
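To illustrate why strong per‑phase scores can coexist with low full‑call exact segmentation, here is a minimal Python sketch. It is not the benchmark's actual schema or scoring code: the `PhaseSpan` fields, the phase names, the turn indices, and both metric functions are hypothetical, chosen only to mirror the phases named in the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseSpan:
    """Hypothetical annotation: one phase covering a range of turns."""
    phase: str
    start_turn: int  # inclusive
    end_turn: int    # inclusive


def per_phase_accuracy(gold, pred):
    """Fraction of gold phases whose predicted span matches exactly."""
    pred_by_phase = {span.phase: span for span in pred}
    hits = sum(1 for g in gold if pred_by_phase.get(g.phase) == g)
    return hits / len(gold)


def exact_segmentation(gold, pred):
    """True only if every phase span in the call matches exactly."""
    return set(gold) == set(pred)


# Illustrative (made-up) gold annotation for a single 71-turn call.
gold = [
    PhaseSpan("ivr_navigation", 0, 5),
    PhaseSpan("patient_identification", 6, 20),
    PhaseSpan("coverage_status", 21, 40),
    PhaseSpan("medication_check", 41, 60),
    PhaseSpan("agent_identification", 61, 70),
]

# Prediction that misses one boundary by a single turn.
pred = [
    PhaseSpan("ivr_navigation", 0, 5),
    PhaseSpan("patient_identification", 6, 20),
    PhaseSpan("coverage_status", 21, 39),  # off by one turn
    PhaseSpan("medication_check", 41, 60),
    PhaseSpan("agent_identification", 61, 70),
]

print(per_phase_accuracy(gold, pred))  # 0.8
print(exact_segmentation(gold, pred))  # False
```

Under these assumptions, a single off‑by‑one boundary leaves four of five phases correct yet fails the whole call, which is the kind of span‑boundary error the summary describes.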
Sources: