• Computer Science > Computation and Language [Submitted on 2 Feb 2026] Title: Architecture-Agnostic Curriculum Learning for Document Understanding: Empirical Evidence from Text-Only and Multimodal. Abstract: We investigate whether progressive data scheduling – a curriculum learning strategy that incrementally increases training data exposure (33%$\rightarrow$67%$\rightarrow$100%) – yields consistent efficiency gains across architecturally distinct document understanding models. By evaluating BERT (text-only, 110M parameters) and LayoutLMv3 (multimodal, 126M parameters) on the FUNSD and CORD benchmarks, we establish that this schedule reduces wall-clock training time by approximately 33%, commensurate with the reduction from 6[…]0 effective epoch-equivalents of data. To isolate curriculum effects from compute reduction, we introduce matched-compute baselines (Standard-7) that control for total gradient updates. On the FUNSD dataset, the curriculum significantly outperforms the matched-compute baseline for BERT ($\Delta$F1 = +0[…]83$), constituting evidence for a genuine scheduling benefit in capacity-constrained models.
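The roughly 33% saving quoted in the abstract is consistent with simple arithmetic on the 33%/67%/100% schedule. The sketch below is illustrative only: it assumes the epoch budget is split evenly across the three phases, which the abstract does not state, and the function names and the 9-epoch budget are hypothetical.

```python
from fractions import Fraction

# Data fractions per curriculum phase, taken from the abstract (33% -> 67% -> 100%).
PHASE_FRACTIONS = (Fraction(33, 100), Fraction(67, 100), Fraction(1))


def effective_epochs(total_epochs, phase_fractions=PHASE_FRACTIONS):
    """Epoch-equivalents of data seen over the whole schedule.

    ASSUMPTION: total_epochs is split evenly across the phases; the
    abstract does not say how long each phase lasts.
    """
    epochs_per_phase = Fraction(total_epochs) / len(phase_fractions)
    return sum(epochs_per_phase * f for f in phase_fractions)


def data_reduction(total_epochs, phase_fractions=PHASE_FRACTIONS):
    """Fraction of data passes saved relative to training on all data throughout."""
    return 1 - effective_epochs(total_epochs, phase_fractions) / total_epochs


if __name__ == "__main__":
    budget = 9  # hypothetical epoch budget, chosen only so the phases divide evenly
    print(effective_epochs(budget))
    print(float(data_reduction(budget)))
```

Under the equal-phase assumption the saving is one third for any epoch budget, because the three data fractions average to two thirds.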
Sources:
- https://arxiv.org/abs/2602.21225 (Latest source article published: 2026-02-26 05:00 UTC)