• Reusing Pre-Training Data at Test Time is a Compute Multiplier
  Authors: Alex Fang†, Thomas Voice, Ruoming Pang, Ludwig Schmidt†, Tom Gunter**
• Large language models learn from their vast pre-training corpora, gaining the ability to solve an ever-increasing variety of tasks; yet although researchers work to improve these datasets, there is little effort to understand how efficient the pre-training apparatus is at extracting ideas and knowledge from the data.
• In this work, we use retrieval-augmented generation along with test-time compute as a way to quantify how much dataset value was left behind by the process of pre-training, and how this changes across scale.
• We demonstrate that pre-training and then retrieving from standard and largely open-sourced datasets results in significant accuracy gains on MMLU, Math-500, and SimpleQA, which persist through decontamination.
• For MMLU, we observe that retrieval acts as a ~5x compute multiplier versus pre-training alone.
• We show that these results can be further improved by leveraging additional compute at test time to parse the retrieved context, de…
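A "compute multiplier" compares two setups by how much training compute the baseline would need to match the other's score. The sketch below shows one way such a number could be estimated. It is not the authors' method; it assumes that pre-training-only accuracy is roughly linear in log10 of training compute, fits that line by least squares, and then asks how much compute the fitted baseline would need to reach the retrieval-augmented accuracy. The function names (`fit_log_linear`, `compute_multiplier`) and all numbers are hypothetical and purely illustrative, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(compute: Sequence[float], accuracy: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of accuracy = a + b * log10(compute).

    Assumption (hypothetical): the pre-training-only baseline is
    well described by a straight line in log-compute.
    """
    xs = [math.log10(c) for c in compute]
    n = len(xs)
    if n < 2 or n != len(accuracy):
        raise ValueError("need at least two matching (compute, accuracy) points")
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("compute values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def compute_multiplier(
    baseline_compute: Sequence[float],
    baseline_accuracy: Sequence[float],
    augmented_compute: float,
    augmented_accuracy: float,
) -> float:
    """Ratio of baseline compute needed to match the augmented score
    to the compute the augmented model actually used."""
    a, b = fit_log_linear(baseline_compute, baseline_accuracy)
    if b <= 0:
        raise ValueError("baseline accuracy must increase with compute")
    # Invert the fitted line: log10(C_eq) = (accuracy - a) / b
    equivalent_compute = 10 ** ((augmented_accuracy - a) / b)
    return equivalent_compute / augmented_compute


# Illustrative (made-up) numbers: baseline gains 10 points per 10x compute,
# and a retrieval-augmented model trained with 1e21 FLOPs scores 0.57.
multiplier = compute_multiplier([1e20, 1e21, 1e22], [0.40, 0.50, 0.60], 1e21, 0.57)
print(f"{multiplier:.2f}x")
```

With these made-up numbers, 0.57 sits 0.7 decades of compute above the baseline model trained with the same budget, so the sketch reports a multiplier of about 10^0.7 ≈ 5x. A log-linear fit is only one modeling choice; a different scaling-curve form would give a different multiplier from the same measurements.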

