Panini: Continual Learning in Token Space via Structured Memory

arXiv, Computer Science > Artificial Intelligence. Submitted on 16 Feb 2026.

Abstract:

• Language models are increasingly used to reason over content they were not trained on, such as new documents, evolving knowledge, and user-specific data.
• A common approach is retrieval-augmented generation (RAG), which stores verbatim documents externally (as chunks) and retrieves only a relevant subset at inference time for an LLM to reason over.
• However, this uses test-time compute inefficiently, because the LLM repeatedly reasons over the same documents; moreover, chunk retrieval can inject irrelevant context that increases unsupported generation.
• We propose a human-like non-parametric continual learning framework in which the base model remains fixed and learning occurs by integrating each new experience into an external semantic memory state that accumulates and consolidates itself continually.
• We present Panini, which realizes this by representing documents as Generative Semantic Workspaces (GSW): an entity- and event-aware network of question-answer (QA) pairs, sufficient for an LLM to reconstruct the experienced situations and mine latent knowledge via reasoning-grounded inference chains on the network.
• Given a query, Panini traverses only the continually updated GSW (not the verbatim documents or chunks) and retrieves the most likely inference chains.
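The idea of a fixed base model plus an external network of QA pairs that is traversed at query time can be illustrated with a toy sketch. This is not the authors' implementation: the names `QAPair`, `ToyMemory`, `integrate`, and `chains` are hypothetical, the entity linking is a plain string match on the answer, and the traversal enumerates chains without the likelihood scoring the abstract mentions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One memory entry (hypothetical schema): a question about an entity and its answer."""
    entity: str    # entity the question is about
    question: str
    answer: str    # may name another entity, which links entries into a network


class ToyMemory:
    """Toy stand-in for a QA-pair memory; no model weights are involved or updated."""

    def __init__(self):
        self.by_entity = defaultdict(list)

    def integrate(self, pairs):
        """Add QA pairs from a new document, skipping exact duplicates."""
        for p in pairs:
            if p not in self.by_entity[p.entity]:
                self.by_entity[p.entity].append(p)

    def chains(self, start, max_hops=2):
        """Enumerate chains of linked QA pairs starting at `start`, up to max_hops long."""
        results = []

        def walk(entity, path, seen):
            for p in self.by_entity.get(entity, []):
                new_path = path + [p]
                results.append(new_path)
                # Follow the answer as the next entity, avoiding cycles.
                if len(new_path) < max_hops and p.answer not in seen:
                    walk(p.answer, new_path, seen | {p.answer})

        walk(start, [], {start})
        return results


# Two "documents" arrive one after the other; memory grows, nothing is retrained.
memory = ToyMemory()
memory.integrate([QAPair("Panini", "What does Panini represent documents as?", "GSW")])
memory.integrate([QAPair("GSW", "What is a GSW made of?", "QA pairs")])

found = memory.chains("Panini")
for chain in found:
    print(" -> ".join(p.answer for p in chain))
```

In this sketch a query touches only the stored QA pairs, never the original text, which mirrors the contrast with chunk retrieval drawn in the abstract. A real system would also need entity and event extraction, consolidation of overlapping entries, and a way to rank chains, none of which is shown here.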

Sources:

https://arxiv.org/abs/2602.15156