Accelerating Long-Context Model Training in JAX and XLA

• Large language models (LLMs) are rapidly expanding their context windows, with recent models supporting sequences of 128K tokens, 256K tokens, and beyond.
• However, training the