Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

• This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core used for LLM post-training or DiT pre-training.
• It dynamically sele