• Grab tackled slow container cold starts on data platforms such as Airflow and Spark Connect.
• Implemented Docker image lazy loading using the eStargz and Seekable OCI (SOCI) snapshotters.
• Image pulls on fresh nodes dropped from minutes to seconds, speeding up cold starts.
• SOCI kept application start-up times at the standard level (~5 s), while eStargz added overhead (~25 s).
• The production SOCI deployment cut P95 start-up times for Airflow and Spark Connect by 30-40%.
• Faster start-ups make auto-scaling more responsive and reduce resource waste during traffic spikes.
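To make the idea of lazy loading concrete, here is a toy Python model. It is not eStargz or SOCI code and does not reproduce their formats or indexes. It assumes an image can be treated as one flat blob split into fixed-size chunks, and that the registry can serve arbitrary byte ranges. The names `RemoteBlob`, `LazyImage`, and `eager_pull` are hypothetical. An eager pull transfers the whole blob before start-up. Lazy loading transfers only the chunks the application actually reads, so start-up can begin before the full image has arrived.

```python
"""Toy model of lazy image loading (illustrative only, not SOCI or eStargz).

Assumption: an image is a flat blob split into fixed-size chunks, and the
"registry" is an in-memory bytes object that supports ranged reads.
"""


class RemoteBlob:
    """Hypothetical stand-in for a registry blob that supports ranged reads."""

    def __init__(self, data: bytes):
        self._data = data
        self.bytes_served = 0  # total bytes sent over the simulated network

    def read_range(self, start: int, end: int) -> bytes:
        chunk = self._data[start:end]
        self.bytes_served += len(chunk)
        return chunk


class LazyImage:
    """Fetches fixed-size chunks on first access and caches them locally."""

    def __init__(self, remote: RemoteBlob, size: int, chunk_size: int):
        self._remote = remote
        self._size = size
        self._chunk_size = chunk_size
        self._cache: dict[int, bytes] = {}

    def _chunk(self, index: int) -> bytes:
        # Only a cache miss triggers a network fetch.
        if index not in self._cache:
            start = index * self._chunk_size
            end = min(start + self._chunk_size, self._size)
            self._cache[index] = self._remote.read_range(start, end)
        return self._cache[index]

    def read(self, offset: int, length: int) -> bytes:
        """Return bytes [offset, offset + length), fetching chunks as needed."""
        out = bytearray()
        pos = offset
        end = min(offset + length, self._size)
        while pos < end:
            index, within = divmod(pos, self._chunk_size)
            chunk = self._chunk(index)
            take = min(len(chunk) - within, end - pos)
            out += chunk[within:within + take]
            pos += take
        return bytes(out)


def eager_pull(remote: RemoteBlob, size: int) -> bytes:
    """Baseline: download the entire blob before the container can start."""
    return remote.read_range(0, size)


if __name__ == "__main__":
    image = bytes(range(256)) * 4096  # hypothetical 1 MiB "image"
    remote = RemoteBlob(image)
    lazy = LazyImage(remote, len(image), chunk_size=64 * 1024)
    # Hypothetical access pattern: start-up touches only the first 10 KiB.
    lazy.read(0, 10 * 1024)
    print(remote.bytes_served)  # 65536: one 64 KiB chunk instead of 1 MiB
```

In this model, chunk size trades round trips against over-fetching. Larger chunks mean fewer requests but more unused bytes per cache miss, which is one reason chunk size is a natural tuning knob.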

Article Summaries:

  • Grab has deployed Docker image lazy loading to cut container start-up times for its data platforms. Using the eStargz and Seekable OCI (SOCI) snapshotters, the company reduced image pull times on fresh nodes from several minutes to under a minute. In production, Airflow and Spark Connect saw 30-40% faster P95 start-ups, improving auto-scaling and resource efficiency. SOCI kept application start-up times at standard levels, whereas eStargz added overhead. Tuning SOCI parameters (doubling the number of concurrent downloads and unpacks, and increasing the chunk size) cut fresh-node download time from 60 s to 24 s, a 60% reduction. The rollout shows that lazy loading is a stable, production-ready optimization.
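The 60 s to 24 s tuning result can be expressed two ways: as the share of time removed, or as a speedup ratio. These two figures are easy to conflate. The short Python sketch below uses only the figures quoted in the summary. The helper names `percent_reduction` and `speedup` are hypothetical.

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Share of the original duration that was eliminated, in percent."""
    return (before_s - after_s) / before_s * 100


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new run is (before / after)."""
    return before_s / after_s


if __name__ == "__main__":
    # Fresh-node download times quoted in the summary: 60 s before, 24 s after.
    print(f"{percent_reduction(60, 24):.0f}% less time")  # 60% less time
    print(f"{speedup(60, 24):.1f}x faster")  # 2.5x faster
```

Removing 60% of the download time corresponds to a 2.5x speedup, so "60% less time" is the more precise way to state the result.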

Sources: