Enhancing Neural Network Training at Yelp: Achieving 1,400x Speedup with WideAndDeep

• Reduced pCTR model training from 75 hours/epoch on 450M samples to under 1 hour on 2B samples.
• Built ArrowStreamServer, an in‑house low‑latency Parquet streaming library, replacing inefficient Petastorm.
• Switched from TensorFlow MirroredStrategy to Horovod, scaling training across multiple GPUs for speed.
• Leveraged TensorFlow, Horovod, Spark, and S3 Parquet to create a unified, distributed training pipeline.
• Achieved a 1,400× speedup versus single‑GPU Petastorm baseline, boosting ad‑revenue model turnaround.
• Demonstrated how small‑parameter, large‑tabular models can be accelerated with custom data streaming.

Article Summary:

Yelp’s machine‑learning team has sharply cut training time for its ad‑revenue models, which use a Wide‑and‑Deep neural network to predict click‑through rates (pCTR). The team moved from TensorFlow’s MirroredStrategy to Horovod for multi‑GPU scaling and replaced Petastorm with ArrowStreamServer, an in‑house library that streams Parquet data from S3 with low latency. Together these changes produced a 1,400‑fold speedup over a single‑GPU Petastorm baseline. Per‑epoch training time fell from 75 hours on 450 million samples to under one hour on 2 billion samples. The result shows that for models with few parameters and very large tabular datasets, a custom data‑access layer can be a major source of training speedup.
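
The figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it uses the numbers quoted in this summary, and it assumes (the summary does not say so) that the 75-hour baseline would scale linearly with sample count and that the 1,400× figure is measured against that same baseline. All variable names are hypothetical.

```python
# Back-of-the-envelope check of the reported training figures.
# Inputs are taken from the summary; everything derived is an estimate.

BASELINE_SAMPLES = 450e6     # samples per epoch in the old setup
BASELINE_HOURS = 75.0        # hours per epoch in the old setup
NEW_SAMPLES = 2e9            # samples per epoch in the new setup
NEW_HOURS_UPPER = 1.0        # "under 1 hour" treated as an upper bound
REPORTED_SPEEDUP = 1400.0    # speedup quoted against the baseline


def throughput(samples: float, hours: float) -> float:
    """Return training throughput in samples per hour."""
    return samples / hours


baseline_rate = throughput(BASELINE_SAMPLES, BASELINE_HOURS)
new_rate_lower_bound = throughput(NEW_SAMPLES, NEW_HOURS_UPPER)

# Minimum speedup implied by "75 h on 450M" versus "under 1 h on 2B".
min_speedup = new_rate_lower_bound / baseline_rate

# ASSUMPTION: the baseline scales linearly to 2B samples.
baseline_hours_on_2b = NEW_SAMPLES / baseline_rate

# Epoch time the quoted speedup would imply under that assumption.
implied_epoch_minutes = baseline_hours_on_2b / REPORTED_SPEEDUP * 60

print(f"baseline throughput: {baseline_rate:,.0f} samples/hour")
print(f"minimum implied speedup: {min_speedup:.0f}x")
print(f"epoch time implied by 1,400x: {implied_epoch_minutes:.1f} minutes")
```

Under these assumptions, the "under 1 hour" figure alone implies a speedup of at least roughly 333×, and the quoted 1,400× corresponds to an epoch of about 14 minutes on 2 billion samples, which is consistent with "under 1 hour".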

Sources:

https://engineeringblog.yelp.com/2025/01/enhancing-neural-network-training-at-yelp.html