• PayPal runs hundreds of thousands of Spark jobs hourly, processing petabytes of data.
• Upgrading to Spark 3 and adopting NVIDIA GPUs with Spark RAPIDS cuts cloud costs by up to 70%.
• GPUs offer thousands of parallel cores, which suits AI and large‑scale data transformations.
• PayPal migrated legacy CPU‑based Spark 2 workloads to GPU clusters and tuned RAPIDS parameters for performance.
• Challenges included driver compatibility, memory management, and ensuring deterministic results.
• The transition enabled faster ML model training, reduced data processing time, and lowered operational spend.

Article Summaries:

  • PayPal has upgraded its massive Apache Spark data‑processing pipeline to Spark 3 and NVIDIA GPUs, reporting up to a 70% reduction in cloud costs for petabyte‑scale jobs. The company leveraged the open‑source Spark RAPIDS project, which translates Spark workloads into GPU‑friendly code and adds intra‑task parallelism, allowing thousands of GPU cores to accelerate joins, group‑bys and sorts. After migrating from a CPU‑based Spark 2 environment, PayPal tuned RAPIDS parameters and addressed integration challenges, noting significant performance gains and cost savings for its AI and big‑data workloads.
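
To make "tuning RAPIDS parameters" more concrete, here is a minimal sketch of the kind of Spark properties involved in enabling the RAPIDS Accelerator on Spark 3. The property keys are standard Spark and RAPIDS Accelerator settings, but the helper names (`rapids_conf`, `to_submit_args`) and every numeric value are illustrative assumptions; the article does not disclose PayPal's actual configuration.

```python
# Illustrative sketch only: the values are placeholders, not PayPal's settings.

def rapids_conf(gpus_per_executor=1, cores_per_executor=16, concurrent_gpu_tasks=2):
    """Build Spark properties that enable the RAPIDS Accelerator plugin."""
    return {
        # Load the RAPIDS Accelerator as a Spark 3 plugin and turn on GPU SQL.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Each task claims a fraction of the GPU so that every executor core
        # can schedule a task against the shared device.
        "spark.task.resource.gpu.amount": str(gpus_per_executor / cores_per_executor),
        # Cap how many tasks may run on the GPU at the same time.
        "spark.rapids.sql.concurrentGpuTasks": str(concurrent_gpu_tasks),
    }


def to_submit_args(conf):
    """Render a property dict as spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(rapids_conf())))
```

In practice, the GPU fraction per task and the concurrent-task limit are among the first settings adjusted when moving a CPU-oriented job to GPUs, since they trade GPU memory pressure against utilization.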

Sources: