GPU-Resident Gaussian Process Regression Leveraging Asynchronous Tasks with HPX

• GPRat library extended to a fully GPU-resident Gaussian Process prediction pipeline. • Combines HPX task‑based parallelism with an intuitive Python API for seamless integration. • Employs tiled algorithms and optimized CUDA libraries to accelerate linear algebra operations. • GPU implementation outperforms CPU baseline for datasets over 128 samples, achieving significant speedups. • Cholesky decomposition and GP prediction see up to 4.3× and 4.6× speedups, respectively. • HPX with multiple CUDA streams surpasses cuSOLVER by up to 11% on large datasets.

Article Summaries:

A new GPU‑resident implementation of Gaussian Process Regression (GPR) has been released by the GPRat team. Extending their HPX‑based library, the authors built a fully GPU‑resident prediction pipeline that uses tiled algorithms and optimized CUDA libraries to accelerate linear‑algebra operations. Experiments show that for datasets with more than 128 training points, the GPU version delivers up to 4.3× speedup on Cholesky decomposition and 4.6× on full GP prediction. By combining HPX task parallelism with multiple CUDA streams, the method even outperforms cuSOLVER by up to 11 % on large problems, demonstrating a scalable, high‑performance alternative to CPU‑only solvers.

Sources:

https://arxiv.org/abs/2602.19683