CVE-2025-23298: Getting Remote Code Execution in NVIDIA Merlin

• While investigating the security posture of various machine learning (ML) and artificial intelligence (AI) frameworks, the Trend Micro Zero Day Initiative (ZDI) Threat Hunting Team discovered a critical vulnerability in the NVIDIA Merlin Transformers4Rec library that could allow an attacker to achieve remote code execution with root privileges.
• This vulnerability, tracked as CVE-2025-23298, stems from unsafe deserialization practices in the model checkpoint loading functionality.
• What makes this finding particularly interesting is not just the vulnerability itself, but how it highlights the endemic security challenges created by the ML/AI ecosystem’s reliance on Python’s pickle serialization.
• In this post, I’ll walk through the discovery process, demonstrate the exploitation technique, analyze the patch, and discuss why this class of vulnerability continues to plague machine learning frameworks despite years of warnings from the security community.
• NVIDIA Transformers4Rec is part of the Merlin ecosystem, designed to leverage state-of-the-art transformer architectures for sequential and session-based recommendation tasks.
• Transformers4Rec acts as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF).

Article Summaries:

Trend Micro’s Zero Day Initiative identified a critical flaw in NVIDIA’s Merlin Transformers4Rec library (CVE-2025-23298). The issue lies in the load_model_trainer_states_from_checkpoint function, which passes checkpoint files to PyTorch’s torch.load() without restricting what may be deserialized. Because torch.load() relies on Python’s pickle, an attacker can supply a crafted checkpoint that triggers arbitrary code execution with root privileges during model loading. The vulnerability underscores persistent security risks in ML/AI frameworks that rely on Python’s pickle for model persistence, despite long-standing warnings from the security community. NVIDIA has released a patch that sanitizes checkpoint loading to mitigate the risk.
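
To make the deserialization risk concrete, here is a minimal sketch using only Python's standard pickle module. It is not the Transformers4Rec code or NVIDIA's patch. The Payload class, the AllowListUnpickler class, the safe_loads helper, and the allow-list contents are all hypothetical names chosen for illustration, and the payload is deliberately harmless (it evaluates an arithmetic expression). The sketch shows that unpickling can invoke a callable chosen by whoever produced the file, and that overriding Unpickler.find_class is one way to restrict which globals a load may resolve.

```python
import io
import pickle
from collections import OrderedDict


class Payload:
    """Hypothetical stand-in for a malicious object hidden in a checkpoint."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" and runs it at load time.
        # A real attacker would reference something like os.system instead.
        return (eval, ("6 * 7",))


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals on an explicit allow-list."""

    # Assumed allow-list for this sketch; a real one depends on the checkpoint format.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data):
    """Deserialize bytes while refusing any global not on the allow-list."""
    return AllowListUnpickler(io.BytesIO(data)).load()


malicious = pickle.dumps(Payload())

# Plain pickle.loads runs the embedded call: the result is 42, not a Payload.
result = pickle.loads(malicious)

# The restricted loader refuses to resolve builtins.eval.
try:
    safe_loads(malicious)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

# Benign state made of plain containers still loads normally.
state = safe_loads(pickle.dumps(OrderedDict(step=3)))
```

Here result is 42 because the load executed the embedded call, blocked is True, and state round-trips unchanged. PyTorch's torch.load() also accepts a weights_only argument that restricts unpickling in a similar spirit; the summary above does not say whether the NVIDIA patch uses it.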

Sources:

https://www.thezdi.com/blog/2025/9/23/cve-2025-23298-getting-remote-code-execution-in-nvidia-merlin