How llm-d brings critical resource optimization with SoftBank’s AI-RAN orchestrator

• As the technical reality of AI-RAN comes into focus, many telecommunication service providers are realizing that it’s no longer just about whether they can run AI and radio access network (RAN) workloads on the same hardware; it’s about how they manage AI at scale.
• In Red Hat’s latest collaboration with SoftBank Corp., we have integrated llm-d into SoftBank’s AI-RAN orchestrator, AITRAS.
• Founded by Red Hat alongside other industry leaders, llm-d is an open source framework designed to dynamically and intelligently distribute the inferencing of large language models (LLMs) within a RAN, improving both efficiency and performance.

The problem: Unifying AI and RAN workloads at the service provider edge

• Traditional RAN applications are widely deployed by service providers at the edge on CPUs and GPUs, often utilizing Kubernetes platforms like Red Hat OpenShift.
• However, the recent surge in GenAI and transformer-based language models is enabling new forms of computation and insights at the edge.
• Now, in addition to traditional RANs, there are AI-powered RAN applications and agents that require runtime and inference endpoints at the edge.

Article Summaries:

  • Red Hat and SoftBank have integrated the open‑source llm‑d framework into SoftBank’s AI‑RAN orchestrator, AITRAS, to enable efficient large‑language‑model (LLM) inference at telecom edge sites. llm‑d orchestrates vLLM across multiple GPU nodes, allowing RAN and AI workloads to share hardware without compromising performance. The system dynamically assigns GPU resources to LLM prefill and decode phases, optimizes load distribution, and supports autoscaling, thereby reducing operational costs and speeding the deployment of new edge services. The collaboration aims to make AI‑RAN commercially viable by treating AI workloads as cloud‑native functions that coexist with traditional RAN applications.
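To make the idea of assigning GPUs to the prefill and decode phases concrete, here is a minimal sketch of a load-proportional split. It is illustrative only and is not llm-d's or AITRAS's actual scheduling logic: the `PhaseLoad` type, the `split_gpus` function, and the cost weights are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class PhaseLoad:
    """Pending work per inference phase (hypothetical metric, in tokens)."""
    prefill_tokens: int  # prompt tokens waiting to be processed
    decode_tokens: int   # output tokens still to be generated


def split_gpus(total_gpus: int, load: PhaseLoad,
               prefill_cost: float = 1.0, decode_cost: float = 4.0) -> tuple[int, int]:
    """Split a GPU pool between prefill and decode in proportion to weighted load.

    The cost weights are assumptions for illustration, not measured values.
    Each phase always keeps at least one GPU so neither can starve.
    """
    if total_gpus < 2:
        raise ValueError("need at least one GPU per phase")

    prefill_work = load.prefill_tokens * prefill_cost
    decode_work = load.decode_tokens * decode_cost
    total_work = prefill_work + decode_work

    if total_work == 0:
        prefill_gpus = total_gpus // 2  # idle pool: split evenly
    else:
        prefill_gpus = round(total_gpus * prefill_work / total_work)

    # Clamp so both phases retain at least one GPU.
    prefill_gpus = max(1, min(total_gpus - 1, prefill_gpus))
    return prefill_gpus, total_gpus - prefill_gpus


if __name__ == "__main__":
    print(split_gpus(8, PhaseLoad(prefill_tokens=4000, decode_tokens=1000)))
```

A real orchestrator would presumably rerun a decision like this as load shifts and combine it with autoscaling of the pool itself; this sketch shows only the single proportional step.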

Sources: