Computer Science > Networking and Internet Architecture
[Submitted on 17 Feb 2026 (v1), last revised 18 Feb 2026 (this version, v2)]
Title: AI Sessions for Network-Exposed AI-as-a-Service
Abstract: Cloud-based Artificial Intelligence (AI) inference is increasingly latency- and context-sensitive, yet today's AI-as-a-Service is typically consumed as an application-chosen endpoint, leaving the network to provide only best-effort transport. This decoupling prevents enforceable tail-latency guarantees, compute-aware admission control, and continuity under mobility. This paper proposes Network-Exposed AI-as-a-Service (NE-AIaaS) built around a new service primitive: the AI Session (AIS), a contractual object that binds model identity, execution placement, transport Quality-of-Service (QoS), and consent/charging scope into a single lifecycle with explicit failure semantics. We introduce the AI Service Profile (ASP), a compact contract that expresses task modality and measurable service objectives (e.g., time-to-first-response/token, p99 latency, success probability) alongside privacy and mobility constraints. On this basis, we specify protocol-grade procedures for (i) DISCOVER (model/site discovery), (ii) AI PAGING (context-aware selection of execution anchor), (iii) two-phase PREPARE/COMMIT that atomically co-reserves compute and QoS resources, and (iv) make-before-break MIGRATION for session continuity. The design is standard-mappable to Common API Framewor
Article Summaries:
- The paper introduces Network‑Exposed AI‑as‑a‑Service (NE‑AIaaS), a framework that couples AI model identity, execution location, transport QoS, and billing into a single contractual unit called an AI Session (AIS). It defines an AI Service Profile (ASP) to specify performance targets (e.g., p99 latency, success probability) and privacy/mobility constraints. Protocols for discovery, context‑aware paging, atomic prepare/commit of compute and QoS resources, and make‑before‑break migration are outlined. The design maps to existing standards such as CAPIF, ETSI MEC, 5G QoS flows, and NWDAF analytics, enabling enforceable tail‑latency guarantees and mobility‑aware AI inference.
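The two-phase PREPARE/COMMIT idea can be illustrated with a minimal sketch. It is not the paper's protocol: the class names, field names, units, and numeric values below are all hypothetical, chosen only to show how an ASP-like contract might gate admission and how compute and QoS could be reserved together so that either both are committed or neither is.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AIServiceProfile:
    """Hypothetical ASP; fields loosely follow the objectives named in the abstract."""
    modality: str            # task modality, e.g. "text-generation"
    ttft_ms: float           # time-to-first-token target
    p99_latency_ms: float    # tail-latency target
    min_success_prob: float  # required success probability


class Reservable:
    """Toy resource pool with tentative holds (PREPARE) and confirmation (COMMIT)."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity
        self.held: Dict[str, int] = {}       # session id -> tentatively held units
        self.committed: Dict[str, int] = {}  # session id -> confirmed units

    def free(self) -> int:
        return self.capacity - sum(self.held.values()) - sum(self.committed.values())

    def prepare(self, session_id: str, units: int) -> bool:
        # Place a tentative hold only if enough capacity remains.
        if units > self.free():
            return False
        self.held[session_id] = units
        return True

    def commit(self, session_id: str) -> None:
        # Turn the tentative hold into a confirmed reservation.
        self.committed[session_id] = self.held.pop(session_id)

    def abort(self, session_id: str) -> None:
        # Release a tentative hold, if any.
        self.held.pop(session_id, None)


def establish_session(session_id: str, asp: AIServiceProfile, site_p99_ms: float,
                      compute: Reservable, qos: Reservable,
                      compute_units: int, qos_units: int) -> bool:
    """Admit the session only if the site can meet the ASP, then co-reserve resources."""
    # Assumed admission rule: the site's estimated p99 must not exceed the ASP target.
    if site_p99_ms > asp.p99_latency_ms:
        return False
    prepared = []
    # Phase 1 (PREPARE): hold every resource; roll back all holds on any failure.
    for pool, units in ((compute, compute_units), (qos, qos_units)):
        if pool.prepare(session_id, units):
            prepared.append(pool)
        else:
            for p in prepared:
                p.abort(session_id)
            return False
    # Phase 2 (COMMIT): every hold succeeded, so confirm them all.
    for pool in prepared:
        pool.commit(session_id)
    return True


# Illustrative values only.
asp = AIServiceProfile("text-generation", ttft_ms=200.0,
                       p99_latency_ms=800.0, min_success_prob=0.999)
compute = Reservable("gpu-site-a", capacity=4)
qos = Reservable("qos-flow-budget", capacity=1)

ok_1 = establish_session("ais-1", asp, 500.0, compute, qos, compute_units=2, qos_units=1)
ok_2 = establish_session("ais-2", asp, 500.0, compute, qos, compute_units=2, qos_units=1)
```

In this toy run the second session finds compute available but no remaining QoS budget, so its compute hold is released rather than left stranded. A real deployment would also need timeouts on holds and handling for failures between the two phases, which this sketch omits.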
Sources: