Reliability

A reliability- and latency-driven task allocation framework for workflow applications in the edge-hub-cloud continuum

• Computer Science > Distributed, Parallel, and Cluster Computing. Submitted on 20 Feb 2026.

What to do About AI's Forced Rethink of Reliability in Modern DevOps

• As systems become more distributed and AI-driven, traditional uptime metrics are no longer enough.
• The 2026 SRE Report shows how reliability is shifting toward user experience,

Intelligent Networks: Power, Reliability, and Maintenance in Telecom - Webinar Preview

• The upcoming webinar ‘Intelligent Networks: Power, Reliability, and Maintenance in Telecom’ will focus on how telecommunications networks are adapting to growing demands for effi

How Reliable is Your Service at the Extreme Edge? Analytical Modeling of Computational Reliability

• Computer Science > Distributed, Parallel, and Cluster Computing. Submitted on 18 Feb 2026.

Towards a Science of AI Agent Reliability

• Computer Science > Artificial Intelligence. Submitted on 18 Feb 2026. Abstract: AI agents are increasin

Improving LLM Reliability through Hybrid Abstention and Adaptive Detection

• Computer Science > Artificial Intelligence. Submitted on 17 Feb 2026.

Azure reliability, resiliency, and recoverability: Build continuity by design

• Modern cloud systems are expected to deliver more than uptime.
• Customers expect consistent performance, the ability to withstand disruption, and confidence that recovery is pre

What is data reliability (and do you need reliable data)?

• There’s a big difference between having data

Hypergrowth isn't always easy

• Tailscale faced shaky uptime during holiday season, prompting transparency.
• Public uptime history page offers detailed incident logs and metrics.
• Coordination server issues c

Improve service reliability and ops culture with Grafana Cloud Service Center

• By Ryan Kehoe, David Ellis, Dave Thompson, and Deyan Halachliyski. Today’s engineering organizations are built

Failure is inevitable: Learning from a large outage, and building for reliability in depth at Datadog

• By Laura de Vesine, Rob Thomas, and Maciej Kowalewski. In March 2023, Datadog experienced a rare, widespread incident that left large parts of our infrastructure only partially functional,