LLM | Tenu Tech Brief

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

• Computer Science > Distributed, Parallel, and Cluster Computing. Submitted on 25 Feb 2026.

MSADM: Large Language Model (LLM) Assisted End-to-End Network Health Management Based on Multi-Scale Semanticization

• Computer Science > Networking and Internet Architecture. Submitted on 12 Jun 2024 (v1), last revised 25 Feb 2026 (v3).

Multi-Layer Scheduling for MoE-Based LLM Reasoning

• Computer Science > Distributed, Parallel, and Cluster Computing. Submitted on 25 Feb 2026.

ABD: Default Exception Abduction in Finite First Order Worlds

• ABD benchmark tests default‑exception abduction in finite first‑order logical worlds. • Models generate sparse exception formulas to restore satisfiability under abnormality pred

BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS

• Prefill/decode disaggregation improves latency-throughput tradeoff for large language model serving. • Energy consumption remains high; autoscaling is too coarse-grained for rapi

ConfSpec: Efficient Step-Level Speculative Reasoning via Confidence-Gated Verification

• ConfSpec introduces confidence‑gated cascaded verification for efficient step‑level speculative reasoning. • Small draft models quickly verify reasoning steps, accepting high‑c

Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking)

• LLMs increasingly used as scientific copilots, but research-level math evidence limited. • Case study uses ChatGPT-5.2 (Thinking) to resolve Conjecture 20 on spectral region of 4

Federated Reasoning Distillation Framework with Model Learnability-Aware Data Allocation

• Addresses bidirectional model learnability gap in federated LLM-SLM reasoning collaboration. • Introduces LaDa framework with learnability-aware data filter for high-reward sampl

Feedback-based Automated Verification in Vibe Coding of CAS Adaptation Built on Constraint Logic

• Leveraged generative LLMs to auto‑generate Adaptation Manager code for CAS systems. • Introduced vibe coding feedback loops to iteratively test and refine generated AMs. • Develo

FineRef: Fine-Grained Error Reflection and Correction for Long-Form Generation with Citations

• FineRef introduces fine-grained error reflection for citation mismatch and irrelevance in long‑form LLM generation. • Two‑stage training: supervised fine‑tuning with attempt‑refl

From 'Help' to Helpful: A Hierarchical Assessment of LLMs in Mental e-Health Applications

• Evaluated 11 LLMs generating six-word subject lines for German counselling emails. • Used hierarchical assessment: first categorize outputs, then rank within categories. • Nine a

LLM-Assisted Replication for Quantitative Social Science

• Replication crisis threatens empirical research credibility, driven by high costs and low incentives for replication. • LLMs accelerate scientific output by automating writing, c

Many AI Analysts, One Dataset: Navigating the Agentic Data Science Multiverse

• AI analysts replicate many‑analyst diversity at scale using large language models. • LLMs and prompt framing generate distinct analytic pipelines on the same dataset. • An AI aud

Prompt Optimization Via Diffusion Language Models

• Diffusion-based framework refines system prompts via masked denoising in an iterative manner. • Conditions on interaction traces: user queries, model responses, and optional feed

ReportLogic: Evaluating Logical Quality in Deep Research Reports

• LLMs increasingly synthesize research into structured reports, but logical reliability remains unassessed. • ReportLogic benchmark quantifies report‑level logical quality for dee

Spilled Energy in Large Language Models

• Reinterprets LLM softmax as Energy-Based Model, enabling energy tracking during decoding. • Introduces training‑free metrics: spilled energy and marginalized energy from logits.
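
The energy-based reading of softmax mentioned above can be illustrated with a small sketch. It assumes the standard textbook correspondence (token energy equal to the negative logit, free energy equal to the negative log-partition function) and does not reproduce the paper's spilled or marginalized energy metrics, whose definitions are not given in this summary. The function names are hypothetical.

```python
import math

def log_sum_exp(logits):
    """Numerically stable log of the partition function Z = sum(exp(logit))."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))

def softmax(logits):
    lse = log_sum_exp(logits)
    return [math.exp(x - lse) for x in logits]

def token_energy(logits, i):
    # Assumption: energy of token i is E_i = -logit_i.
    return -logits[i]

def free_energy(logits):
    # Assumption: free energy F = -log Z, which can be tracked per decoding step.
    return -log_sum_exp(logits)

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
# Boltzmann identity: p_i = exp(-(E_i - F)) reproduces the softmax probabilities.
recon = [math.exp(-(token_energy(logits, i) - free_energy(logits)))
         for i in range(len(logits))]
```

Because the free energy comes directly from the logits, it needs no extra training, which is consistent with the "training-free" framing in the summary.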

The Convergence of Schema-Guided Dialogue Systems and the Model Context Protocol

• Schema-Guided Dialogue (SGD) and Model Context Protocol (MCP) converge as unified deterministic LLM-agent frameworks. • Both rely on schemas to encode tool signatures, operationa

WANSpec: Leveraging Global Compute Capacity for LLM Inference

• WANSpec leverages under‑utilized global data centers for LLM inference to reduce latency and cost. • Uses speculative decoding by moving draft model to low‑demand GPUs, cutting f
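
For readers unfamiliar with speculative decoding, here is a minimal sketch of the greedy variant: a cheap draft model proposes several tokens and the target model keeps the longest agreeing prefix. This is a generic illustration, not WANSpec's method; the toy models and the `speculative_step` name are hypothetical, and nothing here models wide-area placement.

```python
def speculative_step(prefix, draft_next, target_next, k):
    """Return the tokens accepted in one draft-then-verify round (greedy variant)."""
    # Draft model proposes k tokens autoregressively.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        token = draft_next(ctx)
        proposal.append(token)
        ctx.append(token)

    # Target model checks each proposed token. Real systems score all
    # positions in one batched forward pass; this loop is sequential for clarity.
    accepted, ctx = [], list(prefix)
    for token in proposal:
        expected = target_next(ctx)
        if token != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target_next(ctx))  # bonus token when every draft token matches
    return accepted

# Toy models (assumptions): the target counts upward mod 10; the draft errs after a 3.
target_next = lambda ctx: (ctx[-1] + 1) % 10
draft_next = lambda ctx: 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10

result = speculative_step([1], draft_next, target_next, k=4)
```

The output always matches what greedy decoding with the target alone would produce; the draft only changes how many target calls are needed per accepted token.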

This AI can improve your peer review - and make it more polite

• AI coach transforms peer reviews into more constructive, less toxic feedback. • Stanford researchers trained LLMs on curated reviews flagged as vague or unprofessional. • The Rev

When large language models are reliable for judging empathic communication

• LLMs generate empathic responses, but reliability of judging empathy remains unclear. • Study compares expert, crowdworker, and LLM annotations across four psychological framewor

Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining

• Authors: Jeffr

How Exposed Endpoints Increase Risk Across LLM Infrastructure

• As more organizations run their own Large Language Models (LLMs), they are also deploying more internal services and

AI Coach Improves Peer Review Tone

AI coach provides constructive feedback, turning vague reviews into detailed, actionable suggestions. The tool reduces unprofessional tone, eliminating personal attacks and factual

Agentic Unlearning: When LLM Agent Meets Machine Unlearning

• Computer Science > Machine Learning. Submitted on 6 Feb 2026.

AI Hallucination from Students' Perspective: A Thematic Analysis

• Students rely on LLMs, but hallucinations threaten learning accuracy. • Survey of 63 students revealed common hallucination types: fabricated citations, false facts, overconfidence.

Assessing LLM Response Quality in the Context of Technology-Facilitated Abuse

• Computer Science > Human-Computer Interaction. Submitted on 11 Jan 2026.

BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs

• BioBridge fuses protein language models with general LLMs to enhance biological reasoning across diverse tasks. • Domain-Incremental Continual Pre‑Training (DICP) injects domain

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

• Computer Science > Machine Learning. Submitted on 4 Feb 2026.

Microsoft removes guide on how to train LLMs on pirated Harry Potter books

• Microsoft removed blog post that promoted using pirated Harry Potter books to train LLMs. • Post was written by senior product manager Pooja Kamath, advocating dataset for genera

The On-Device LLM Revolution

• Why 3B to 30B models are moving to the edge - and what that means for silicon. • The AI world is experiencing a fundamental shift. • After years of cloud-centric inference domina

A Few-Shot LLM Framework for Extreme Day Classification in Electricity Markets

• Computer Science > Machine Learning. Submitted on 17 Feb 2026.

AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

• Computer Science > Artificial Intelligence. Submitted on 18 Feb 2026.

DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

• DeepContext introduces stateful monitoring for LLM safety, tracking intent across turns. • Uses RNN to process fine‑tuned turn‑level embeddings, preserving conversation context.

Guiding LLM-Based Human Mobility Simulation with Mobility Measures from Shared Data

• M2LSimu introduces a mobility-measures guided framework for LLM-based human mobility simulation. • It coordinates individual agents using shared data, capturing emergent collecti

Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

• Computer Science > Artificial Intelligence. Submitted on 18 Feb 2026.

Simple Baselines are Competitive with Code Evolution

• Code evolution uses LLMs to mutate code, yet lacks baseline comparisons. • Authors test two simple baselines across math bounds, agentic scaffolds, and ML contests. • Baselines m

[RFC] TensaLang: A tensor-first language for LLM inference, lowering through MLIR to CPU/CUDA

• Hello, I’ve been working on a project called TensaLang and it’s finally at a point worth sharing. • It’s a small language + compiler + runtime for writing LLM forward passes dire

How your LLM is silently hallucinating company revenue

The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts

• Computer Science > Computation and Language. Submitted on 21 Jan 2026.

Improving LLM Reliability through Hybrid Abstention and Adaptive Detection

• Computer Science > Artificial Intelligence. Submitted on 17 Feb 2026.

Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

• Uses trace rewriting to deter unauthorized knowledge distillation from large language models. • Introduces anti-distillation techniques that degrade training usefulness while kee

Quantifying construct validity in large language model evaluations

• LLM benchmarks often misrepresent true model capabilities due to contamination and annotator errors. • Construct validity is essential to ensure benchmarks truly measure desired

Bruteforcing Accidental Antenna Designs

• Antenna design is often seen as a black art; this piece explores a brute-force GPU approach. • Janne, a novice, used a VNA and GPU-based FDTD to simulate and optimize antennas. • Leveraged LLMs to

CrowdStrike's Agentic Security Powered by Human‑AI Feedback Loop

• CrowdStrike’s new Agentic Security framework blends human oversight with AI‑driven threat detection. • The system uses a continuous feedback loop where analysts refine AI models

BotzoneBench: Scalable LLM Evaluation via Graded AI Anchors

• Computer Science > Artificial Intelligence. Submitted on 22 Jan 2026.

TemporalBench: A Benchmark for Evaluating LLM-Based Agents on Contextual and Event-Informed Time Series Tasks

• TemporalBench offers a multi-domain benchmark for temporal reasoning in LLM agents. • Four-tier taxonomy tests historical structure, context-free, contextual, and event-condition

Asynchronous Verified Semantic Caching for Tiered LLM Architectures

• Authors: Asmit Kumar Singh, Haozhe Wang, Lax

Solving Context Size Issues with Docker Model Runner

• Context window limits hinder large language model usage. • Context packing packs multiple messages into a single prompt. • Docker Model Runner supports context packing techniques.

Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy

• NVIDIA TensorRT LLM enables developers to build high-performance inference engines for large language models (LLMs), but deploying a new architecture traditionally requires signi

A one-prompt attack that breaks LLM safety alignment

• Large language models (LLMs) and diffusion models now

LLM Inference Benchmarking - Measure What Matters

• By Piyush Srivastava, Karnik Modi, Stephen Varela, and Rithish Ramesh. • Production-grade LLM inference is a complex systems challenge, requiring deep co-designs - from hardware pri

Code smells for AI agents: Q&A with Eno Reyes of Factory

• Factory builds autonomous coding agents for large engineering teams, covering full SDLC. • Their platform includes tools to assess code quality and agent impact. • Factory’s agen

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

• Agentic RL extends LLM training beyond single-turn responses to full decision-making via environment interaction. • It collects on‑policy data, optimizing policies across multi‑s

Clawdbot with Docker Model Runner, a Private Personal AI Assistant

• Clawdbot + Docker Model Runner enables self-hosted, privacy-first personal AI assistants. • Integrates with Telegram, WhatsApp, Discord, Signal for proactive digital coworker. •

The Most Precious Resource

• Sequoia invests in Kais Khimji as partner, valuing his work ethic, learning appetite, and EQ. • Kais transitions to founder, launching Blockit, an AI-driven time optimization app

The Next Frontier of Runtime Assembly Attacks: Leveraging LLMs to Generate Phishing JavaScript in Real Time

• Attackers embed a benign page that calls an LLM API to generate malicious JavaScript in real time. • Prompt engineering bypasses AI safety guardrails, producing polymorphic phish

Differential Transformer V2

• DiffTransformer V2 doubles query heads, keeps KV heads constant for efficient attention. • Uses differential attention: subtracts paired heads within same GQA group. • Applies si
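
The paired-head subtraction described above can be sketched as follows. This is a rough illustration only: two query heads share one key/value head (as they would inside a single GQA group) and their attention outputs are subtracted. The fixed scalar `lam` is a hypothetical stand-in, since the summary is cut off before it says how the weighting is computed, and the function names are assumptions rather than the authors' code.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def differential_attend(q1, q2, keys, values, lam):
    # Both query heads read the SAME keys/values (shared KV head);
    # the second head's output is subtracted, weighted by lam (hypothetical scalar).
    a1 = attend(q1, keys, values)
    a2 = attend(q2, keys, values)
    return [x - lam * y for x, y in zip(a1, a2)]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
out = differential_attend([1.0, 0.0], [0.0, 1.0], keys, values, lam=0.5)
# Identical paired queries with lam = 1 cancel exactly.
same = differential_attend([1.0, 0.0], [1.0, 0.0], keys, values, lam=1.0)
```

Because the values are shared within the pair, subtracting the two outputs is equivalent to subtracting the two attention maps before applying them to the values.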

LLM flexibility, Agent Mode improvements, and new agentic experiences in Android Studio Otter 3 Feature Drop

• Posted by Sandhya Mohan, Senior Product Manager, and Trevor Johns, Developer Relations Engineer. • We are excited to announce that Android Studio Otter 3 Feature Drop is now stable!