Airesearch

BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS

• Prefill/decode disaggregation improves the latency-throughput tradeoff for large language model serving.
• Energy consumption remains high; autoscaling is too coarse-grained for rapi

FineRef: Fine-Grained Error Reflection and Correction for Long-Form Generation with Citations

• FineRef introduces fine-grained error reflection for citation mismatch and irrelevance in long-form LLM generation.
• Two-stage training: supervised fine-tuning with attempt-refl

Quantifying construct validity in large language model evaluations

• LLM benchmarks often misrepresent true model capabilities due to contamination and annotator errors.
• Construct validity is essential to ensure benchmarks truly measure desired