BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS
• Prefill/decode disaggregation improves latency-throughput tradeoff for large language model serving.
• Energy consumption remains high; autoscaling is too coarse-grained for rapi