Similar Items:
- Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference
- ResiHP: Taming LLM Training Failures with Dynamic Hybrid
- FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows
- SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving
- Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism
- Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation
- SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters