Similar Items:
- FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows
- Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference
- SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters
- Affinity Tailor: Dynamic Locality-Aware Scheduling at Scale
- ClusterLess: Deadline-Aware Serverless Workflow Orchestration on Federated Edge Clusters
- Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
- Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation