Similar Items:
- SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters
- PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers
- FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows
- KEET: Explaining Performance of GPU Kernels Using LLM Agents
- ClusterLess: Deadline-Aware Serverless Workflow Orchestration on Federated Edge Clusters
- Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference
- VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU