Similar Items:
- Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
- ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL
- Efficient Training on Multiple Consumer GPUs with RoundPipe
- Regulating Branch Parallelism in LLM Serving
- Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
- VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
- EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge