Similar Items:
- ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL
- Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
- VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
- Efficient Training on Multiple Consumer GPUs with RoundPipe
- Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
- Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
- EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge