Similar Items:
- EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge
- ClusterLess: Deadline-Aware Serverless Workflow Orchestration on Federated Edge Clusters
- VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
- Regulating Branch Parallelism in LLM Serving
- Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
- FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving
- Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism