Similar Items:
- Efficient Training on Multiple Consumer GPUs with RoundPipe
- Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
- Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
- ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL
- PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers
- AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs
- ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training