Similar Items:
- Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
- MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
- Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend
- Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
- Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
- Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
- Efficient Training on Multiple Consumer GPUs with RoundPipe