Similar Items:
- Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend
- MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
- Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
- Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
- Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
- Characterizing Path-Independent Fees: A Route to Zero Impermanent Loss in CPMMs
- GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer