Similar Items:
- Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
- Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
- Lifting to tensors when compiling scientific computing workloads for AI Engines
- Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism
- VUDA: Breaking CUDA-Vulkan Isolation for Spatial Sharing of Compute and Graphics on the Same GPU
- MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
- VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU