Similar Items: Comparing the Performance of Heterogeneous Conjugate Gradient and Cholesky Solvers on Various Hardware Using SYCL
- HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
- Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics
- FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
- FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows
- MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
- Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs