Similar Items: Efficient and Portable Support for Overdecomposition on Distributed Memory GPGPU Platforms
- AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs
- Closer in the Gap: Towards Portable Performance on RISC-V Vector Processors
- DisAgg: Distributed Aggregators for Efficient Secure Aggregation in Federated Learning
- Stochastic Sparse Attention for Memory-Bound Inference
- Adaptation of AI-accelerated CFD Simulations to the IPU platform
- FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies