Similar Items: Stochastic Sparse Attention for Memory-Bound Inference
- Exploring Sparse Matrix Multiplication Kernels on the Cerebras CS-3
- SparseRL-Sync: Lossless Weight Synchronization with ~100x Less Communication
- ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations
- AnTi-MiCS: Analytical Framework for Bounding Time in Embedded Mixed-Criticality Systems
- AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs
- PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers