Similar Items:
- AB-Sparse: Sparse Attention with Adaptive Block Size for Accurate and Efficient Long-Context Inference
- Stochastic Sparse Attention for Memory-Bound Inference
- Exploring Sparse Matrix Multiplication Kernels on the Cerebras CS-3
- SparseRL-Sync: Lossless Weight Synchronization with ~100x Less Communication
- ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations
- HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
- AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism