Similar Items:
- Exploring Sparse Matrix Multiplication Kernels on the Cerebras CS-3
- FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
- MANOJAVAM: A Scalable, Unified FPGA Accelerator for Matrix Multiplication and Singular Value Decomposition in Principal Component Analysis
- AnTi-MiCS: Analytical Framework for Bounding Time in Embedded Mixed-Criticality Systems
- Stochastic Sparse Attention for Memory-Bound Inference
- On Similarity of Computational Kernels in our Codes and Proxies
- SparseRL-Sync: Lossless Weight Synchronization with ~100x Less Communication