Similar Items:
- Accelerating Compound LLM Training Workloads with Maestro
- ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training
- RcLLM: Accelerating Generative Recommendation via Beyond-Prefix KV Caching
- Lifting to tensors when compiling scientific computing workloads for AI Engines
- MERBIT: A GPU-Based SpMV Method for Iterative Workloads
- ResiHP: Taming LLM Training Failures with Dynamic Hybrid
- AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism