Similar Items:
- Lifting to tensors when compiling scientific computing workloads for AI Engines
- Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
- Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism
- AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
- Akita: A High Usability Simulation Framework for Computer Architecture
- Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
- Adaptation of AI-accelerated CFD Simulations to the IPU platform