Similar Items:
- AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
- Regulating Branch Parallelism in LLM Serving
- HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
- Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism
- Accelerating Compound LLM Training Workloads with Maestro
- Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale
- Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism