Similar Items:
- HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
- AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
- AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs
- Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
- FedPLT: Scalable, Resource-Efficient, and Heterogeneity-Aware Federated Learning via Partial Layer Training
- FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
- Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems