Similar Items:
- AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs
- HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
- Efficient Training on Multiple Consumer GPUs with RoundPipe
- A Study on the Performance of Distributed Training of Data-driven CFD Simulations
- ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training
- Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
- FedPLT: Scalable, Resource-Efficient, and Heterogeneity-Aware Federated Learning via Partial Layer Training