Similar Items:
- A Scalable Recipe on SuperMUC-NG Phase 2: Efficient Large-Scale Training of Language Models
- Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
- FedPLT: Scalable, Resource-Efficient, and Heterogeneity-Aware Federated Learning via Partial Layer Training
- CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training
- Efficient Training on Multiple Consumer GPUs with RoundPipe
- AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs
- Implementing True MPI Sessions and Evaluating MPI Initialization Scalability