Similar Items: Lakestream: A Consistent and Brokerless Data Plane for Large Foundation Model Training
- Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
- CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training
- A Scalable Recipe on SuperMUC-NG Phase 2: Efficient Large-Scale Training of Language Models
- A Study on the Performance of Distributed Training of Data-driven CFD Simulations
- ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training
- Delay-Aware Large-Small Model Collaboration over LEO Satellite Networks