Similar Items:
- NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding
- VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU
- ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training
- AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs
- KEET: Explaining Performance of GPU Kernels Using LLM Agents
- PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers
- SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters