Similar Items:
- PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers
- KEET: Explaining Performance of GPU Kernels Using LLM Agents
- SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters
- LLM-Emu: Native Runtime Emulation of LLM Inference via Profile-Driven Sampling
- VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU
- Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference
- Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation