Similar Items:
- Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures
- FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies
- KEET: Explaining Performance of GPU Kernels Using LLM Agents
- VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU
- PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers
- SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters
- MERBIT: A GPU-Based SpMV Method for Iterative Workloads