Similar Items:
- SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters
- PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers
- FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows
- KEET: Explaining Performance of GPU Kernels Using LLM Agents
- ClusterLess: Deadline-Aware Serverless Workflow Orchestration on Federated Edge Clusters
- Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference
- VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU