Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures :: FRELIP Discovery

Similar Items: Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures

FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies
KEET: Explaining Performance of GPU Kernels Using LLM Agents
VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU
PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers
SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters
MERBIT: A GPU-Based SpMV Method for Iterative Workloads
VUDA: Breaking CUDA-Vulkan Isolation for Spatial Sharing of Compute and Graphics on the Same GPU
GPU-Accelerated Simulations of Problems with Moving Boundaries and Fluid-Structure Interaction at Extreme Scales
Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
Real-Time GPU-Accelerated Monte Carlo Evaluation of Safety-Critical AEB Systems Under Uncertainty
MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
A Study on the Performance of Distributed Training of Data-driven CFD Simulations
Space Network of Experts: Architecture and Expert Placement
Akita: A High Usability Simulation Framework for Computer Architecture
Joint Temporal-Structural Representation Learning for Distributed Fault Discrimination in Microservice Architectures
AnTi-MiCS: Analytical Framework for Bounding Time in Embedded Mixed-Criticality Systems
Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel
Decentralized Stratified Sampling for Low-Latency Approximate Geospatial Data Stream Processing in Edge-Cloud Architectures
Accelerating Locality-Driven Integration in Quantum Chemistry with Block-Structured Matrix Multiplication
NeuroRing: Scaling Spiking Neural Networks via Multi-FPGA Bidirectional Ring Topologies and Stream-Dataflow Architectures
From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications
LLM-Emu: Native Runtime Emulation of LLM Inference via Profile-Driven Sampling
(POSTER) From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications
Deadline-Driven Hierarchical Agentic Resource Sharing for AI Services and RAN Functions in AI-RAN