Similar Items:
- AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving
- NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference
- Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes
- Evolution of NVENC Efficiency: A Longitudinal Analysis of HQ and UHQ Tuning Efficiency, Latency and Energy Trade-offs
- A Protocol-Independent Transport Architecture
- No Tile Left Behind: Multiprogramming for Surface-Code Architectures
- AME-PIM: Can Memory be Your Next Tensor Accelerator?