Similar Items:
- KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
- Make Your LVLM KV Cache More Lightweight
- SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
- QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs
- KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving
- How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences
- RcLLM: Accelerating Generative Recommendation via Beyond-Prefix KV Caching