Similar Items: Make Your LVLM KV Cache More Lightweight
- SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
- Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions
- Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph
- LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation
- Don't Get Your Kroneckers in a Twist: Gaussian Processes on High-Dimensional Incomplete Grids
- QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs