Similar Items:
- Context-Aware Autoscaling for Cost-Efficient Large Language Model Inference With Prefix Cache Integration
- RcLLM: Accelerating Generative Recommendation via Beyond-Prefix KV Caching
- DKC-LLM: Dynamic Knowledge Caching for Large Language Models in Business Applications
- A Low-Cost Multi-Objective Cache Prefetcher for Complex and Irregular Memory Access Patterns
- Nominal categorial prefixes in the Boro Part of the Sal languages
- QubitCache: Quantum-Inspired Probabilistic Attention Preservation for KV-Cache Compression
- The addition of temporal neighborhood makes the logic of prefixes and sub-intervals EXPSPACE-complete