Similar Items:
- GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference
- AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices
- LLM-Enhanced Deep Reinforcement Learning for Task Offloading in Collaborative Edge Computing
- VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices
- Entropy-informed Decoding: Adaptive Information-Driven Branching
- NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference
- Real-Time Text Transmission via LLM-Based Entropy Coding over Fixed-Rate Channels