Similar Items: CR^2: Cost-Aware Risk-Controlled Routing for Wireless Device-Edge LLM Inference
- GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference
- VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices
- On Capacity and Delay of Wireless Networks with Node Failures
- Diffusion-OAMP for Joint Image Compression and Wireless Transmission
- How Big Should a Wireless Foundation Model Be?
- NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference