Similar Items: NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference
- VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices
- MCFlash: Bulk Bitwise Processing in 3D NAND with Dynamic Sensing and Multi-level Encoding
- XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA
- AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices
- TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
- AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving