Similar Items:
- TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
- XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA
- Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning
- AME-PIM: Can Memory be Your Next Tensor Accelerator?
- AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices
- Understanding Simulated Architecture via gem5 Call-Stack Profiling
- NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference