Similar Items:
- Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning
- TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
- VitaLLM: A Versatile, Ultra-Compact Ternary LLM Accelerator with Dependency-Aware Scheduling
- VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices
- AME-PIM: Can Memory be Your Next Tensor Accelerator?
- LLM-Driven Design Space Exploration of FPGA-based Accelerators
- Effective and Memory-Efficient Alternatives to ECC for Reliable Large-Scale DNNs