Similar Items:
- Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models
- AME-PIM: Can Memory be Your Next Tensor Accelerator?
- AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures
- TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
- HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators
- VitaLLM: A Versatile, Ultra-Compact Ternary LLM Accelerator with Dependency-Aware Scheduling