Similar Items:
- Efficient, VRAM-Constrained xLM Inference on Clients
- Silicon Showdown: Performance, Efficiency, and Ecosystem Barriers in Consumer-Grade LLM Inference
- XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA
- TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
- DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference
- EMiX: Emulating Beyond Single-FPGA Limits
- DSPE: An Energy-Efficient Edge Processor for DeepSeek Inference with MerkleTree-based Incremental Pruning, Multi-Stage Boothing Lookup and Dynamic Adaptive Posit Processing