Similar Items:
- DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference
- VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices
- Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours
- Efficient, VRAM-Constrained xLM Inference on Clients
- Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators
- CuLifter: Lifting GPU Binaries to Typed IR
- Silicon Showdown: Performance, Efficiency, and Ecosystem Barriers in Consumer-Grade LLM Inference