Similar Items:
- TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
- XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA
- Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning
- AME-PIM: Can Memory be Your Next Tensor Accelerator?
- AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices
- Understanding Simulated Architecture via gem5 Call-Stack Profiling
- NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference