Similar Items: BranchySplit: Runtime-Adaptable Partitioning and Early Exits for Accelerated Edge Inference
- DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference
- One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving
- Edge Computing-Based Distributed Intrusion Detection Systems via Multi-Hop Split Learning
- LLM-Emu: Native Runtime Emulation of LLM Inference via Profile-Driven Sampling
- Strategic exits in stochastic partnerships: the curse of profitability
- Bidirectional Runtime Enforcement of First-Order Branching-Time Properties