Similar Items:
- H-ViT: Hardware-Friendly Post-Training Quantization for Efficient Vision Transformer Inference
- QS4D: Quantization-Aware Training for Efficient Hardware Deployment of Structured State-Space Sequential Models
- Let ViT Speak: Generative Language-Image Pre-training
- ViM-Q: Scalable Algorithm-Hardware Co-Design for Vision Mamba Model Inference on FPGA
- RD-ViT: Recurrent-Depth Vision Transformer for Semantic Segmentation with Reduced Data Dependence: Extending the Recurrent-Depth Transformer Architecture to Dense Prediction
- AT-ViT: Area-Targeted Multi-View Vision Transformer With Cross-Attention and Multi-Scale Patching for Plant Trait Recognition in Herbarium Images
- Mobile3ViT: An Improved Hybrid CNN-Visual Transformer Model for Automatic Gastrointestinal Image Recognition