Similar Items:
- RD-ViT: Recurrent-Depth Vision Transformer for Semantic Segmentation with Reduced Data Dependence: Extending the Recurrent-Depth Transformer Architecture to Dense Prediction
- Let ViT Speak: Generative Language-Image Pre-training
- Linearizing Vision Transformer with Test-Time Training
- H-ViT: hardware-friendly post-training quantization for efficient vision transformer inference
- Faithful Extreme Image Rescaling with Learnable Reversible Transformation and Semantic Priors
- AnchorD: Metric Grounding of Monocular Depth Using Factor Graphs
- Foundation AI Models for Aerosol Optical Depth Estimation from PACE Satellite Data