Similar Items: Linearizing Vision Transformer with Test-Time Training
- PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature
- RD-ViT: Recurrent-Depth Vision Transformer for Semantic Segmentation with Reduced Data Dependence. Extending the Recurrent-Depth Transformer Architecture to Dense Prediction
- Rethinking Dense Optical Flow without Test-Time Scaling
- Quantifying the human visual exposome with vision language models
- Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models
- Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification