Similar Items: DINORANKCLIP: DINOv3 Distillation and Injection for Vision-Language Pretraining with High-Order Ranking Consistency
- Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification
- 3D MRI Image Pretraining via Controllable 2D Slice Navigation Task
- D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
- Quantifying the human visual exposome with vision language models
- Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment
- Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models