Similar Items: VIP: Visual-guided Prompt Evolution for Efficient Dense Vision-Language Inference
- Quantifying the human visual exposome with vision language models
- Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification
- UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning
- EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction
- Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation
- Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment