Similar Items: Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation
- Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment
- Quantifying the human visual exposome with vision language models
- Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models
- StateVLM: A State-Aware Vision-Language Model for Robotic Affordance Reasoning
- PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature
- When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise