Similar Items:
- Characterizing Universal Object Representations Across Vision Models
- Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models
- Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment
- Quantifying the human visual exposome with vision language models
- ALAM: Algebraically Consistent Latent Transitions for Vision-Language-Action Models
- StateVLM: A State-Aware Vision-Language Model for Robotic Affordance Reasoning
- HEART: Hyperspherical Embedding Alignment via Kent-Representation Traversal in Diffusion Models