Similar Items: Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition
- Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
- Quantifying the human visual exposome with vision language models
- Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models
- ALAM: Algebraically Consistent Latent Transitions for Vision-Language-Action Models
- StateVLM: A State-Aware Vision-Language Model for Robotic Affordance Reasoning
- Large Language Models are Universal Reasoners for Visual Generation