Similar Items: UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning
- Perceptual Flow Network for Visually Grounded Reasoning
- Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation
- Large Language Models are Universal Reasoners for Visual Generation
- Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
- Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification
- LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models