Similar Items: Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models
- Large Language Models are Universal Reasoners for Visual Generation
- Personal Visual Context Learning in Large Multimodal Models
- Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation
- UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning
- Count Anything at Any Granularity
- Perceptual Flow Network for Visually Grounded Reasoning