Similar Items: Personal Visual Context Learning in Large Multimodal Models
- Audio-Visual Intelligence in Large Foundation Models
- Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation
- Large Language Models are Universal Reasoners for Visual Generation
- UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning
- Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
- Multimodal Learning on Low-Quality Data with Conformal Predictive Self-Calibration