Similar Items: Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
- Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition
- Towards Highly-Constrained Human Motion Generation with Retrieval-Guided Diffusion Noise Optimization
- Personal Visual Context Learning in Large Multimodal Models
- CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering
- UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
- Confidence-Guided Diffusion Augmentation for Enhanced Bangla Compound Character Recognition