Similar Items:
- Robust Vision-Language Alignment Using Multi-Modal Large Language Models for Open-Vocabulary Semantic Segmentation
- Multi‐Grained Vision–Language Alignment for Domain Generalised Person Re‐Identification
- Route Optimization Reimagined: Multi-Modal Large Language Models for Next-Generation Vehicle Routing
- Jailbreaking Vision-Language Models Through the Visual Modality
- Scaling Capability in Token Space: An Analysis of Large Vision Language Model
- Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment
- Modality-agnostic decoding of vision and language from fMRI