Similar Items: Large Language Models are Universal Reasoners for Visual Generation
- Audio-Visual Intelligence in Large Foundation Models
- Perceptual Flow Network for Visually Grounded Reasoning
- Personal Visual Context Learning in Large Multimodal Models
- Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation
- PhyGround: Benchmarking Physical Reasoning in Generative World Models
- Quantifying the human visual exposome with vision language models