Similar Items:
- Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
- G$^2$TR: Generation-Guided Visual Token Reduction for Separate-Encoder Unified Multimodal Models
- SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
- Representation Fréchet Loss for Visual Generation
- 123D: Unifying Multi-Modal Autonomous Driving Data at Scale
- A Benchmark for Interactive World Models with a Unified Action Generation Framework
- A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping