Similar Items: SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
- UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
- G$^2$TR: Generation-Guided Visual Token Reduction for Separate-Encoder Unified Multimodal Models
- HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
- Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
- A Benchmark for Interactive World Models with a Unified Action Generation Framework
- Unified Map Prior Encoder for Mapping and Planning