Similar Items: One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy
- LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
- A Unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping
- Representation Fréchet Loss for Visual Generation
- Map2World: Segment Map Conditioned Text to 3D World Generation
- Perceptual Flow Network for Visually Grounded Reasoning