Similar Items: ALAM: Algebraically Consistent Latent Transitions for Vision-Language-Action Models
- DINORANKCLIP: DINOv3 Distillation and Injection for Vision-Language Pretraining with High-Order Ranking Consistency
- Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation
- CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
- Continuous Latent Diffusion Language Model
- LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models
- Quantifying the human visual exposome with vision language models