Similar Items: TAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning
- GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer
- CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
- Linearizing Vision Transformer with Test-Time Training
- PhyGround: Benchmarking Physical Reasoning in Generative World Models
- Quantifying the human visual exposome with vision language models
- CADBench: A Multimodal Benchmark for AI-Assisted CAD Program Generation