Similar Items: Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization
- Representation Fréchet Loss for Visual Generation
- LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models
- Perceptual Flow Network for Visually Grounded Reasoning
- Audio-Visual Intelligence in Large Foundation Models
- Action Motifs: Self-Supervised Hierarchical Representation of Human Body Movements
- Learning Coarse-to-Fine Osteoarthritis Representations under Noisy Hierarchical Labels