Similar Items: EvoGround: Self-Evolving Video Agents for Video Temporal Grounding
- Static and Dynamic Graph Alignment Network for Temporal Video Grounding
- Contrastive Learning under Noisy Temporal Self-Supervision for Colonoscopy Videos
- Relit-LiVE: Relight Video by Jointly Learning Environment Video
- CMTA: Leveraging Cross-Modal Temporal Artifacts for Generalizable AI-Generated Video Detection
- Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding
- Perceptual Flow Network for Visually Grounded Reasoning