Similar Items: Contrastive Learning under Noisy Temporal Self-Supervision for Colonoscopy Videos
- EvoGround: Self-Evolving Video Agents for Video Temporal Grounding
- Learning Coarse-to-Fine Osteoarthritis Representations under Noisy Hierarchical Labels
- Static and Dynamic Graph Alignment Network for Temporal Video Grounding
- Action Motifs: Self-Supervised Hierarchical Representation of Human Body Movements
- CMTA: Leveraging Cross-Modal Temporal Artifacts for Generalizable AI-Generated Video Detection
- Relit-LiVE: Relight Video by Jointly Learning Environment Video