Similar Items: AnchorD: Metric Grounding of Monocular Depth Using Factor Graphs
- Static and Dynamic Graph Alignment Network for Temporal Video Grounding
- Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification
- PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
- RD-ViT: Recurrent-Depth Vision Transformer for Semantic Segmentation with Reduced Data Dependence: Extending the Recurrent-Depth Transformer Architecture to Dense Prediction
- Perceptual Flow Network for Visually Grounded Reasoning
- DPM++: Dynamic Masked Metric Learning for Occluded Person Re-identification