Similar Items: SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation
- FoR-Net: Learning to Focus on Hard Regions for Efficient Semantic Segmentation
- Representation Fréchet Loss for Visual Generation
- Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization
- Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment
- RD-ViT: Recurrent-Depth Vision Transformer for Semantic Segmentation with Reduced Data Dependence Extending the Recurrent-Depth Transformer Architecture to Dense Prediction
- Personal Visual Context Learning in Large Multimodal Models