Structure-Aware Feature Rectification with Region Adjacency Graphs for Training-Free Open-Vocabulary Semantic Segmentation

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Benefiting from the inductive biases learned from large-scale datasets, open-vocabulary semantic segmentation (OVSS) leverages the power of vision-language models, such as CLIP, to achieve remarkable progress without requiring task-specific training. However, due to CLIP’s pre-training nature on image-text pairs, it tends to focus on global semantic alignment, resulting in suboptimal performance when associating fine-grained visual regions with text. This leads to noisy and inconsistent predictions, particularly in local areas. We attribute this to a dispersed bias stemming from its contrastive training paradigm, which is difficult to alleviate using CLIP features alone. To address this, we propose a structure-aware feature rectification approach that incorporates instance-specific priors derived directly from the image. Specifically, we construct a region adjacency graph (RAG) based on low-level features (e.g., colour and texture) to capture local structural relationships and use it to refine CLIP features by enhancing local discrimination. Extensive experiments show that our method effectively suppresses segmentation noise, improves region-level consistency, and achieves strong performance on multiple open-vocabulary segmentation benchmarks. Project page: https://qiming-huang.github.io/RAG-OVS/.
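
The general idea of building a region adjacency graph from low-level cues and using it to smooth dense features can be illustrated with a minimal sketch. This is not the authors' implementation, and the paper's actual rectification may differ. It assumes a superpixel label map is already available, uses mean colour only (no texture), and treats an arbitrary per-pixel feature map as a stand-in for upsampled CLIP features. The names `build_rag` and `rectify_features` and the parameters `sigma` and `alpha` are hypothetical.

```python
import numpy as np


def build_rag(labels, image, sigma=1.0):
    """Build a region adjacency graph (RAG) as a dense weight matrix.

    labels: (H, W) int array of region ids 0..R-1 (assumed given, e.g. superpixels).
    image:  (H, W, C) float array of low-level features (here: colour only).
    Returns an (R, R) symmetric matrix whose entry (i, j) is
    exp(-||mean_i - mean_j||^2 / sigma) for spatially adjacent regions, else 0.
    """
    n = int(labels.max()) + 1
    flat = labels.ravel()
    counts = np.maximum(np.bincount(flat, minlength=n), 1).astype(float)
    # Mean low-level feature per region, one channel at a time.
    means = np.stack(
        [np.bincount(flat, weights=image[..., c].ravel(), minlength=n)
         for c in range(image.shape[-1])],
        axis=1,
    ) / counts[:, None]

    # Two regions are adjacent if they share a horizontal or vertical pixel border.
    adj = np.zeros((n, n), dtype=bool)
    adj[labels[:, :-1].ravel(), labels[:, 1:].ravel()] = True
    adj[labels[:-1, :].ravel(), labels[1:, :].ravel()] = True
    adj |= adj.T
    np.fill_diagonal(adj, False)

    dist2 = ((means[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    return np.where(adj, np.exp(-dist2 / sigma), 0.0)


def rectify_features(feats, labels, weights, alpha=0.5):
    """Blend per-pixel features with graph-smoothed region features.

    feats:   (H, W, D) per-pixel features (stand-in for CLIP dense features).
    weights: (R, R) RAG weights from build_rag.
    alpha:   blend factor; 0 keeps feats unchanged, 1 uses only region features.
    """
    n, d = weights.shape[0], feats.shape[-1]
    flat = labels.ravel()
    f = feats.reshape(-1, d)

    # Average features inside each region.
    sums = np.zeros((n, d))
    np.add.at(sums, flat, f)
    counts = np.maximum(np.bincount(flat, minlength=n), 1)
    region = sums / counts[:, None]

    # One propagation step over the graph (self-loop added, rows normalised).
    w = weights + np.eye(n)
    w = w / w.sum(axis=1, keepdims=True)
    smoothed = w @ region

    out = (1.0 - alpha) * f + alpha * smoothed[flat]
    return out.reshape(feats.shape)
```

In this sketch, similar-looking neighbouring regions exchange more feature mass than dissimilar ones, and pixels within one region are pulled toward a shared representation, which is one simple way to reduce locally inconsistent predictions.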
Original language: English
Title of host publication: 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Publisher: IEEE
Publication status: Accepted/In press - 11 Nov 2025
Event: 2026 IEEE/CVF Winter Conference on Applications of Computer Vision - JW Marriott Starpass, Tucson, United States
Duration: 6 Mar 2026 to 10 Mar 2026

Publication series

Name: IEEE Workshop on Applications of Computer Vision (WACV)
Publisher: IEEE
ISSN (Print): 2472-6737
ISSN (Electronic): 2642-9381

Conference

Conference: 2026 IEEE/CVF Winter Conference on Applications of Computer Vision
Abbreviated title: WACV 2026
Country/Territory: United States
City: Tucson
Period: 6/03/26 to 10/03/26
