EdgeCLIP: Injecting edge-awareness into visual-language models for zero-shot semantic segmentation
Publication Type
Journal Article
Publication Date
10-2025
Abstract
Effective segmentation of unseen categories in zero-shot semantic segmentation is hindered by models’ limited ability to interpret edges in unfamiliar contexts. In this paper, we propose EdgeCLIP, which addresses this by integrating CLIP with explicit edge-awareness. Based on the premise that edge variation patterns are similar across seen and unseen class objects, EdgeCLIP introduces a Contextual Edge Sensing module. This module accurately discerns and utilizes edge information, which is crucial in complex border areas where conventional models struggle. Further, our Text-Guided Dense Feature Matching strategy precisely aligns text encodings with the corresponding visual edge features, effectively distinguishing them from background edges. This strategy not only optimizes the training of CLIP’s image and text encoders but also leverages the intrinsic completeness of objects, enhancing the model’s ability to generalize and accurately segment objects of unseen classes. EdgeCLIP significantly outperforms the current state-of-the-art method by an impressive margin of 17.5% on the COCO-20i dataset. Our code is available at github.com/aqingaqinghh/EdgeCLIP.
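For readers unfamiliar with text-guided dense feature matching, the sketch below illustrates the general idea in PyTorch. It is not the paper's implementation: random tensors stand in for CLIP's dense patch features and class-prompt text embeddings, and a Sobel filter serves only as a hypothetical proxy for the edge cues that the Contextual Edge Sensing module is described as extracting.

import torch
import torch.nn.functional as F

# Stand-ins for encoder outputs (shapes are illustrative, not from the paper).
B, C, H, W = 1, 512, 32, 32          # dense visual features: B x C x H x W
num_classes = 5                      # one text embedding per class prompt
visual_feats = torch.randn(B, C, H, W)
text_embeds = torch.randn(num_classes, C)

# Cosine similarity between every patch feature and every class embedding.
v = F.normalize(visual_feats, dim=1)             # B x C x H x W
t = F.normalize(text_embeds, dim=1)              # K x C
logits = torch.einsum("bchw,kc->bkhw", v, t)     # B x K x H x W

# A hypothetical edge prior from a Sobel filter on a grayscale image.
image = torch.rand(B, 1, 256, 256)
sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
sobel_y = sobel_x.transpose(2, 3)
gx = F.conv2d(image, sobel_x, padding=1)
gy = F.conv2d(image, sobel_y, padding=1)
edge_map = torch.sqrt(gx ** 2 + gy ** 2)         # B x 1 x 256 x 256

# Upsample class logits to image resolution and (illustratively) modulate by edges.
logits_up = F.interpolate(logits, size=edge_map.shape[-2:], mode="bilinear", align_corners=False)
seg = (logits_up * (1.0 + edge_map)).argmax(dim=1)   # B x 256 x 256 per-pixel class map
print(seg.shape)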
Discipline
Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
IEEE Transactions on Circuits and Systems for Video Technology
First Page
1
Last Page
1
ISSN
1051-8215
Identifier
10.1109/TCSVT.2025.3624233
Publisher
Institute of Electrical and Electronics Engineers
Citation
FANG, Jiaxiang; MA, Shiqiang; DUAN, Guihua; GUO, Fei; and HE, Shengfeng.
EdgeCLIP: Injecting edge-awareness into visual-language models for zero-shot semantic segmentation. (2025). IEEE Transactions on Circuits and Systems for Video Technology. 1-1.
Available at: https://ink.library.smu.edu.sg/sis_research/10808
Additional URL
https://doi.org/10.1109/TCSVT.2025.3624233