EdgeCLIP: Injecting edge-awareness into visual-language models for zero-shot semantic segmentation

Publication Type

Journal Article

Publication Date

10-2025

Abstract

Effective segmentation of unseen categories in zero-shot semantic segmentation is hindered by models’ limited ability to interpret edges in unfamiliar contexts. In this paper, we propose EdgeCLIP, which addresses this limitation by equipping CLIP with explicit edge-awareness. Based on the premise that edge variation patterns are similar across seen and unseen class objects, EdgeCLIP introduces a Contextual Edge Sensing module that accurately discerns and exploits edge information, which is crucial in the complex border regions where conventional models struggle. Further, our Text-Guided Dense Feature Matching strategy precisely aligns text encodings with the corresponding visual edge features, effectively distinguishing them from background edges. This strategy not only improves the training of CLIP’s image and text encoders but also leverages the intrinsic completeness of objects, enhancing the model’s ability to generalize to and accurately segment objects from unseen classes. EdgeCLIP significantly outperforms the current state-of-the-art method, by a margin of 17.5% on the COCO-20i benchmark. Our code is available at github.com/aqingaqinghh/EdgeCLIP.
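
The record carries only the abstract, but the core idea of Text-Guided Dense Feature Matching combined with an edge prior can be illustrated with a minimal sketch: compute cosine similarity between per-class text embeddings and dense patch-level visual features, then reweight the resulting logit maps near image edges. The snippet below is not the authors' implementation; the random stand-in features, the Sobel-based edge map, and the fusion weight `edge_weight` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sobel_edge_map(image: torch.Tensor) -> torch.Tensor:
    """Crude edge prior: Sobel gradient magnitude of a grayscale image.
    image: (B, 3, H, W) -> (B, 1, H, W), roughly normalized to [0, 1]."""
    gray = image.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    mag = torch.sqrt(gx ** 2 + gy ** 2)
    return mag / (mag.amax(dim=(2, 3), keepdim=True) + 1e-6)

def dense_text_matching(patch_feats, text_feats, temperature=0.07):
    """Cosine similarity between every patch feature and every class text embedding.
    patch_feats: (B, D, h, w) dense visual features (e.g. from a CLIP image encoder)
    text_feats:  (C, D) one embedding per class prompt
    returns:     (B, C, h, w) per-class logit maps."""
    patch = F.normalize(patch_feats, dim=1)
    text = F.normalize(text_feats, dim=1)
    return torch.einsum("bdhw,cd->bchw", patch, text) / temperature

# Toy example with random stand-ins for CLIP features.
B, D, h, w, C = 1, 512, 14, 14, 3
image = torch.rand(B, 3, 224, 224)
patch_feats = torch.randn(B, D, h, w)   # stand-in for dense image-encoder output
text_feats = torch.randn(C, D)          # stand-in for encoded class prompts

logits = dense_text_matching(patch_feats, text_feats)               # (1, 3, 14, 14)
logits = F.interpolate(logits, size=image.shape[-2:], mode="bilinear",
                       align_corners=False)

# Hypothetical fusion: boost class evidence near image edges so object borders
# are weighted more strongly than flat background regions.
edge_weight = 0.5
edges = sobel_edge_map(image)                                        # (1, 1, 224, 224)
seg = (logits * (1.0 + edge_weight * edges)).argmax(dim=1)           # (1, 224, 224)
print(seg.shape)
```

In the actual method, the dense features and text embeddings would come from CLIP's fine-tuned image and text encoders, and the edge prior would be produced by the Contextual Edge Sensing module rather than a fixed Sobel filter.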

Discipline

Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

IEEE Transactions on Circuits and Systems for Video Technology

First Page

1

Last Page

1

ISSN

1051-8215

Identifier

10.1109/TCSVT.2025.3624233

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://doi.org/10.1109/TCSVT.2025.3624233
