Research Collection School Of Computing and Information Systems

Probabilistic prototype calibration of vision-language models for generalized few-shot semantic segmentation

Jie LIU
Jiayi SHEN
Pan ZHOU, Singapore Management University
Jan-Jakob SONKE
Stratis GAVVES

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

10-2025

Abstract

Generalized Few-Shot Semantic Segmentation (GFSS) aims to extend a segmentation model to novel classes with only a few annotated examples while maintaining performance on base classes. Recently, pretrained vision-language models (VLMs) such as CLIP have been leveraged in GFSS to improve generalization on novel classes through multi-modal prototype learning. However, existing prototype-based methods are inherently deterministic, limiting the adaptability of learned prototypes to diverse samples, particularly for novel classes with scarce annotations. To address this, we propose FewCLIP, a probabilistic prototype calibration framework over multi-modal prototypes from the pretrained CLIP, thus providing more adaptive prototype learning for GFSS. Specifically, FewCLIP first introduces a prototype calibration mechanism, which refines frozen textual prototypes with learnable visual calibration prototypes, leading to a more discriminative and adaptive representation. Furthermore, unlike deterministic prototype learning techniques, FewCLIP introduces distribution regularization over these calibration prototypes. This probabilistic formulation ensures structured and uncertainty-aware prototype learning, effectively mitigating overfitting to limited novel class data while enhancing generalization. Extensive experimental results on the PASCAL-5i and COCO-20i datasets demonstrate that our proposed FewCLIP significantly outperforms state-of-the-art approaches across both GFSS and class-incremental settings. The code is available at https://github.com/jliu4ai/FewCLIP.
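
The abstract gives no equations, so the following is only an illustrative sketch of what a probabilistic prototype calibration step could look like, not the FewCLIP implementation. It assumes, purely for illustration, that each calibration prototype is a diagonal Gaussian, that calibration is additive on the frozen text prototype, and that the distribution regularizer is a KL divergence to a standard normal prior. All function names, shapes, and the temperature value are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def sample_calibration(mu, logvar, rng):
    """Reparameterized sample from N(mu, diag(exp(logvar))) (assumed form)."""
    noise = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * noise

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)): summed over dims, averaged over classes.
    Stands in for the 'distribution regularization' (an assumption)."""
    kl = 0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return kl.sum(axis=-1).mean()

def calibrated_logits(pixel_feats, text_protos, mu, logvar, rng, temperature=0.07):
    """Cosine-similarity logits between pixel features and calibrated prototypes.
    Additive calibration of the frozen text prototypes is an assumption."""
    delta = sample_calibration(mu, logvar, rng)
    protos = l2_normalize(text_protos + delta)
    feats = l2_normalize(pixel_feats)
    return feats @ protos.T / temperature

# Toy data: 5 classes, 16-dim embeddings, 12 pixel features (all hypothetical).
rng = np.random.default_rng(0)
num_classes, dim, num_pixels = 5, 16, 12
text_protos = l2_normalize(rng.standard_normal((num_classes, dim)))  # frozen
mu = np.zeros((num_classes, dim))       # learnable mean of calibration prototypes
logvar = np.zeros((num_classes, dim))   # learnable log-variance
pixel_feats = rng.standard_normal((num_pixels, dim))

logits = calibrated_logits(pixel_feats, text_protos, mu, logvar, rng)
kl = kl_to_standard_normal(mu, logvar)
print(logits.shape, float(kl))
```

In a training loop under these assumptions, `mu` and `logvar` would be the only prototype parameters updated, with the KL term added to the segmentation loss so that the few novel-class examples cannot pull the calibration prototypes arbitrarily far from the prior.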

Discipline

Artificial Intelligence and Robotics | Programming Languages and Compilers

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 2025 International Conference on Computer Vision, ICCV, Honolulu, Hawaii, October 19-23

Identifier

10.48550/arXiv.2506.22979

City or Country

Honolulu, HI, USA

Citation

LIU, Jie; SHEN, Jiayi; ZHOU, Pan; SONKE, Jan-Jakob; and GAVVES, Stratis. Probabilistic prototype calibration of vision-language models for generalized few-shot semantic segmentation. (2025). Proceedings of the 2025 International Conference on Computer Vision, ICCV, Honolulu, Hawaii, October 19-23. 1-16.
Available at: https://ink.library.smu.edu.sg/sis_research/10470

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.48550/arXiv.2506.22979

Included in

Artificial Intelligence and Robotics Commons, Programming Languages and Compilers Commons
