Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
10-2025
Abstract
Generalized Few-Shot Semantic Segmentation (GFSS) aims to extend a segmentation model to novel classes with only a few annotated examples while maintaining performance on base classes. Recently, pretrained vision-language models (VLMs) such as CLIP have been leveraged in GFSS to improve generalization on novel classes through multi-modal prototype learning. However, existing prototype-based methods are inherently deterministic, which limits the adaptability of the learned prototypes to diverse samples, particularly for novel classes with scarce annotations. To address this, we propose FewCLIP, a probabilistic prototype calibration framework over multi-modal prototypes from pretrained CLIP, providing more adaptive prototype learning for GFSS. Specifically, FewCLIP first introduces a prototype calibration mechanism that refines frozen textual prototypes with learnable visual calibration prototypes, yielding a more discriminative and adaptive representation. Furthermore, unlike deterministic prototype learning techniques, FewCLIP introduces distribution regularization over these calibration prototypes. This probabilistic formulation ensures structured and uncertainty-aware prototype learning, effectively mitigating overfitting to limited novel-class data while enhancing generalization. Extensive experimental results on the PASCAL-5i and COCO-20i datasets demonstrate that FewCLIP significantly outperforms state-of-the-art approaches in both the GFSS and class-incremental settings. The code is available at https://github.com/jliu4ai/FewCLIP.
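The abstract describes the calibration only at a high level. The following is a minimal, hypothetical NumPy sketch of one common way such a scheme could be set up. It is not the authors' implementation; the linked repository is the authoritative source. It assumes that each class has a frozen CLIP text prototype to which a learnable calibration offset is added, that the offset is modeled as a diagonal Gaussian sampled with the reparameterization trick, that the "distribution regularization" is a KL divergence toward a standard normal prior, and that pixels are scored by temperature-scaled cosine similarity. All function names and the parameters `tau` and `beta` are illustrative assumptions.

```python
import numpy as np

# HYPOTHETICAL sketch: a generic Gaussian prototype-calibration scheme,
# not the FewCLIP code. Shapes: pixel_feats (N, D), text_protos (C, D),
# mu and log_var (C, D) are the learnable calibration distribution parameters.

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each vector to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def sample_calibration(mu, log_var, rng):
    # Reparameterization trick: delta = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, diag(sigma^2)) || N(0, I)), summed over dims, averaged over classes.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)
    return kl.mean()

def calibrated_logits(pixel_feats, text_protos, mu, log_var, rng, tau=0.07):
    # Frozen text prototypes refined by a sampled calibration offset.
    delta = sample_calibration(mu, log_var, rng)
    protos = l2_normalize(text_protos + delta)
    feats = l2_normalize(pixel_feats)
    return feats @ protos.T / tau  # (N, C) temperature-scaled cosine scores

def calibration_loss(logits, labels, mu, log_var, beta=0.01):
    # Pixel-wise cross-entropy plus a beta-weighted KL regularizer (assumed form).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    return ce + beta * kl_to_standard_normal(mu, log_var)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, D, N = 4, 16, 10  # classes, feature dim, pixels (toy sizes)
    text = rng.standard_normal((C, D))
    mu, log_var = np.zeros((C, D)), np.full((C, D), -2.0)
    pix = rng.standard_normal((N, D))
    labels = rng.integers(0, C, size=N)
    logits = calibrated_logits(pix, text, mu, log_var, rng)
    print("loss:", calibration_loss(logits, labels, mu, log_var))
    print("pred:", logits.argmax(axis=1))
```

In this sketch the KL term pulls the calibration distribution toward the prior, so with very few novel-class examples the offsets stay small and the frozen text prototypes dominate; this is one plausible reading of "mitigating overfitting", stated here as an assumption rather than a description of the paper.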
Discipline
Artificial Intelligence and Robotics | Programming Languages and Compilers
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the 2025 International Conference on Computer Vision, ICCV, Honolulu, Hawaii, October 19-23
First Page
1
Last Page
16
Identifier
10.48550/arXiv.2506.22979
City or Country
Honolulu, HI, USA
Citation
LIU, Jie; SHEN, Jiayi; ZHOU, Pan; SONKE, Jan-Jakob; and GAVVES, Stratis.
Probabilistic prototype calibration of vision-language models for generalized few-shot semantic segmentation. (2025). Proceedings of the 2025 International Conference on Computer Vision, ICCV, Honolulu, Hawaii, October 19-23. 1-16.
Available at: https://ink.library.smu.edu.sg/sis_research/10470
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.48550/arXiv.2506.22979