Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2024
Abstract
Generalized category discovery faces a key issue: the lack of supervision for new and unseen data categories. Traditional methods typically combine supervised pretraining with self-supervised learning to create models, and then employ clustering for category identification. However, these approaches tend to become overly tailored to known categories, failing to fully resolve the core issue. Hence, we propose to integrate the feedback from LLMs into an active learning paradigm. Specifically, our method innovatively employs uncertainty propagation to select data samples from high-uncertainty regions, which are then labeled using LLMs through a comparison-based prompting scheme. This not only eases the labeling task but also enhances accuracy in identifying new categories. Additionally, a soft feedback propagation mechanism is introduced to minimize the spread of inaccurate feedback. Experiments on various datasets demonstrate our framework’s efficacy and generalizability, significantly improving baseline models at a nominal average cost.
Keywords
Large language models, LLMs, Category discovery, Natural language processing, Uncertainty propagation
Discipline
Artificial Intelligence and Robotics | Computer Sciences
Research Areas
Data Science and Engineering; Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2024): Mexico City, Mexico, Jun 16-21
Volume
1
First Page
7845
Last Page
7858
Identifier
10.18653/v1/2024.naacl-long.434
Publisher
Association for Computational Linguistics
City or Country
Mexico City, Mexico
Citation
LIANG, Jinggui; LIAO, Lizi; FEI, Hao; LI, Bobo; and JIANG, Jing.
Actively learn from LLMs with uncertainty propagation for generalized category discovery. (2024). Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2024): Mexico City, Mexico, Jun 16-21. 1, 7845-7858.
Available at: https://ink.library.smu.edu.sg/sis_research/9700
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/2024.naacl-long.434