Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2024

Abstract

Generalized category discovery faces a key issue: the lack of supervision for new and unseen data categories. Traditional methods typically combine supervised pretraining with self-supervised learning to build models, and then employ clustering for category identification. However, these approaches tend to become overly tailored to known categories, failing to fully resolve the core issue. Hence, we propose to integrate feedback from large language models (LLMs) into an active learning paradigm. Specifically, our method employs uncertainty propagation to select data samples from high-uncertainty regions, which are then labeled using LLMs through a comparison-based prompting scheme. This not only eases the labeling task but also improves accuracy in identifying new categories. Additionally, a soft feedback propagation mechanism is introduced to minimize the spread of inaccurate feedback. Experiments on various datasets demonstrate our framework’s efficacy and generalizability, significantly improving baseline models at a nominal average cost.
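The abstract describes selecting high-uncertainty samples for LLM labeling via comparison-based prompts. The sketch below is an illustrative, simplified reading of that idea, not the authors' implementation: the entropy-based selection, the function names (select_uncertain_samples, build_comparison_prompt), and the prompt wording are assumptions for clarity, and the paper's actual uncertainty propagation and soft feedback propagation mechanisms are more involved.

```python
# Illustrative sketch (not the paper's code): pick the most uncertain samples
# from soft cluster assignments, then form a same-category comparison prompt
# that an LLM can answer with Yes/No instead of naming an unseen class.
import numpy as np


def entropy(probs: np.ndarray) -> np.ndarray:
    """Per-sample entropy of soft cluster assignments (each row sums to 1)."""
    eps = 1e-12
    return -(probs * np.log(probs + eps)).sum(axis=1)


def select_uncertain_samples(probs: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` samples with the most uncertain assignments."""
    return np.argsort(-entropy(probs))[:budget]


def build_comparison_prompt(query: str, anchor: str) -> str:
    """Comparison-style query: easier for an LLM than assigning a category name."""
    return (
        "Do the following two texts belong to the same category? "
        "Answer Yes or No.\n"
        f"Text A: {query}\n"
        f"Text B: {anchor}"
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(100, 10))                     # 100 samples, 10 clusters
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    picked = select_uncertain_samples(probs, budget=5)
    print("queried sample indices:", picked)
    print(build_comparison_prompt(
        "refund request for a late order",
        "asking about compensation for a delivery delay",
    ))
```

In this reading, the LLM's Yes/No answers to such pairwise comparisons supply the missing supervision for the selected samples, which the framework then propagates softly to nearby samples to limit the impact of any incorrect feedback.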

Keywords

Large language models, LLMs, Category discovery, Natural language processing, Uncertainty propagation

Discipline

Artificial Intelligence and Robotics | Computer Sciences

Research Areas

Data Science and Engineering; Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2024): Mexico City, Mexico, June 16-21, 2024

Volume

1

First Page

7845

Last Page

7858

Identifier

10.18653/v1/2024.naacl-long.434

Publisher

Association for Computational Linguistics

City or Country

Mexico City, Mexico

Additional URL

https://doi.org/10.18653/v1/2024.naacl-long.434
