Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2024

Abstract

Generalized category discovery faces a key issue: the lack of supervision for new and unseen data categories. Traditional methods typically combine supervised pretraining with self-supervised learning to build models, and then employ clustering for category identification. However, these approaches tend to become overly tailored to known categories, failing to fully resolve the core issue. Hence, we propose to integrate feedback from large language models (LLMs) into an active learning paradigm. Specifically, our method employs uncertainty propagation to select data samples from high-uncertainty regions, which are then labeled using LLMs through a comparison-based prompting scheme. This not only eases the labeling task but also improves accuracy in identifying new categories. Additionally, a soft feedback propagation mechanism is introduced to minimize the spread of inaccurate feedback. Experiments on various datasets demonstrate our framework’s efficacy and generalizability, significantly improving baseline models at a nominal average cost.
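The abstract describes selecting high-uncertainty samples for LLM labeling via comparison-based prompts. The sketch below is an illustrative, simplified reading of that idea, not the authors' implementation: the entropy-based selection, the function names (select_uncertain_samples, build_comparison_prompt), and the prompt wording are assumptions for clarity, and the paper's actual uncertainty propagation and soft feedback propagation mechanisms are more involved.

```python
# Illustrative sketch (not the paper's code): pick the most uncertain samples
# from soft cluster assignments, then form a same-category comparison prompt
# that an LLM can answer with Yes/No instead of naming an unseen class.
import numpy as np


def entropy(probs: np.ndarray) -> np.ndarray:
    """Per-sample entropy of soft cluster assignments (each row sums to 1)."""
    eps = 1e-12
    return -(probs * np.log(probs + eps)).sum(axis=1)


def select_uncertain_samples(probs: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` samples with the most uncertain assignments."""
    return np.argsort(-entropy(probs))[:budget]


def build_comparison_prompt(query: str, anchor: str) -> str:
    """Comparison-style query: easier for an LLM than assigning a category name."""
    return (
        "Do the following two texts belong to the same category? "
        "Answer Yes or No.\n"
        f"Text A: {query}\n"
        f"Text B: {anchor}"
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(100, 10))                     # 100 samples, 10 clusters
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    picked = select_uncertain_samples(probs, budget=5)
    print("queried sample indices:", picked)
    print(build_comparison_prompt(
        "refund request for a late order",
        "asking about compensation for a delivery delay",
    ))
```

In this reading, the LLM's Yes/No answers to such pairwise comparisons supply the missing supervision for the selected samples, which the framework then propagates softly to nearby samples to limit the impact of any incorrect feedback.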

Keywords

Large language models, LLMs, Category discovery, Natural language processing, Uncertainty propagation

Discipline

Artificial Intelligence and Robotics | Computer Sciences

Research Areas

Data Science and Engineering; Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2024): Mexico City, Mexico, June 16-21, 2024

Volume

1

First Page

7845

Last Page

7858

Identifier

10.18653/v1/2024.naacl-long.434

Publisher

Association for Computational Linguistics

City or Country

Mexico City, Mexico

Additional URL

https://doi.org/10.18653/v1/2024.naacl-long.434
