Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

11-2025

Abstract

The rapid growth of scientific literature demands efficient methods to organize and synthesize research findings. Existing taxonomy construction methods, leveraging unsupervised clustering or direct prompting of large language models (LLMs), often lack coherence and granularity. We propose a novel context-aware hierarchical taxonomy generation framework that integrates LLM-guided multi-aspect encoding with dynamic clustering. Our method leverages LLMs to identify key aspects of each paper (e.g., methodology, dataset, evaluation) and generates aspect-specific paper summaries, which are then encoded and clustered along each aspect to form a coherent hierarchy. In addition, we introduce a new evaluation benchmark of 156 expert-crafted taxonomies encompassing 11.6k papers, providing the first naturally annotated dataset for this task. Experimental results demonstrate that our method significantly outperforms prior approaches, achieving state-of-the-art performance in taxonomy coherence, granularity, and interpretability.

Discipline

Artificial Intelligence and Robotics | Programming Languages and Compilers

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China (EMNLP 2025), November 4-9

First Page

15616

Last Page

15634

Identifier

10.18653/v1/2025.emnlp-main.788

Publisher

ACL

City or Country

China

Additional URL

https://doi.org/10.18653/v1/2025.emnlp-main.788
