Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
11-2025
Abstract
The rapid growth of scientific literature demands efficient methods to organize and synthesize research findings. Existing taxonomy construction methods, leveraging unsupervised clustering or direct prompting of large language models (LLMs), often lack coherence and granularity. We propose a novel context-aware hierarchical taxonomy generation framework that integrates LLM-guided multi-aspect encoding with dynamic clustering. Our method leverages LLMs to identify key aspects of each paper (e.g., methodology, dataset, evaluation) and generates aspect-specific paper summaries, which are then encoded and clustered along each aspect to form a coherent hierarchy. In addition, we introduce a new evaluation benchmark of 156 expert-crafted taxonomies encompassing 11.6k papers, providing the first naturally annotated dataset for this task. Experimental results demonstrate that our method significantly outperforms prior approaches, achieving state-of-the-art performance in taxonomy coherence, granularity, and interpretability.
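Illustrative sketch (not the authors' implementation): the abstract describes clustering papers along one aspect at a time to build a hierarchy. The toy example below mimics that flow under loud assumptions. The aspect summaries in PAPERS are hand-written stand-ins for LLM-generated summaries, a bag-of-words vector replaces the paper's encoder, and a greedy similarity-threshold grouping replaces the dynamic clustering step. All names (PAPERS, encode, cluster, build_taxonomy) and the 0.6 threshold are hypothetical.

```python
import math
from collections import Counter

# Hypothetical aspect-specific summaries. In the described framework these
# would be generated by an LLM; here they are hand-written toy strings.
PAPERS = {
    "p1": {"methodology": "contrastive learning encoder", "dataset": "arxiv abstracts"},
    "p2": {"methodology": "contrastive learning retriever", "dataset": "pubmed abstracts"},
    "p3": {"methodology": "prompting large language model", "dataset": "arxiv abstracts"},
    "p4": {"methodology": "prompting language model chain", "dataset": "pubmed articles"},
}


def encode(text):
    """Bag-of-words vector (assumption: stands in for a learned text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(ids, aspect, threshold=0.6):
    """Greedy grouping: a paper joins the first cluster whose seed paper is
    similar enough on this aspect, otherwise it starts a new cluster.
    (Assumption: a simple substitute for the paper's dynamic clustering.)"""
    clusters = []
    for pid in ids:
        vec = encode(PAPERS[pid][aspect])
        for group in clusters:
            seed_vec = encode(PAPERS[group[0]][aspect])
            if cosine(vec, seed_vec) >= threshold:
                group.append(pid)
                break
        else:
            clusters.append([pid])
    return clusters


def build_taxonomy(ids, aspects):
    """Split papers by the first aspect, then recurse on the remaining aspects.
    Node labels reuse the seed paper's summary (a labeling shortcut, not the
    paper's method)."""
    if not aspects or len(ids) <= 1:
        return sorted(ids)
    aspect, rest = aspects[0], aspects[1:]
    return {
        f"{aspect}:{PAPERS[group[0]][aspect]}": build_taxonomy(group, rest)
        for group in cluster(ids, aspect)
    }


taxonomy = build_taxonomy(["p1", "p2", "p3", "p4"], ["methodology", "dataset"])
print(taxonomy)
```

In this toy run the top level separates the two contrastive-learning papers from the two prompting papers, and each branch is then split by dataset. The order of aspects and the threshold are arbitrary choices made for the illustration.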
Discipline
Artificial Intelligence and Robotics | Programming Languages and Compilers
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China (EMNLP 2025), November 4-9
First Page
15616
Last Page
15634
Identifier
10.18653/v1/2025.emnlp-main.788
Publisher
ACL
City or Country
China
Citation
ZHU, Kun; LIAO, Lizi; GU, Yuxuan; HUANG, Lei; FENG, Xiaocheng; and QIN, Bing.
Context-aware hierarchical taxonomy generation for scientific papers via LLM-guided multi-aspect clustering. (2025). Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China (EMNLP 2025), November 4-9. 15616-15634.
Available at: https://ink.library.smu.edu.sg/sis_research/10754
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/2025.emnlp-main.788
Included in
Artificial Intelligence and Robotics Commons, Programming Languages and Compilers Commons