Publication Type
Journal Article
Version
publishedVersion
Publication Date
4-2017
Abstract
During software maintenance and evolution, one of the key tasks developers face is understanding a system quickly and accurately. As an evolving system grows in size and complexity, program comprehension becomes increasingly difficult. Given a target system, developers may begin by comprehending its packages. These packages vary in size: small packages are easy for developers to comprehend, whereas large packages are difficult to understand. In this article, we focus on these large packages and propose a novel program comprehension approach that uses the Latent Dirichlet Allocation (LDA) model to cluster the classes in each large package. Each large package is thereby split into smaller clusters that are easier for developers to comprehend. Empirical studies on four real-world software projects demonstrate the effectiveness of our approach: it outperforms clustering approaches based on Latent Semantic Indexing (LSI) and Probabilistic Latent Semantic Analysis (PLSA). In addition, we find that the topic labeling each cluster is useful for program comprehension.
Keywords
Based clustering, Empirical studies, Latent Dirichlet allocation, Latent semantic indexing, Probabilistic latent semantic analysis, Program comprehension, Software maintenance and evolution, Software project
Discipline
Programming Languages and Compilers | Software Engineering
Research Areas
Data Science and Engineering
Publication
Scientific Programming
Volume
2017
First Page
3787053: 1
Last Page
15
ISSN
1058-9244
Identifier
10.1155/2017/3787053
Publisher
IOS Press / Hindawi Publishing Corporation
Citation
SUN, Xiaobing; LIU, Xiangyue; LI, Bin; LI, Bixin; LO, David; and LIAO, Lingzhi.
Clustering classes in packages for program comprehension. (2017). Scientific Programming. 2017, 3787053: 1-15.
Available at: https://ink.library.smu.edu.sg/sis_research/3801
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1155/2017/3787053