Research Collection School Of Computing and Information Systems

Condensing class diagrams with minimal manual labeling cost

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2016

Abstract

Traditionally, to better understand the design of a project, developers reconstruct a class diagram from source code using a reverse engineering technique. However, the raw diagram is often hard to read because it contains too many classes. Condensing the reverse engineered class diagram into a compact class diagram that contains only the important classes would make the corresponding project easier to understand. Several recent works have proposed supervised machine learning solutions for condensing reverse engineered class diagrams, given a set of classes that are manually labeled as important or not. However, the high cost of manually labeling training samples limits the practicality of these solutions. More training samples lead to better performance but also to a higher manual labeling cost, and too much manual labeling defeats the purpose, since the aim is to identify important classes automatically. In this paper, to bridge this research gap, we propose a novel approach, MCCondenser, which requires only a small amount of training data but still achieves reasonably good performance. MCCondenser first selects a small proportion of the data, namely the most representative samples, as training data in an unsupervised way using k-means clustering. Next, it uses ensemble learning to handle the class imbalance problem so that a suitable classifier can be constructed from the limited training data. To evaluate the performance of MCCondenser, we use datasets from nine open source projects, i.e., ArgoUML, JavaClient, JGAP, JPMC, Mars, Maze, Neuroph, Wro4J and xUML, containing a total of 2640 classes. We compare MCCondenser with two baseline approaches proposed by Thung et al., both of which are state-of-the-art approaches that aim to reduce the manual labeling cost. The experimental results show that MCCondenser achieves an average AUC score of 0.73, which improves on those of the two baselines by nearly 20% and 10%, respectively.
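The first step described in the abstract, choosing a few representative samples to label via k-means clustering, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state how representatives are picked from the clusters, so the rule used here (take the sample nearest each centroid) is an assumption, and the names `kmeans_centroids` and `select_representatives` are hypothetical. The ensemble learning step is not shown.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of feature tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_centroids(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct samples
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every sample to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centroids[c] = mean_point(members)
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
    return centroids


def select_representatives(points, k, seed=0):
    """Indices of the samples to label manually.

    Assumption: the representative of a cluster is the sample closest to
    its centroid. Duplicates are merged, so fewer than k indices may be
    returned.
    """
    centroids = kmeans_centroids(points, k, seed=seed)
    chosen = {
        min(range(len(points)), key=lambda i: squared_distance(points[i], c))
        for c in centroids
    }
    return sorted(chosen)
```

Here each element of `points` would be the feature vector of one class in the reverse engineered diagram, and `k` is the labeling budget, i.e., the number of classes a developer is willing to label by hand. Only the returned indices need manual labels; a classifier is then trained on that small set.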

Keywords

Class Diagram, Cost Saving, Ensemble Learning, Manual Labeling, Unsupervised Learning

Discipline

Computer Sciences | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

COMPSAC 2016: Proceedings of the 40th IEEE Annual International Computers, Software and Applications Conference, Atlanta, Georgia, 10-14 June 2016

First Page

22

Last Page

31

ISBN

9781467388450

Identifier

10.1109/COMPSAC.2016.83

Publisher

IEEE Computer Society

City or Country

Los Alamitos, CA

Citation

YANG, Xinli; LO, David; XIA, Xin; and SUN, Jianling. Condensing class diagrams with minimal manual labeling cost. (2016). COMPSAC 2016: Proceedings of the 40th IEEE Annual International Computers, Software and Applications Conference, Atlanta, Georgia, 10-14 June 2016. 22-31.
Available at: https://ink.library.smu.edu.sg/sis_research/3566

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://doi.org/10.1109/COMPSAC.2016.83

Included in

Software Engineering Commons
