Publication Type
Journal Article
Version
publishedVersion
Publication Date
4-2018
Abstract
Software engineers share experiences with modern technologies on software information sites such as Stack Overflow. These sites allow developers to label posted content, referred to as software objects, with short descriptions known as tags. Tags improve the organization of questions and make them easier for users to browse. However, tags assigned to objects tend to be noisy, and some objects are not well tagged. For instance, 14.7% of the questions posted on Stack Overflow in 2015 needed tag re-editing after the initial assignment. To improve the quality of tags in software information sites, we propose EnTagRec (++), an advanced version of our prior work EnTagRec. Unlike EnTagRec, EnTagRec (++) not only integrates the historical tag assignments to software objects, but also leverages information about users and an initial set of tags that a user may provide for tag recommendation. We evaluate its performance on five software information sites: Stack Overflow, Ask Ubuntu, Ask Different, Super User, and Freecode. Even without considering an initial set of tags provided by a user, it achieves Recall@5 scores of 0.821, 0.822, 0.891, 0.818 and 0.651, and Recall@10 scores of 0.873, 0.886, 0.956, 0.887 and 0.761, on Stack Overflow, Ask Ubuntu, Ask Different, Super User, and Freecode, respectively. In terms of Recall@5 and Recall@10, averaged across the five datasets, it improves upon TagCombine, the prior state-of-the-art approach, by 29.3% and 14.5%, respectively. Moreover, the performance of our approach is further boosted if users provide some initial tags that it can leverage to infer additional tags: when an initial set of tags is given, Recall@5 improves by 10%.
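The abstract reports Recall@5 and Recall@10 without defining them. The following is a minimal sketch of how such a score could be computed, assuming a definition often used in tag recommendation studies: for each object, the share of its true tags found among the top-k recommendations, with the denominator capped at k when an object has more than k tags, averaged over all objects. The function names and the sample tags are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], true_tags: Set[str], k: int) -> float:
    """Recall@k for one object (assumed definition, see lead-in)."""
    if k <= 0 or not true_tags:
        raise ValueError("k must be positive and true_tags must be non-empty")
    # Count true tags that appear among the first k recommendations.
    hits = len(set(ranked[:k]) & true_tags)
    # Cap the denominator at k so a perfect top-k list can score 1.0.
    return hits / min(k, len(true_tags))


def mean_recall_at_k(
    recommendations: Sequence[Sequence[str]],
    truths: Sequence[Set[str]],
    k: int,
) -> float:
    """Average Recall@k over a collection of objects."""
    if len(recommendations) != len(truths) or not truths:
        raise ValueError("inputs must be non-empty and of equal length")
    scores = [recall_at_k(r, t, k) for r, t in zip(recommendations, truths)]
    return sum(scores) / len(scores)
```

For example, if a question is truly tagged `python`, `csv` and `excel`, and the top five recommendations contain `python` and `csv` but not `excel`, this sketch gives a Recall@5 of 2/3 for that question.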
Keywords
Software information sites; Recommendation systems; Tagging
Discipline
Computer and Systems Architecture | Software Engineering
Research Areas
Data Science and Engineering
Publication
Empirical Software Engineering
Volume
23
Issue
2
First Page
800
Last Page
832
ISSN
1382-3256
Identifier
10.1007/s10664-017-9533-1
Publisher
Springer Verlag (Germany)
Citation
WANG, Shaowei; LO, David; VASILESCU, Bogdan; and SEREBRENIK, Alexander.
EnTagRec(++): An enhanced tag recommendation system for software information sites. (2018). Empirical Software Engineering. 23, (2), 800-832.
Available at: https://ink.library.smu.edu.sg/sis_research/4127
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1007/s10664-017-9533-1