Publication Type
Journal Article
Version
acceptedVersion
Publication Date
2-2022
Abstract
Past studies have proposed solutions that analyze Stack Overflow content to help users find desired information or aid various downstream software engineering tasks. A common step performed by those solutions is to extract suitable representations of posts, typically in the form of meaningful vectors. These vectors are then used for different tasks, for example, tag recommendation, relatedness prediction, post classification, and API recommendation. Intuitively, the quality of the vector representations of posts determines the effectiveness of the solutions in performing the respective tasks. In this work, to aid existing studies that analyze Stack Overflow posts, we propose a specialized deep learning architecture, Post2Vec, which extracts distributed representations of Stack Overflow posts. Post2Vec is aware of the different types of content present in Stack Overflow posts, i.e., title, description, and code snippets, and integrates them seamlessly to learn post representations. Tags provided by Stack Overflow users, which serve as a common vocabulary that captures the semantics of posts, are used to guide Post2Vec in its task. To evaluate the quality of Post2Vec's deep learning architecture, we first investigate its end-to-end effectiveness in the tag recommendation task. The results are compared to those of state-of-the-art tag recommendation approaches that also employ deep neural networks. We observe that Post2Vec achieves a 15-25% improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of the representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can be used to boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins (by 10%, 7%, and 10% in terms of F1-score, F1-score, and correctness, respectively).
We release our replication package at https://github.com/maxxbw/Post2Vec.
Keywords
Deep Learning, Artificial Intelligence, Recommender Systems, Software Engineering, Vectors, Distributed Representations
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
IEEE Transactions on Software Engineering
Volume
48
Issue
9
First Page
3423
Last Page
3441
ISSN
0098-5589
Identifier
10.1109/TSE.2021.3093761
Publisher
Institute of Electrical and Electronics Engineers
Citation
XU, Bowen; HOANG, Thong; SHARMA, Abhishek; YANG, Chengran; XIA, Xin; and LO, David.
Post2Vec: Learning distributed representations of Stack Overflow posts. (2022). IEEE Transactions on Software Engineering. 48, (9), 3423-3441.
Available at: https://ink.library.smu.edu.sg/sis_research/7638
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/TSE.2021.3093761