Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
11-2021
Abstract
Online chatrooms are gaining popularity as a communication channel between widely distributed developers of Open Source Software (OSS) projects. Most discussion threads in chatrooms follow a Q&A format, with some developers (askers) raising an initial question and others (respondents) joining in to provide answers. These discussion threads are embedded with rich information that can satisfy the diverse needs of various OSS stakeholders. However, retrieving information from threads is challenging as it requires a thread-level analysis to understand the context. Moreover, the chat data is transient and unstructured, consisting of entangled informal conversations. In this paper, we address this challenge by identifying the information types available in developer chats and further introducing an automated mining technique. Through manual examination of chat data from three chatrooms on Gitter, using card sorting, we build a thread-level taxonomy with nine information categories and create a labeled dataset with 2,959 threads. We propose a classification approach (named F2CHAT) to structure the vast amount of threads based on the information type automatically, helping stakeholders quickly acquire their desired information. F2CHAT effectively combines handcrafted non-textual features with deep textual features extracted by neural models. Specifically, it has two stages with the first one leveraging the siamese architecture to pretrain the textual feature encoder, and the second one facilitating an in-depth fusion of two types of features. Evaluation results suggest that our approach achieves an average F1-score of 0.628, which improves the baseline by 57%. Experiments also verify the effectiveness of our identified non-textual features under both intra-project and cross-project validations.
Keywords
Developer Chatrooms, Information Mining, Deep Learning, Gitter
Discipline
Databases and Information Systems | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE): Melbourne, November 14-20: Proceedings
First Page
845
Last Page
866
ISBN
9781665403375
Identifier
10.1109/ASE51524.2021.9678923
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
PAN, Shengyi; BAO, Lingfeng; REN, Xiaoxue; XIA, Xin; LO, David; and LI, Shanping.
Automating developer chat mining. (2021). 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE): Melbourne, November 14-20: Proceedings. 845-866.
Available at: https://ink.library.smu.edu.sg/sis_research/6809
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/ASE51524.2021.9678923