Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

11-2021

Abstract

Online chatrooms are gaining popularity as a communication channel among the widely distributed developers of Open Source Software (OSS) projects. Most discussion threads in chatrooms follow a Q&A format: some developers (askers) raise an initial question, and others (respondents) join in to provide answers. These threads are rich in information that can satisfy the diverse needs of various OSS stakeholders. However, retrieving information from threads is challenging because it requires thread-level analysis to understand the context. Moreover, chat data is transient and unstructured, consisting of entangled, informal conversations. In this paper, we address this challenge by identifying the information types available in developer chats and introducing an automated mining technique. Through a manual examination of chat data from three Gitter chatrooms using card sorting, we build a thread-level taxonomy with nine information categories and create a labeled dataset of 2,959 threads. We propose a classification approach, named F2CHAT, that automatically structures the vast number of threads by information type, helping stakeholders quickly acquire the information they need. F2CHAT combines handcrafted non-textual features with deep textual features extracted by neural models. Specifically, it has two stages: the first leverages a Siamese architecture to pretrain the textual feature encoder, and the second performs an in-depth fusion of the two types of features. Evaluation results show that our approach achieves an average F1-score of 0.628, a 57% improvement over the baseline. Experiments also verify the effectiveness of the identified non-textual features under both intra-project and cross-project validation.
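The two-stage idea described in the abstract can be illustrated with a minimal, hypothetical sketch. It is not the F2CHAT implementation: the hashed bag-of-words `encode` function stands in for a neural encoder, the handcrafted features (`n_messages`, `n_participants`, `has_code`) are assumed examples, and plain concatenation is a simplification of the paper's "in-depth fusion". Stage one shows a Siamese contrastive objective in which both threads share one encoder. Stage two shows textual and non-textual features being joined into a single input vector.

```python
import math
import zlib

DIM = 8  # hypothetical embedding size for the toy encoder


def encode(text: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for a neural text encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
    return [v / norm for v in vec]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(thread_a: str, thread_b: str, same_type: bool, margin: float = 1.0) -> float:
    """Stage 1 (Siamese pretraining objective): both threads pass through the SAME encoder.
    Pairs with the same information type are pulled together; pairs with
    different types are pushed at least `margin` apart."""
    d = euclidean(encode(thread_a), encode(thread_b))
    if same_type:
        return d ** 2
    return max(0.0, margin - d) ** 2


def fuse(thread_text: str, n_messages: int, n_participants: int, has_code: bool) -> list[float]:
    """Stage 2 input (simplified): concatenate textual features with hypothetical
    handcrafted non-textual features of the thread."""
    handcrafted = [float(n_messages), float(n_participants), 1.0 if has_code else 0.0]
    return encode(thread_text) + handcrafted


if __name__ == "__main__":
    print(contrastive_loss("how do I install it", "how do I install it", same_type=True))
    print(len(fuse("how do I install it", n_messages=5, n_participants=2, has_code=True)))
```

Identical threads labelled with the same type incur zero loss. The same pair labelled with different types incurs the full `margin ** 2` penalty, which is what drives the shared encoder to separate information categories before the fused classifier is trained.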

Keywords

Developer Chatrooms, Information Mining, Deep Learning, Gitter

Discipline

Databases and Information Systems | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE): Melbourne, November 14-20: Proceedings

First Page

845

Last Page

866

ISBN

9781665403375

Identifier

10.1109/ASE51524.2021.9678923

Publisher

IEEE

City or Country

Piscataway, NJ

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1109/ASE51524.2021.9678923
