Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
5-2022
Abstract
Collecting API-relevant examples, usages, and mentions from venues such as Stack Overflow is not a trivial problem. It requires effort to correctly recognize whether a discussion refers to the API method that developers or tools are searching for. A Stack Overflow thread consists of text paragraphs, which describe how the API method is involved in the discussion, and code snippets, which contain the API invocation; either may refer to the given API method. Leveraging this observation, we develop ARSeek, a context-specific algorithm that captures the semantic and syntactic information of the paragraphs and code snippets in a discussion. ARSeek combines a syntactic word-based score with a score from a predictive model fine-tuned from CodeBERT. ARSeek achieves an average F1-score of 0.8709 and beats the state-of-the-art approach by 14%.
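The idea of combining a word-based score with a model score can be illustrated with a minimal sketch. It is not the method from the paper: the abstract does not say how the two scores are computed or combined, so the function names (word_score, combined_score, refers_to_api), the weighted average, the weight, and the threshold are all assumptions made for illustration. The model score is passed in as a plain number standing in for the output of a fine-tuned classifier.

```python
import re


def word_score(api_name: str, thread_text: str) -> float:
    """Hypothetical syntactic score: the fraction of the API name's parts
    (for example class and method) that occur as identifiers in the thread."""
    parts = [p.lower() for p in re.split(r"[.#:]+", api_name) if p]
    if not parts:
        return 0.0
    words = set(re.findall(r"[a-z_][a-z0-9_]*", thread_text.lower()))
    return sum(p in words for p in parts) / len(parts)


def combined_score(word: float, model: float, weight: float = 0.5) -> float:
    """Assumed combination rule: a weighted average of the two scores."""
    return weight * word + (1.0 - weight) * model


def refers_to_api(api_name: str, thread_text: str, model_probability: float,
                  weight: float = 0.5, threshold: float = 0.5) -> bool:
    """Decide whether the thread refers to the API. model_probability is a
    placeholder for a classifier output in [0, 1]; threshold is an assumption."""
    score = combined_score(word_score(api_name, thread_text),
                           model_probability, weight)
    return score >= threshold


if __name__ == "__main__":
    thread = "I call list.add(x) inside a loop and it throws."
    print(refers_to_api("List.add", thread, model_probability=0.2))
    print(refers_to_api("Map.put", thread, model_probability=0.2))
```

In this sketch a thread that mentions both parts of the API name can pass the threshold even when the model score is low, which shows why the weight and threshold would need tuning on labeled data.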
Keywords
API resource, API embedding, Content classification
Discipline
Databases and Information Systems
Research Areas
Information Systems and Management
Publication
Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, Pittsburgh, United States, 2022 May 16 - 17
First Page
331
Last Page
342
ISBN
9781450392983
Identifier
10.1145/3524610.3527918
Publisher
IEEE Computer Society
City or Country
Pittsburgh, Pennsylvania
Citation
LUONG, Gia Kien; HADI, Mohammad; THUNG, Ferdian; FARD, Fatemeh H.; and LO, David.
ARSeek: Identifying API resource using code and discussion on Stack Overflow. (2022). Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, Pittsburgh, United States, 2022 May 16 - 17. 331-342.
Available at: https://ink.library.smu.edu.sg/sis_research/7692
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3524610.3527918