Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2014

Abstract

Software developers often discuss the usage of various APIs in online forums. Automatically assigning pre-defined semantic categories to API discussions in these forums could help manage forum data and help developers search for useful information. We refer to this process as content categorization of API discussions. To solve this problem, Hou and Mo proposed the use of naive Bayes multinomial, which is an effective classification algorithm. In this paper, we propose a Cache-bAsed compoSitE algorithm, abbreviated as CASE, to automatically categorize API discussions. Because the content of an API discussion contains both textual description and source code, CASE has 3 components that analyze an API discussion in 3 different ways: text, code, and original. In the text component, CASE considers only the textual description; in the code component, CASE considers only the source code; in the original component, CASE considers the original content of an API discussion, which might include both textual description and source code. Next, for each component, since different terms (i.e., words) have different affinities to different categories, CASE caches the subset of terms that have the highest affinity scores to each category, and builds a classifier based on the cached terms. Finally, CASE combines the 3 classifiers to achieve a better accuracy score. We evaluate the performance of CASE on 3 datasets that contain a total of 1,035 API discussions. The experimental results show that CASE achieves accuracy scores of 0.69, 0.77, and 0.96 on the 3 datasets, outperforming the state-of-the-art method proposed by Hou and Mo by 11%, 10%, and 2%, respectively.
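
The pipeline described in the abstract (three views of a discussion, a per-category cache of high-affinity terms, and a combination of the three classifiers) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the affinity score, the classifier built on the cached terms, or the combination rule, so the sketch assumes an affinity of precision times recall over document frequencies, scores a discussion by summing the affinities of the cached terms it contains, and combines the components by adding their scores. The `<code>` delimiter, the cache size `k`, and all function names are likewise hypothetical.

```python
import re
from collections import Counter, defaultdict

VIEWS = ("text", "code", "original")

def split_views(post):
    """Split a discussion into text-only, code-only, and original views.
    Assumes (hypothetically) that source code is wrapped in <code>...</code>."""
    code = " ".join(re.findall(r"<code>(.*?)</code>", post, flags=re.S))
    text = re.sub(r"<code>.*?</code>", " ", post, flags=re.S)
    original = re.sub(r"</?code>", " ", post)
    return {"text": text, "code": code, "original": original}

def tokenize(s):
    return re.findall(r"[a-z_]+", s.lower())

def build_cache(docs, labels, k):
    """Cache, per category, the k terms with the highest affinity.
    Assumed affinity: (share of the term's documents that fall in the category)
    times (share of the category's documents that contain the term)."""
    df = Counter()                  # documents containing each term
    df_cat = defaultdict(Counter)   # same, broken down by category
    n_cat = Counter(labels)         # documents per category
    for doc, label in zip(docs, labels):
        for term in set(tokenize(doc)):
            df[term] += 1
            df_cat[label][term] += 1
    cache = {}
    for cat, counts in df_cat.items():
        scored = {t: (c / df[t]) * (c / n_cat[cat]) for t, c in counts.items()}
        top = sorted(scored, key=lambda t: (-scored[t], t))[:k]
        cache[cat] = {t: scored[t] for t in top}
    return cache

def train(posts, labels, k=3):
    """Build one term cache per view (text, code, original)."""
    views = [split_views(p) for p in posts]
    return {v: build_cache([x[v] for x in views], labels, k) for v in VIEWS}

def predict(model, post):
    """Sum the cached-term affinities per category across the three views."""
    views = split_views(post)
    total = Counter()
    for v in VIEWS:
        terms = set(tokenize(views[v]))
        for cat, cached in model[v].items():
            total[cat] += sum(w for t, w in cached.items() if t in terms)
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(total), key=lambda c: total[c])

# Toy training data, invented for illustration only.
posts = [
    "How do I add a border? <code>panel.setBorder(b)</code>",
    "Border layout question <code>setLayout(new BorderLayout())</code>",
    "Button click event not firing <code>button.addActionListener(l)</code>",
    "Listener for click event <code>addActionListener(this)</code>",
]
labels = ["layout", "layout", "event", "event"]
model = train(posts, labels, k=3)

print(predict(model, "Why is my click event ignored? <code>btn.addActionListener(x)</code>"))
print(predict(model, "Set a border on my panel <code>panel.setBorder(x)</code>"))
```

Keeping a separate cache per view lets a term that is informative only inside source code (here, `addactionlistener`) contribute even when the surrounding prose gives no signal.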

Keywords

API Discussion, Text Categorization, Composite Method, Cache-Based Method

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

22nd International Conference on Program Comprehension (ICPC 2014): Proceedings: June 2-3, 2014, Hyderabad, India

First Page

95

Last Page

105

ISBN

9781450328791

Identifier

10.1145/2597008.2597142

Publisher

ACM

City or Country

New York

Citation

Zhou, Bo; Xia, Xin; LO, David; Tian, Cong; and Wang, Xinyu. Towards more accurate content categorization of API discussions. (2014). 22nd International Conference on Program Comprehension (ICPC 2014): Proceedings: June 2-3, 2014, Hyderabad, India. 95-105.
Available at: https://ink.library.smu.edu.sg/sis_research/2420

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://dx.doi.org/10.1145/2597008.2597142

Included in

Software Engineering Commons

Research Collection School Of Computing and Information Systems

Towards more accurate content categorization of API discussions
