Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
5-2021
Abstract
Primitive types are fundamental components of any programming language and serve as the building blocks of data manipulation. Understanding the role of these types in source code is essential to writing software. Little work has been conducted on how often variables of primitive types are documented in code comments and what types of knowledge the comments provide about them. In this paper, we present an approach for detecting primitive variables and their descriptions in comments using lexical matching and advanced matching. We evaluate the two approaches by comparing their recall, precision, and F-score against 600 manually annotated variables from a sample of GitHub projects. The advanced approach outperformed lexical matching, with F-scores of 0.986 and 0.942, respectively. We then create a taxonomy of the types of knowledge contained in these comments about variables of primitive types. Our study showed that developers usually documented the identifiers of variables of a numeric data type with their purpose (69.16%) and concept (72.75%) more often than the identifiers of variables of type String, which were less often documented with purpose (61.14%) and concept (55.46%). Our findings characterise the current state of the practice of documenting primitive variables and point to areas that are often not well documented, such as the meaning of boolean variables or the purpose of fields and local variables.
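To make the evaluation setup in the abstract concrete, the following is a minimal sketch of what a lexical match between a variable identifier and a comment could look like, together with the standard precision, recall, and F-score computation. It is an illustration only and not the authors' implementation: the abstract does not specify the matching rules, so the function names (`split_identifier`, `lexical_match`, `f_score`), the identifier-splitting regex, and the "all terms must appear" rule are assumptions. Python is used for brevity even though the study targets Java code.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms (assumed rule)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def lexical_match(identifier, comment):
    """Hypothetical lexical match: the comment contains the identifier verbatim,
    or contains every term obtained by splitting the identifier."""
    lowered = comment.lower()
    if identifier.lower() in re.findall(r"\w+", lowered):
        return True
    words = set(re.findall(r"[a-z]+|\d+", lowered))
    terms = split_identifier(identifier)
    return bool(terms) and all(t in words for t in terms)


def f_score(predicted, gold):
    """Return (precision, recall, F-score) for parallel lists of booleans,
    where gold holds the manual annotations."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Under these assumed rules, `lexical_match("retryCount", "The retry count before giving up")` succeeds, while `lexical_match("maxRetryCount", "Maximum retry count before giving up")` fails because "max" is abbreviated in the identifier. Misses of this kind are one plausible reason a purely lexical approach could score lower than a more advanced matcher.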
Keywords
Documentation, Knowledge, Source code comments, Variables
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
Proceedings of the 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), Virtual Conference, May 17-19
First Page
460
Last Page
470
ISBN
9781728187105
Identifier
10.1109/MSR52588.2021.00058
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
ALGHAMDI, Mahfouth; HAYASHI, Shinpei; KOBAYASHI, Takashi; and TREUDE, Christoph.
Characterising the knowledge about primitive variables in Java code comments. (2021). Proceedings of the 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), Virtual Conference, May 17-19. 460-470.
Available at: https://ink.library.smu.edu.sg/sis_research/8849
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/MSR52588.2021.00058