Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
11-2015
Abstract
Many tools that automatically analyze, summarize, or transform software artifacts rely on natural language processing tooling for the interpretation of natural language text produced by software developers, such as documentation, code comments, commit messages, or bug reports. Processing natural language text produced by software developers is challenging because of unique characteristics not found in other texts, such as the presence of code terms and the systematic use of incomplete sentences. In addition, texts produced by Portuguese-speaking developers mix languages, since many keywords and programming concepts are referred to by their English names. In this paper, we provide empirical insights into the challenges of analyzing software artifacts written in Portuguese. We analyzed 100 question titles from the Portuguese version of Stack Overflow with two Portuguese language tools and identified multiple problems that resulted in very few sentences being tagged completely correctly. Based on these results, we propose heuristics to improve the analysis of natural language text produced by software developers in Portuguese.
Keywords
Documentation, natural language processing
Discipline
Programming Languages and Compilers | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
Proceedings of the 2015 29th Brazilian Symposium on Software Engineering, Belo Horizonte, Brazil, September 21-26
First Page
179
Last Page
184
ISBN
9781467392723
Identifier
10.1109/SBES.2015.27
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
TREUDE, Christoph; PROLO, Carlos A.; and FIGUEIRA FILHO, Fernando. Challenges in analyzing software documentation in Portuguese. (2015). Proceedings of the 2015 29th Brazilian Symposium on Software Engineering, Belo Horizonte, Brazil, September 21-26. 179-184.
Available at: https://ink.library.smu.edu.sg/sis_research/8943
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/SBES.2015.27