Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

5-2023

Abstract

Stop words, which are considered non-predictive, are often eliminated in natural language processing tasks. However, the definition of uninformative vocabulary is vague, so most algorithms use general knowledge-based stop lists to remove stop words. There is an ongoing debate among academics about the usefulness of stop word elimination, especially in domain-specific settings. In this work, we investigate the usefulness of stop word removal in a software engineering context. To do this, we replicate and experiment with three software engineering research tools from related work. Additionally, we construct a corpus of software engineering domain-related text from 10,000 Stack Overflow questions and identify 200 domain-specific stop words using traditional information-theoretic methods. Our results show that the use of domain-specific stop words significantly improved the performance of research tools compared to the use of a general stop list and that 17 out of 19 evaluation measures showed better performance.
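
As an illustration of what an information-theoretic stop word selection can look like, the sketch below ranks words by the Shannon entropy of their distribution across documents: a word spread evenly over many documents carries little discriminating information and is a stop word candidate. This is only one such measure and is not necessarily the one used in the paper; the function name `entropy_stop_words`, the whitespace tokenization, and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def entropy_stop_words(documents, top_k):
    """Return the top_k words whose occurrences are most evenly spread over documents.

    Hypothetical helper: scores each word by the Shannon entropy of p(document | word).
    """
    # word -> Counter mapping document index to occurrence count
    counts = defaultdict(Counter)
    for doc_id, text in enumerate(documents):
        for token in text.lower().split():  # naive whitespace tokenization (assumption)
            counts[token][doc_id] += 1

    scores = {}
    for token, per_doc in counts.items():
        total = sum(per_doc.values())
        # High entropy means the word appears about equally often everywhere
        scores[token] = -sum(
            (c / total) * math.log2(c / total) for c in per_doc.values()
        )

    # Highest entropy first; ties broken alphabetically for a stable result
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


# Toy corpus standing in for Stack Overflow question text (made-up data)
docs = [
    "how to fix error in python",
    "how to sort list in java",
    "how to parse json in rust",
]
candidates = entropy_stop_words(docs, 3)
```

On the toy corpus, the three words that occur in every question ("how", "in", "to") receive the highest score and are returned as candidates. In the setting the abstract describes, the input would be the 10,000 Stack Overflow questions and `top_k` would be 200.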

Keywords

Natural Language Processing (NLP), Software Engineering Documents, Stop Words

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Proceedings of the 2nd Workshop on Natural Language-based Software Engineering, 2023 May 20

First Page

40

Last Page

47

ISBN

9798350301786

Identifier

10.1109/NLBSE59153.2023.00016

Publisher

IEEE

City or Country

Los Alamitos, CA

Citation

FAN, Yaohou; ARORA, Chetan; and TREUDE, Christoph. Stop words for processing software engineering documents: Do they matter. (2023). Proceedings of the 2nd Workshop on Natural Language-based Software Engineering, 2023 May 20. 40-47.
Available at: https://ink.library.smu.edu.sg/sis_research/8912

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/NLBSE59153.2023.00016

Included in

Software Engineering Commons

Research Collection School Of Computing and Information Systems

Stop words for processing software engineering documents: Do they matter