Research Collection School Of Computing and Information Systems

Automated Construction of a Software-Specific Word Similarity Database

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

2-2014

Abstract

Many automated software engineering approaches, including code search, bug report categorization, and duplicate bug report detection, measure the similarity between two documents by analyzing their natural language content. Different words are often used to express the same meaning, so measuring similarity by exact word matching is insufficient. Past studies have therefore shown the need to measure the similarity between pairs of words. To meet this need, the natural language processing community has built WordNet, a manually constructed lexical database that records semantic relations among words and can be used to measure how similar two words are. However, WordNet is a general-purpose resource and often does not contain software-specific words. Also, the meanings of words in WordNet often differ from their meanings in a software engineering context. Thus, there is a need for a software-specific WordNet-like resource that can measure the similarity of words. In this work, we propose an automated approach that builds a software-specific WordNet-like resource, named WordSim^SE_DB, by leveraging the textual contents of posts in StackOverflow. Our approach measures the similarity of two words by computing the similarity of their weighted co-occurrences with three types of words in the textual corpus. We have evaluated our approach on a set of software-specific words and compared it with an existing WordNet-based technique (WordNet^res) on the task of returning the top-k most similar words. Human judges evaluated the effectiveness of the two techniques. We find that WordNet^res returns no result for 55% of the queries. For the remaining queries, WordNet^res returns significantly poorer results.
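The general idea of comparing words through their weighted co-occurrences can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the weighting scheme or the three types of context words, so the sketch assumes a fixed token window, positive pointwise mutual information (PPMI) weights, and cosine similarity. All function names and parameters below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, how often every other word occurs within `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def ppmi_weight(vectors):
    """Reweight raw counts with positive PMI (an assumed scheme, not the paper's)."""
    total = sum(sum(ctx.values()) for ctx in vectors.values())
    # Counts are symmetric, so a word's row sum also serves as its context sum.
    row = {w: sum(ctx.values()) for w, ctx in vectors.items()}
    weighted = {}
    for w, ctx in vectors.items():
        weighted[w] = {}
        for c, n in ctx.items():
            pmi = math.log((n * total) / (row[w] * row[c]))
            if pmi > 0:
                weighted[w][c] = pmi
    return weighted


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v[k] for k, x in u.items() if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_similar(word, weighted, k=3):
    """Rank all other words by cosine similarity to `word` and keep the best k."""
    scores = [
        (other, cosine(weighted[word], vec))
        for other, vec in weighted.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
```

Given tokenized post text as a list of token lists, calling `top_k_similar("file", ppmi_weight(cooccurrence_vectors(posts)), k=5)` would return up to five candidate words with their similarity scores, which is the kind of top-k query the evaluation describes.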

Discipline

Computer Sciences | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

2014 Software Evolution Week: IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE): Proceedings: February 3-6, 2014, Antwerp

First Page

44

Last Page

53

ISBN

9781479937523

Identifier

10.1109/CSMR-WCRE.2014.6747213

Publisher

IEEE

City or Country

Piscataway, NJ

Citation

TIAN, Yuan; LO, David; and Lawall, Julia. Automated Construction of a Software-Specific Word Similarity Database. (2014). 2014 Software Evolution Week: IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE): Proceedings: February 3-6, 2014, Antwerp. 44-53.
Available at: https://ink.library.smu.edu.sg/sis_research/2033

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/CSMR-WCRE.2014.6747213

Included in

Software Engineering Commons
