Publication Type

Conference Paper

Version

submittedVersion

Publication Date

8-2009

Abstract

In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here due to the noisy nature of bug reports. We propose a number of techniques to address the noisy data problem. The empirical evaluation shows that our method significantly improves on an existing method, by up to 58%.

Discipline

Software Engineering

Research Areas

Software Systems

Publication

ACLShort '09: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

First Page

197

Last Page

200

City or Country

ACM

Additional URL

http://dl.acm.org/citation.cfm?id=1667583.1667644
