Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
5-2016
Abstract
In the software development process, developers often seek solutions to the technical problems they encounter by searching for relevant questions on Q&A sites. When developers fail to find solutions on Q&A sites in their native language (e.g., Chinese), they could translate their query and search on Q&A sites in another language (e.g., English). However, developers who are non-native English speakers are often not comfortable asking or searching for questions in English, as they do not know the proper translation of Chinese technical words into English technical words. Furthermore, manually formulating cross-language queries and determining the weight of query words is tedious and time-consuming. To help Chinese developers take advantage of the rich knowledge base of the English version of Stack Overflow and to simplify the retrieval process, we propose an automated cross-language relevant question retrieval (CLRQR) system to retrieve relevant English questions on Stack Overflow for a given Chinese question. Our CLRQR system first extracts essential information (both Chinese and English) from the title and description of the input Chinese question, then performs domain-specific translation of the essential Chinese information into English, and formulates a query with the highest-scored English words for retrieving relevant questions in a repository of 684,599 Java questions in English from Stack Overflow. To evaluate the performance of our proposed approach, we also propose four online retrieval approaches as baselines. We randomly select 80 Java questions in SegmentFault and V2EX (two Chinese Q&A websites for computer programming) as the query Chinese questions. Each approach returns the top-10 most relevant questions for a given Chinese question. We invite 5 users to evaluate the relevance of the retrieved English questions. The experimental results show that the CLRQR system outperforms the four baseline approaches, and the statistical tests show the improvements are significant.
Keywords
Cross-language question retrieval; Domain-specific translation
Discipline
Programming Languages and Compilers
Research Areas
Software and Cyber-Physical Systems
Publication
Proceedings of the 2016 13th International Conference on Mining Software Repositories, Austin, United States, 2016 May 14-15
First Page
413
Last Page
424
ISBN
9781450341868
Identifier
10.1145/2901739.2901746
Publisher
Association for Computing Machinery, Inc
City or Country
Piscataway, NJ
Citation
XU, Bowen; XING, Zhenchang; XIA, Xin; David LO; WANG, Qingye; and LI, Shanping.
Domain-specific cross-language relevant question retrieval. (2016). Proceedings of the 2016 13th International Conference on Mining Software Repositories, Austin, United States, 2016 May 14-15. 413-424.
Available at: https://ink.library.smu.edu.sg/sis_research/3562
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org./10.1145/2901739.2901746