Publication Type
Journal Article
Version
publishedVersion
Publication Date
4-2018
Abstract
Chinese developers often cannot search effectively for questions in English, because they may have difficulty translating technical words from Chinese to English and formulating proper English queries. To help Chinese developers take advantage of the rich knowledge base of Stack Overflow and to simplify the question retrieval process, we propose an automated cross-language relevant question retrieval (CLRQR) system that retrieves relevant English questions for a given Chinese question. CLRQR first extracts essential information (both Chinese and English) from the title and description of the input Chinese question, then performs domain-specific translation of the essential Chinese information into English, and finally formulates an English query for retrieving relevant questions from a repository of English Stack Overflow questions. We propose three retrieval algorithms (word-embedding, word-matching, and vector-space-model based methods) that exploit different document representations and similarity metrics for question retrieval. To evaluate the performance of our approach and investigate the effectiveness of the different retrieval algorithms, we propose four baseline approaches based on combinations of different sources of query words, query formulation mechanisms, and search engines. We randomly select 80 Java, 20 Python, and 20 .NET questions from SegmentFault and V2EX (two Chinese Q&A websites for computer programming) as the query Chinese questions. We conduct a user study to evaluate the relevance of the English questions retrieved by CLRQR with each retrieval algorithm and by the four baseline approaches. The experimental results show that CLRQR with word-embedding based retrieval achieves the best performance.
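As an illustration of the vector-space-model style of retrieval named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that the English query has already been formulated, that questions are represented as smoothed TF-IDF vectors, and that cosine similarity is the ranking metric; the abstract does not specify these details. All function names and the sample questions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs_tokens):
    """Smoothed inverse document frequency for every term in the repository."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the repository are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, questions, top_k=3):
    """Rank repository questions by cosine similarity to the English query."""
    docs = [tokenize(q) for q in questions]
    idf = build_idf(docs)
    vectors = [tfidf(d, idf) for d in docs]
    query_vec = tfidf(tokenize(query), idf)
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda p: (-p[0], p[1]))  # best score first, stable on ties
    return [(questions[i], score) for score, i in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical repository entries, invented for illustration only.
    repository = [
        "How to read a file line by line in Java",
        "Convert string to int in Python",
        "Parse JSON in .NET",
    ]
    for question, score in retrieve("java read file", repository):
        print(f"{score:.3f}  {question}")
```

The word-matching and word-embedding variants mentioned in the abstract would replace the document representation and the similarity function while keeping the same ranking loop.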
Keywords
Cross-language question retrieval, Domain-specific translation, Computer programming, Knowledge based systems, Linguistics, Search engines, Vector spaces, Cross-language question, Document Representation, Query formulation, Retrieval algorithms, Retrieval process, Similarity metrics, Vector space models, Translation (languages)
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing | Software Engineering
Research Areas
Data Science and Engineering
Publication
Empirical Software Engineering
Volume
23
Issue
2
First Page
1084
Last Page
1122
ISSN
1382-3256
Identifier
10.1007/s10664-017-9568-3
Publisher
Springer Verlag (Germany)
Citation
XU, Bowen; XING, Zhenchang; XIA, Xin; LO, David; and LI, Shanping.
Domain-specific cross-language relevant question retrieval. (2018). Empirical Software Engineering. 23, (2), 1084-1122.
Available at: https://ink.library.smu.edu.sg/sis_research/3842
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1007/s10664-017-9568-3
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons, Software Engineering Commons