Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
12-2012
Abstract
The lack of annotated corpora limits research on discourse classification for many languages. In this paper, we present the first effort towards recognizing the ambiguities of discourse connectives, which is fundamental to discourse classification for resource-poor languages such as Chinese. A language-independent framework is proposed that utilizes bilingual dictionaries, the Penn Discourse Treebank, and parallel data between English and Chinese. We start by translating the English connectives into Chinese using a bilingual dictionary. Then, the ambiguities in terms of the senses a connective may signal are estimated based on the ambiguities of the English connectives and word alignment information. Finally, the ambiguity between discourse usage and non-discourse usage is resolved using the co-training algorithm. Experimental results show that the proposed method not only builds a high-quality connective lexicon for Chinese but also achieves high performance in recognizing the ambiguities. We also present a discourse corpus for Chinese, which will soon become the first publicly available Chinese discourse corpus.
Keywords
Discourse, Explicit Connectives, Ambiguity of Connectives
Discipline
Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
Proceedings of 24th International Conference on Computational Linguistics (COLING 2012)
First Page
1409
Last Page
1418
Publisher
Association for Computational Linguistics
City or Country
Bombay, India
Citation
ZHOU, Lanjun; GAO, Wei; LI, Binyang; WEI, Zhongyu; and WONG, Kam-Fai. Cross-lingual identification of ambiguous discourse connectives for resource-poor language. (2012). Proceedings of 24th International Conference on Computational Linguistics (COLING 2012). 1409-1418.
Available at: https://ink.library.smu.edu.sg/sis_research/4588
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://aclweb.org/anthology/C12-2138