Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

12-2012

Abstract

The lack of annotated corpora limits research on discourse classification for many languages. In this paper, we present the first effort towards recognizing the ambiguities of discourse connectives, which is fundamental to discourse classification for resource-poor languages such as Chinese. We propose a language-independent framework that uses bilingual dictionaries, the Penn Discourse Treebank, and English-Chinese parallel data. We start by translating the English connectives into Chinese using a bilingual dictionary. Then, the ambiguity in terms of the senses a connective may signal is estimated from the ambiguities of the English connectives and word alignment information. Finally, the ambiguity between discourse usage and non-discourse usage is resolved using the co-training algorithm. Experimental results show that the proposed method not only builds a high-quality connective lexicon for Chinese but also achieves high performance in recognizing the ambiguities. We also present a discourse corpus for Chinese, which will soon become the first publicly available Chinese discourse corpus.

Keywords

Discourse, Explicit Connectives, Ambiguity of Connectives

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012)

First Page

1409

Last Page

1418

Publisher

Association for Computational Linguistics

City or Country

Bombay, India

Additional URL

https://aclweb.org/anthology/C12-2138
