Research Collection School Of Computing and Information Systems

Detecting semantic uncertainty by learning hedge cues in sentences using an HMM

Publication Type

Book Chapter

Version

acceptedVersion

Publication Date

11-2017

Abstract

Detecting speculative assertions is essential for distinguishing semantically uncertain information from factual information in text. This is critical to the trustworthiness of many intelligent systems built on information retrieval and natural language processing techniques, such as question answering and information extraction. We empirically explore three fundamental issues in uncertainty detection: (1) the predictive ability of different learning methods on this task; (2) whether using unlabeled data leads to a more accurate model; and (3) whether closed-domain or cross-domain training is better. For these purposes, we adopt two statistical learning approaches: the commonly used bag-of-words model based on Naive Bayes, and a sequence labeling approach using a Hidden Markov Model (HMM). We compare our two approaches with each other and with prior results from the CoNLL-2010 Shared Task 1. Overall, our results are promising: (1) on the Wikipedia and biomedical datasets, the HMM improves over Naive Bayes by up to 17.4% and 29.0% absolute F score, respectively; (2) on the Wikipedia dataset, our best HMM achieves an F score of 62.9% with MLE parameter estimation and 64.0% with EM parameter estimation, both outperforming the best CoNLL-2010 result (60.2%), although our results on the biomedical dataset are less impressive; (3) when a model's expressive power is limited (e.g., Naive Bayes), cross-domain training helps, whereas for a more powerful model (e.g., HMM), cross-domain training may produce biased parameters; and (4) under Maximum Likelihood Estimation, combining unlabeled examples with labeled ones helps.
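To make the sequence labeling idea concrete, the following is a minimal sketch, not the authors' implementation, of supervised HMM hedge-cue tagging. It assumes a hypothetical two-tag scheme (`CUE` for hedge cue tokens, `O` for all other tokens), MLE parameter estimation with add-one smoothing, and Viterbi decoding. It also assumes that a sentence is labeled uncertain if any token is tagged as a cue. The EM variant that also uses unlabeled sentences is not shown, and the toy training data is invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tag scheme (assumption): hedge cue tokens vs. all other tokens.
CUE, OTHER = "CUE", "O"
TAGS = (CUE, OTHER)


def train_hmm(tagged_sentences, alpha=1.0):
    """Estimate HMM parameters by counting (MLE) with add-alpha smoothing."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    vocab = set()
    for sent in tagged_sentences:
        prev = None
        for word, tag in sent:
            word = word.lower()
            vocab.add(word)
            emit[tag][word] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag
    vocab_size = len(vocab) + 1  # one extra slot reserves mass for unseen words

    def log_prob(counter, key, size):
        total = sum(counter.values())
        return math.log((counter[key] + alpha) / (total + alpha * size))

    log_start = {t: log_prob(start, t, len(TAGS)) for t in TAGS}
    log_trans = {p: {t: log_prob(trans[p], t, len(TAGS)) for t in TAGS} for p in TAGS}

    def log_emit(tag, word):
        return log_prob(emit[tag], word.lower(), vocab_size)

    return log_start, log_trans, log_emit


def viterbi(words, model):
    """Return the most probable tag sequence for a tokenized sentence."""
    log_start, log_trans, log_emit = model
    # best[t]: log-probability of the best tag path ending in tag t
    best = {t: log_start[t] + log_emit(t, words[0]) for t in TAGS}
    backptrs = []
    for word in words[1:]:
        new_best, ptr = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[p] + log_trans[p][t])
            new_best[t] = best[prev] + log_trans[prev][t] + log_emit(t, word)
            ptr[t] = prev
        best = new_best
        backptrs.append(ptr)
    path = [max(TAGS, key=lambda t: best[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))


def is_uncertain(words, model):
    """Assumed sentence-level rule: uncertain if any token is tagged as a cue."""
    return CUE in viterbi(words, model)


# Toy labeled data (invented for illustration only).
TRAIN = [
    [("these", OTHER), ("results", OTHER), ("may", CUE), ("indicate", OTHER), ("a", OTHER), ("link", OTHER)],
    [("it", OTHER), ("may", CUE), ("be", OTHER), ("true", OTHER)],
    [("this", OTHER), ("suggests", CUE), ("a", OTHER), ("role", OTHER)],
    [("the", OTHER), ("gene", OTHER), ("is", OTHER), ("expressed", OTHER)],
    [("the", OTHER), ("protein", OTHER), ("binds", OTHER), ("dna", OTHER)],
]
model = train_hmm(TRAIN)
```

With this toy model, `viterbi(["the", "protein", "may", "bind", "dna"], model)` tags only "may" as `CUE`, so the sentence is flagged as uncertain, while "the gene is expressed" is not. Unlike a bag-of-words Naive Bayes classifier, the HMM scores each token in the context of its neighboring tags through the transition probabilities.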

Keywords

Uncertainty detection, Hedge cues, Naive Bayes, HMM, Cross-domain training

Discipline

Theory and Algorithms

Research Areas

Data Science and Engineering

Publication

Social media content analysis: Natural language processing and beyond

Volume

Editor

Kam-Fai Wong, Wei Gao, Ruifeng Xu, Wenjie Li

Identifier

10.1142/9789813223615_0008

Publisher

World Scientific Publishing

Citation

LI, Xiujun; GAO, Wei; and SHAVLIK, Jude. Detecting semantic uncertainty by learning hedge cues in sentences using an HMM. (2017). Social media content analysis: Natural language processing and beyond. 3,.
Available at: https://ink.library.smu.edu.sg/sis_research/4643

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1142/9789813223615_0008


Included in

Theory and Algorithms Commons
