Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2023
Abstract
Twitter has become an alternative information source during a crisis. However, the short, noisy nature of tweets hinders information extraction. While models trained on standard Twitter crisis datasets achieve decent performance, generalizing to unseen crisis events remains a challenge. We therefore propose adding “difficult” negative examples during training to improve model generalization for Twitter crisis detection. Although adding random noise is a common practice, the impact of difficult negatives (negative data semantically similar to true examples) has never been examined in NLP. Moreover, most existing research focuses on the classification task without considering the primary information need of crisis responders. In our study, we implemented multiple sequence tagging models and studied, both quantitatively and qualitatively, the impact of difficult negatives on sequence tagging. We evaluated the models on unseen events and showed that difficult negatives forced the models to generalize better, leading to more accurate information extraction in a real-world application.
Keywords
Twitter, Crisis Detection, Difficult Negative Data, Negative Mining
Discipline
Databases and Information Systems | Social Media
Research Areas
Data Science and Engineering
Publication
Proceedings of Pacific Asia Conference on Information Systems 2023, Nanchang, China, July 8-12
First Page
1
Last Page
15
City or Country
Nanchang, China
Citation
ZHANG, Yuhao; LO, Siaw Ling; and WIN MYINT, Phyo Yi. Impact of difficult negatives on Twitter crisis detection. (2023). Proceedings of Pacific Asia Conference on Information Systems 2023, Nanchang, China, July 8-12. 1-15.
Available at: https://ink.library.smu.edu.sg/sis_research/8007
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://aisel.aisnet.org/pacis2023/156