Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2018
Abstract
We explore the notion of subjectivity, and hypothesize that word embeddings learnt from input corpora of varying levels of subjectivity behave differently on natural language processing tasks such as classifying a sentence by sentiment, subjectivity, or topic. Through systematic comparative analyses, we establish that this is indeed the case. Moreover, based on the discovery of the outsized role that sentiment words play in subjectivity-sensitive tasks such as sentiment classification, we develop a novel word embedding, SentiVec, which is infused with sentiment information from a lexical resource and is shown to outperform baselines on such tasks.
Keywords
Computational linguistics, Embeddings, Natural language processing systems
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 2018 July 15-20
First Page
1212
Last Page
1221
Identifier
10.18653/v1/P18-1112
Publisher
Association for Computational Linguistics
City or Country
Stroudsburg, PA
Citation
TKACHENKO, Maksim; CHIA, Chong Cher; and LAUW, Hady W.
Searching for the X-Factor: Exploring corpus subjectivity for word embeddings. (2018). Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 2018 July 15-20. 1212-1221.
Available at: https://ink.library.smu.edu.sg/sis_research/4229
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/P18-1112
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons