Conference Proceeding Article
Summarizing opinions expressed in online forums can potentially benefit many people. However, special characteristics of this problem may require changes to standard text summarization techniques. In this work, we present our initial attempt at extractive summarization of opinionated online forum threads. Given the nature of user-generated content in online discussion forums, we hypothesize that, besides relevance, text quality and subjectivity also play important roles in deciding which sentences are good summary sentences. We therefore construct an annotated corpus to facilitate our study of extractive summarization of online discussion forums. We define a set of features to capture relevance, text quality and subjectivity, and empirically test their usefulness in choosing summary sentences. Using an unpaired Student's t-test, we find that sentence length and the number of sentiment words are strongly associated with good summary sentences. Finally, we propose some simple modifications to a standard Integer Linear Programming-based summarization framework to incorporate these features.
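The feature test mentioned in the abstract can be illustrated with a minimal sketch of an unpaired Student's t-test (pooled variance) comparing one feature, sentence length, between summary and non-summary sentences. The function name `student_t_unpaired` and the sample values are hypothetical and are not taken from the paper or its corpus; the sketch returns only the t statistic and degrees of freedom and does not compute a p-value.

```python
from math import sqrt
from statistics import mean, variance


def student_t_unpaired(a, b):
    """Return (t, degrees_of_freedom) for an unpaired Student's t-test.

    Assumes equal population variances (pooled-variance form).
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # Pooled sample variance across both groups.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    if pooled_var == 0:
        raise ValueError("both groups have zero variance")
    # Standard error of the difference between the two means.
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


# Hypothetical sentence lengths in tokens (illustrative only, not paper data).
summary_lengths = [24, 31, 28, 35, 27]
other_lengths = [9, 14, 11, 17, 12, 8]

t_stat, df = student_t_unpaired(summary_lengths, other_lengths)
print(f"t = {t_stat:.2f}, df = {df}")
```

A large absolute t value relative to the critical value for the given degrees of freedom would suggest that the feature differs between the two groups of sentences.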
Computational linguistics, Integer programming, Online systems, Social networking (online), Text processing, Extractive summarizations, Integer Linear Programming, Online discussion forums, Sentence length, Simple modifications, Student's t tests, Text summarization, User-generated content
Databases and Information Systems
Data Management and Analytics
Proceedings of Recent Advances in Natural Language Processing: 10th RANLP 2015, Hissar, Bulgaria, September 7-9, 2015
DING YING and Jing JIANG. Towards Opinion Summarization from Online Forums. (2015). Proceedings of Recent Advances in Natural Language Processing: 10th RANLP 2015, Hissar, Bulgaria, September 7-9, 2015. 138-146. Research Collection School Of Information Systems.
Available at: http://ink.library.smu.edu.sg/sis_research/3072
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.