Publication Type

Conference Proceeding Article

Publication Date


Abstract

Summarizing opinions expressed in online forums can potentially benefit many people. However, the special characteristics of this problem may require changes to standard text summarization techniques. In this work, we present our initial attempt at extractive summarization of opinionated online forum threads. Given the nature of user-generated content in online discussion forums, we hypothesize that besides relevance, text quality and subjectivity also play important roles in deciding which sentences are good summary sentences. We therefore construct an annotated corpus to facilitate our study of extractive summarization of online discussion forums. We define a set of features to capture relevance, text quality and subjectivity, and empirically test their usefulness in choosing summary sentences. Using an unpaired Student's t-test, we find that sentence length and the number of sentiment words are strongly associated with good summary sentences. Finally, we propose some simple modifications to a standard Integer Linear Programming (ILP)-based summarization framework to incorporate these features.
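The t-test comparison described in the abstract can be illustrated with a minimal sketch. It assumes each sentence is labelled as a summary or non-summary sentence and described by a numeric feature such as its length in words; an unpaired Student's t-test (equal variances assumed) then compares the feature's mean across the two groups. The function name `student_t` and the toy lengths are hypothetical illustrations, not the authors' code or data. A p-value would additionally require the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=True)`.

```python
import math
import statistics


def student_t(group_a, group_b):
    """Unpaired Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal population variances,
    which is what distinguishes Student's test from Welch's test.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Sample variances (n - 1 denominator).
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, df


# Hypothetical toy data: sentence lengths (in words) of sentences
# labelled as summary vs. non-summary sentences.
summary_lengths = [18, 22, 25, 20, 24]
other_lengths = [8, 12, 10, 15, 9, 11]

t, df = student_t(summary_lengths, other_lengths)
print(f"t = {t:.2f}, df = {df}")  # prints: t = 6.81, df = 9
```

A large positive t suggests the feature tends to be higher for summary sentences; whether the difference is significant depends on the critical value of the t distribution for the given degrees of freedom.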

Keywords

Computational linguistics, Integer programming, Online systems, Social networking (online), Text processing, Extractive summarizations, Integer Linear Programming, Online discussion forums, Sentence length, Simple modifications, Student's t tests, Text summarization, User-generated content


Databases and Information Systems

Research Areas

Data Management and Analytics


Proceedings of Recent Advances in Natural Language Processing: 10th RANLP 2015, Hissar, Bulgaria, September 7-9, 2015

First Page


Last Page




City or Country

Stroudsburg, PA

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Additional URL