Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

9-2015

Abstract

Summarizing opinions expressed in online forums can potentially benefit many people. However, special characteristics of this problem may require changes to standard text summarization techniques. In this work, we present our initial attempt at extractive summarization of opinionated online forum threads. Given the nature of user-generated content in online discussion forums, we hypothesize that, besides relevance, text quality and subjectivity also play important roles in deciding which sentences are good summary sentences. We therefore construct an annotated corpus to facilitate our study of extractive summarization of online discussion forums. We define a set of features to capture relevance, text quality and subjectivity, and empirically test their usefulness in choosing summary sentences. Using an unpaired Student's t-test, we find that sentence length and number of sentiment words have high correlations with good summary sentences. Finally, we propose some simple modifications to a standard Integer Linear Programming-based summarization framework to incorporate these features.
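
The feature test mentioned in the abstract can be illustrated with a minimal sketch of an unpaired (two-sample) Student's t statistic, computed here with pooled variance on one feature, sentence length. This is not the authors' code: the function name `unpaired_t_statistic` and the two small samples are hypothetical, invented purely for illustration, and the equal-variance form of the test is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t_statistic(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Assumes independent samples with equal population variances.
    """
    n_a, n_b = len(a), len(b)
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two sample means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err


# Hypothetical sentence lengths in tokens (made-up values, not from the paper).
summary_lengths = [24, 31, 28, 35, 27]   # sentences chosen for the summary
other_lengths = [12, 18, 9, 15, 21]      # sentences left out of the summary

t_stat = unpaired_t_statistic(summary_lengths, other_lengths)
degrees_of_freedom = len(summary_lengths) + len(other_lengths) - 2
print(f"t = {t_stat:.3f}, df = {degrees_of_freedom}")
```

A large absolute t value relative to the critical value for the given degrees of freedom would suggest that the feature differs between summary and non-summary sentences. The same function could be applied to any other numeric feature, such as a count of sentiment words per sentence.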

Keywords

Computational linguistics, Integer programming, Online systems, Social networking (online), Text processing, Extractive summarizations, Integer Linear Programming, Online discussion forums, Sentence length, Simple modifications, Student's t tests, Text summarization, User-generated content

Discipline

Databases and Information Systems

Publication

Proceedings of Recent Advances in Natural Language Processing: 10th RANLP 2015, Hissar, Bulgaria, September 7-9, 2015

First Page

138

Last Page

146

Publisher

ACL

City or Country

Stroudsburg, PA

Additional URL

https://aclweb.org/anthology/R/R15/R15-1020.pdf
