Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

12-2018

Abstract

Online product reviews are important factors in consumers' purchase decisions. They pervade more and more spheres of our lives: we have reviews of books, electronics, groceries, entertainment, restaurants, travel experiences, etc. According to various consumer surveys, more than 90 percent of consumers read online reviews before they purchase products. This observation suggests that product review information enhances the consumer experience and helps consumers make better-informed purchase decisions. There is an enormous number of online reviews posted on e-commerce platforms such as Amazon, Apple, Yelp, and TripAdvisor. They vary in the information they contain and reflect their writers' different experiences and preferences.

If online opinions are indeed important in many spheres of our lives, then their systematic analysis is a real-life problem. Because an enormous number of opinions is scattered across the Web, a manual analysis would carry a prohibitive cost in time and effort. An alternative is an automated or, more appropriately, semi-automated analysis conducted by computers to assist human analysts. Text processing applications have received much attention in the past three decades and have proven successful for language understanding.

Comparison mining aims at understanding opinion mining problems when multiple entities are present simultaneously. This includes, but is not limited to, deriving similarities and differences between entities and discovering information about the relations between them. The entities may be products, individuals, issues, etc. The notion of comparison comes in the form of joint evaluative statements, such as "I think A is better than B" or "I think A is a good alternative to B", and introduces new research questions that are similar to, yet different from, those of traditional opinion mining. How do we find these statements in a review? How do we interpret these statements? How do we make sense of thousands of such comparisons? In this study, we seek to answer these questions and propose a set of related computational solutions.

First, we investigate a comparison identification problem and cast it as a relation extraction problem. Within the relation extraction setup, we develop a new approach for identifying comparative relations. The formal investigation of the syntactic structure of comparative statements leads us to a kernel-based approach, which relies on the dependency structure of sentences. The proposed method shows state-of-the-art results for the comparison identification problem.

Second, we explore intrinsic properties of a comparative corpus to derive a joint model for comparison interpretation and aggregation. At the level of comparisons, the model seeks to derive the comparison outcome of a statement, i.e., which entity is preferred by the writer. At the aggregated level, it seeks to understand the overall ranking of the entities in a corpus of comparisons. The proposed model is shown to be superior to the approaches that tackle each level separately. An empirical evaluation demonstrates its effectiveness on real-world datasets.

Third, we look at the phenomenon of comparison disagreement, i.e., different users may have different preferences over the same set of entities. To capture this diversity, we propose a model for preference clustering and demonstrate its effectiveness and utility.

Fourth, we propose a method for explaining entity comparisons when entities are identified by their textual representations. CompareLDA, a supervised topic model, is employed to align topics (distributions of co-occurring words) with comparisons, so that the topics are indicative of the "better" and "worse" entities. Through an empirical evaluation, we show that the proposed model is more effective for capturing comparisons than alternative supervised topic models.

Together, the proposed methods form a substantial contribution to comparison mining research and facilitate a better understanding of opinion language.

Keywords

comparisons, graphical models, natural language processing, text mining

Degree Awarded

PhD in Information Systems

Discipline

Databases and Information Systems | Software Engineering

Supervisor(s)

LAUW, Hady Wirawan

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
