Publication Type

PhD Dissertation

Publication Date



With the rapid development of online shopping sites and social media, product reviews are accumulating. These reviews contain information that is valuable to both businesses and customers. Businesses can easily collect a large volume of feedback on their products, which is difficult to achieve through traditional customer surveys. Customers can get to know the products they are interested in better by reading reviews, which would be difficult without them. However, reviews have accumulated to the point where reading all of them is impossible, so automated techniques are needed to process them efficiently.

One of the most fundamental research problems in product review analysis is aspect discovery. Aspects are components or attributes of a product or service. Aspect discovery finds the relevant terms and then clusters them into aspects. Because users often evaluate products by aspect, presenting them with aspect-level analysis is essential. Aspect discovery also serves as the basis of many downstream applications, such as aspect-level opinion summarization, rating prediction, and product recommendation. Aspect discovery involves three basic steps. The first is defining the aspects we need: we must understand and determine what counts as an aspect. The second is identifying the words that are used to describe aspects, which lets us concentrate on the information most relevant to aspect discovery. The third is clustering words into aspects, so that words about the same aspect fall into the same group.

Much existing work carries out these three steps in different ways, but some limitations remain. In the first step, most existing studies assume that they discover the aspects that people use to evaluate products. However, besides aspects, product reviews contain another type of latent topic, which we call “properties”. Properties are attributes that are intrinsic to products and are not suitable for comparing different products. In the second step, many supervised learning models have been proposed to identify aspect words. While proven effective, they require large amounts of training data and turn out to be much less useful when applied to data from a different domain. For the third step, many extensions of LDA have been proposed for clustering aspect words. Most of them rely only on the co-occurrence statistics of words, without considering the words' semantic meanings.

In this dissertation, we propose several new models that address some of these remaining problems in existing work:

1. We propose a principled model to separate product properties from aspects and connect both of them with ratings. Our model separates the two effectively, and its output helps us better understand users’ shopping behaviors and preferences.

2. We design two Recurrent Neural Network (RNN) based models that incorporate domain-independent rules into domain-specific, supervised neural networks. Our models improve substantially over strong existing baselines on the task of cross-domain aspect word identification.

3. We use word embeddings to boost traditional topic modeling of product reviews. The proposed model is more effective in both discovering meaningful aspects and recommending products to users.

4. We propose a model that integrates an RNN with a Neural Topic Model (NTM) to jointly identify and cluster aspect words. Our model discovers clearer and more coherent aspects. It is also more effective in sentence clustering than the baselines.


Keywords

opinion mining, topic models, deep learning, recommender systems, data mining, machine learning

Degree Awarded

PhD in Information Systems


Databases and Information Systems




Singapore Management University

City or Country


Copyright Owner and License


Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Available for download on Thursday, October 22, 2020