Publication Type
PhD Dissertation
Version
publishedVersion
Publication Date
11-2023
Abstract
The proliferation of affordable and compact digital storage has led to the creation of enormous databases of information, and much attention has been focused on the problem of processing unorganized and unstructured information into a form from which additional value can be extracted. Because of the sheer volume of information to be processed, contemporary approaches to this problem virtually necessitate the use of complex models running on computational systems. While it is possible for the model to be fed the actual data as input, typically a representation of the data is used instead. These representations are therefore of interest, as they act as intermediaries through which the database information is processed and thus affect the performance of the trained model.
This dissertation is split into two parts. We first discuss in detail the effectiveness and efficiency of semantic data representations. Effective semantic representations focus on aspects generally related to the capabilities of these representations, such as task performance and interpretability. Efficient semantic representations encompass aspects generally related to the utilization of these representations, such as their storage size and their generalizability across multiple tasks. Next, we explore an application of semantic representations in downstream tasks, before elaborating on multiple directions for future work relating to such applications.
We present two works for discussion in the first part of the dissertation, each focused on a specific form of semantic data. For textual data representations, we introduce a novel approach that improves efficiency by discarding representations, while limiting the impact on downstream task effectiveness. For knowledge base representations, we explore a novel measure of node importance in knowledge graphs, and present a heuristic approach for selecting such nodes in large knowledge graphs.
In the second part of the dissertation, we discuss the application of semantic representations in two downstream Natural Language Processing (NLP) tasks. We first describe the use of semantic representations generated by Large Language Models (LLMs) in an Information Retrieval (IR) system, and overcome the "cold-start" problem in the Legal NLP domain by introducing a novel heuristic for labelling "key" legal passages. We then propose a future research direction for generating summaries from long legal documents, which raises research questions regarding the input representation of such documents as well as the evaluation of such summarization models.
Keywords
Data-driven Optimization, Urban Logistics, Dynamic Pickup and Delivery Problem
Degree Awarded
PhD in Computer Science
Discipline
Programming Languages and Compilers | Software Engineering
Supervisor(s)
LAUW, Hady Wirawan
First Page
1
Last Page
135
Publisher
Singapore Management University
City or Country
Singapore
Citation
CHIA, Chong Cher.
Effective and efficient semantic representations and their applications. (2023). 1-135.
Available at: https://ink.library.smu.edu.sg/etd_coll/536
Copyright Owner and License
Author
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.