Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

11-2023

Abstract

The proliferation of affordable and compact digital storage has led to the creation of enormous databases of information, and much attention has been focused on the problem of processing unorganized and unstructured information into some form from which additional value can be extracted. Contemporary approaches to this problem virtually necessitate the use of complex models running on computational systems due to the sheer volume of information to be processed. While it is possible for the model to be fed the actual data as input, typically a representation of the data is used instead. These representations are therefore of interest, as they act as intermediaries through which the database information is processed and therefore impact the resulting performance of the trained model.

This dissertation is split into two parts. We first discuss in detail the effectiveness and efficiency of semantic data representations. Effective semantic representations focus on aspects generally related to the capabilities of these representations, such as task performance and interpretability. Efficient semantic representations encompass aspects which generally relate to the utilization of these representations, such as their storage size and their generalizability across multiple tasks. Next, we explore an application of semantic representations in downstream tasks, before elaborating on multiple directions for future work relating to such applications.

We present two works for discussion in the first part of the dissertation, each focused on a specific form of semantic data. For textual data representations, we introduce a novel approach that improves efficiency by discarding representations while limiting the impact on downstream task effectiveness. For knowledge base representations, we explore a novel measure of node importance in knowledge graphs, and present a heuristic approach for selecting such nodes in large knowledge graphs.

In the second part of the dissertation, we discuss the application of semantic representations in two downstream Natural Language Processing (NLP) tasks. We first describe the use of semantic representations generated by Large Language Models (LLMs) in an Information Retrieval (IR) system, and overcome the "cold-start" problem in the Legal NLP domain by introducing a novel heuristic for labelling "key" legal passages. We then propose a future research direction for generating summaries from long legal documents, which raises research questions regarding the input representation of such documents as well as the evaluation of such summarization models.

Keywords

Data-driven Optimization, Urban Logistics, Dynamic Pickup and Delivery Problem

Degree Awarded

PhD in Computer Science

Discipline

Programming Languages and Compilers | Software Engineering

Supervisor(s)

LAUW, Hady Wirawan

First Page

1

Last Page

135

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author

Available for download on Wednesday, February 12, 2025
