Publication Type
PhD Dissertation
Version
publishedVersion
Publication Date
11-2025
Abstract
Graph perturbation, rooted in classical perturbation theory, studies how small topology edits, i.e., adding or deleting edges, affects graph properties (e.g., density, centrality). This fundamental problem underpins applications like bioinformatics, privacy preservation and system defense. While much prior work targets perturbations that influence global graph statistics or model outputs, comparatively little addresses robustness for knowledge discovery and information retrieval. In these settings, graphs are attributed: nodes carry real-world semantics (e.g., locations, people) and edges encode interactions or relationships. This thesis proposes new formulations and algorithms that generate and leverage graph perturbations to make knowledge discovery and retrieval more robust. Specifically, we design novel problems and solutions for typical applications in the knowledge discovery and information retrieval.
First, we audit outstanding fact (OF) mining on knowledge graphs for decision making. An OF is a striking claim by which some entities stand out from their peers on some attribute. OFs serve data journalism, fact checking, and recommendation. However, one could jump to conclusions by selecting truthful OFs while intentionally or inadvertently ignoring lateral contexts and data that render them less striking. This jumping-to-conclusion bias from unstable OFs may disorient the public, including voters and consumers, raising concerns about fairness and transparency in political and business competition. It is thus ethically imperative for several stakeholders to measure the robustness of OFs with respect to lateral contexts and data. Unfortunately, a capacity for such inspection of OFs mined from knowledge graphs (KGs) is missing. In this thesis, we propose a methodology that inspects the robustness of OFs in KGs by perturbation analysis. We define (1) entity perturbation, which detects outlying contexts by perturbing context entities in the OF; and (2) data perturbation, which considers plausible data that render an OF less striking. We compute the \emph{expected} strikingness scores of OFs over perturbation relevance distributions and assess an OF as robust if its measured strikingness does not deviate significantly from the expected. We devise a suite of exact and sampling algorithms for perturbation analysis on large KGs. Extensive experiments reveal that our methodology accurately and efficiently detects frail OFs generated by existing mining approaches on KGs. We also show the effectiveness of our approaches through case and user studies.
Second, motivated by the path pattern constraints in the first work and by the central role of subgraph counting in anti-financial-crime analytics, we study graph perturbations for subgraph counting. Considering that , we formulate the problem as ksub which aims to identify k edge additions that maximize the count of a query graph. We prove that ksub is intractable due to its NP-hardness, even for constant approximation.To address this, we relax the problem into a top-k selection, termed topkSub. Depending on the structure of the relaxed query graph, we distinguish two possible processing scenarios (namely, the connected and the disconnected case), and design dedicated search space pruning strategies for each scenario. Additionally, we develop sampling techniques on the pruned search space to scale topkSub for handling large graphs. In the form of a case study, we demonstrate that topkSub effectively uncovers vulnerabilities in state-of-the-art GNN models for subgraph counting, providing a significant advantage over alternative graph perturbation methods. The efficiency analysis shows that our pruning techniques achieve substantial speedups for exact processing in both connected and disconnected cases, while our sampling methods reduce estimation errors by 1-2 orders of magnitude compared to state-of-the-art samplers.
Finally, motivated by the capabilities of large language models (LLMs) and the ubiquity of graphs in retrieval-augmented generation (RAG), we harness LLM feedback and graph-bandit policies to generate targeted graph perturbations that adapt the retrieval graph and yield more accurate, robust information retrieval. LLMs excel at generation but still suffer from outdated or missing knowledge. RAGs mitigates this by feeding the model external evidence at inference time, and several recent systems enrich that evidence with corpus-derived knowledge graphs. Existing pipelines fall into two camps: one-shot retrieval methods that rank chunks once with a graph model, and iterative methods that decompose a query into sub-queries reusing LLM feedback over multiple rounds. To address this issue, we recast graph-based RAG as a sequence of graph perturbations generated by a graph bandit with an LLM in the loop. At each step, the system selects a chunk as a probe and the LLM returns a reward signal. This reward induces a small, targeted policy perturbation to the retrieval graph reweighting nodes in a query-conditioned priority map. To mirror attention–decision dynamics in neuroscience, we use two complementary, LLM-scored rewards: (i) immediate relevance, the standalone answer utility of the probed chunk; and (ii) marginal information gain, the new evidence it contributes beyond what has already been retrieved. Experiments across multiple QA benchmarks show that this perturbation-driven approach is both effective and robust. Moreover, it integrates smoothly with diverse graph-construction backends. In short, we turn LLM feedback into purposeful graph perturbations that make retrieval adaptive, informative, and resilient.
Keywords
Graph Perturbation, Knowledge Discovery, Information Retrieval
Degree Awarded
PhD in Computer Science
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Supervisor(s)
LI, Yuchen
First Page
1
Last Page
141
Publisher
Singapore Management University
City or Country
Singapore
Citation
XIAO, Hanhua.
Graph perturbations for robust knowledge discovery and retrieval. (2025). 1-141.
Available at: https://ink.library.smu.edu.sg/etd_coll/817
Copyright Owner and License
Author
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons