Publication Type
PhD Dissertation
Version
publishedVersion
Publication Date
5-2025
Abstract
Modern machine learning (ML) models achieve remarkable success, but face critical reliability challenges. This thesis advances two pillars of reliable ML systems: interpretability through data attribution and robustness against adversarial threats.
In the first part, we develop novel data attribution methods to elucidate the data-model relationship. We establish the critical role of memorization in model generalization through token-level influence analysis, extend sample-level attribution to diffusion models with effective approximation techniques, and introduce REGMIX, a group-level approach that predicts data mixture performance using small-scale experiments. These contributions provide practitioners with scalable tools to audit training data impacts across modalities.
The second part exposes vulnerabilities in ML robustness through three adversarial perspectives. We reveal cascading failures in multi-agent LLM systems where single adversarial inputs propagate through million-agent networks, develop improved attacks achieving 99% success against state-of-the-art aligned models, and demonstrate how trivial "null models" exploit benchmark design flaws. Our findings challenge prevailing assumptions about LLM security and evaluation practices.
Collectively, this work bridges the gap between model capabilities and operational reliability. By both advancing explanatory frameworks for model decisions and exposing critical vulnerabilities, we provide insights for developing reliable ML systems that are understandable and secure against emerging threats.
Degree Awarded
PhD in Computer Science
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems
Supervisor(s)
JIANG, Jing
First Page
1
Last Page
322
Publisher
Singapore Management University
City or Country
Singapore
Citation
ZHENG, Xiaosen. Towards reliable ML: Data attribution and adversarial robustness. (2025). 1-322.
Available at: https://ink.library.smu.edu.sg/etd_coll/766
Copyright Owner and License
Author
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.