Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

5-2025

Abstract

Modern machine learning (ML) models achieve remarkable success but face critical reliability challenges. This thesis advances two pillars of reliable ML systems: interpretability through data attribution and robustness against adversarial threats.

In the first part, we develop novel data attribution methods to elucidate the data-model relationship. We establish the critical role of memorization in model generalization through token-level influence analysis, extend sample-level attribution to diffusion models with effective approximation techniques, and introduce REGMIX, a group-level approach that predicts data mixture performance using small-scale experiments. These contributions provide practitioners with scalable tools to audit training data impacts across modalities.

The second part exposes vulnerabilities in ML robustness from three adversarial perspectives. We reveal cascading failures in multi-agent LLM systems, where a single adversarial input can propagate through million-agent networks; develop improved attacks achieving 99% success against state-of-the-art aligned models; and demonstrate how trivial "null models" exploit benchmark design flaws. Our findings challenge prevailing assumptions about LLM security and evaluation practices.

Collectively, this work bridges the gap between model capabilities and operational reliability. By advancing explanatory frameworks for model decisions and exposing critical vulnerabilities, we provide insights for developing reliable ML systems that are both understandable and secure against emerging threats.

Degree Awarded

PhD in Computer Science

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems

Supervisor(s)

JIANG, Jing

First Page

1

Last Page

322

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
