Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

5-2025

Abstract

Modern machine learning (ML) models achieve remarkable success but face critical reliability challenges. This thesis advances two pillars of reliable ML systems: interpretability through data attribution and robustness against adversarial threats.

In the first part, we develop novel data attribution methods to elucidate the data-model relationship. We establish the critical role of memorization in model generalization through token-level influence analysis, extend sample-level attribution to diffusion models with effective approximation techniques, and introduce REGMIX, a group-level approach that predicts data mixture performance using small-scale experiments. These contributions provide practitioners with scalable tools to audit training data impacts across modalities.

The second part exposes vulnerabilities in ML robustness from three adversarial perspectives. We reveal cascading failures in multi-agent LLM systems, where a single adversarial input can propagate through million-agent networks; develop improved attacks achieving 99% success against state-of-the-art aligned models; and demonstrate how trivial "null models" exploit benchmark design flaws. Our findings challenge prevailing assumptions about LLM security and evaluation practices.

Collectively, this work bridges the gap between model capabilities and operational reliability. By advancing explanatory frameworks for model decisions and exposing critical vulnerabilities, we provide insights for developing reliable ML systems that are both understandable and secure against emerging threats.

Degree Awarded

PhD in Computer Science

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems

Supervisor(s)

JIANG, Jing

First Page

1

Last Page

322

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
