Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

12-2025

Abstract

The proliferation of open-source software (OSS) has made software supply chains prime targets for attacks such as package confusion, in which adversaries publish malicious packages with names deceptively similar to legitimate ones. Existing detection methods often rely on simple lexical similarity or passive analysis of known package pairs, struggle with high false positive rates (FPR), fail to proactively identify emerging threats, and are vulnerable to adversarial evasion. To overcome these limitations, we introduce AgentGuard, a novel framework for proactive, single-input package confusion detection. AgentGuard employs a multi-agent architecture that autonomously discovers potential confusion targets using a fine-tuned word embedding model for hybrid semantic search, and subsequently evaluates the risk with a machine learning model that incorporates multi-dimensional feature groups to enhance robustness. This design enables scalable, real-time monitoring across diverse software ecosystems. We evaluate AgentGuard on the challenging ConfuDB and NeupaneDB datasets. Our results demonstrate that AgentGuard significantly outperforms state-of-the-art baselines, improving accuracy by 10%-24% while simultaneously reducing the false positive rate by 9%-31%.

Keywords

package confusion detection, LLM agent, cybersecurity

Discipline

Artificial Intelligence and Robotics | Software Engineering

Areas of Excellence

Digital transformation

Publication

Proceedings of the 7th International Conference on Machine Learning for Cyber Security (ML4CS), Hangzhou, China, December 12-14

First Page

1

Last Page

15

City or Country

Hangzhou, China
