Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
6-2023
Abstract
Recent research on the vulnerabilities of deep reinforcement learning (RL) has shown that adversarial policies adopted by an adversary agent can influence a target RL agent (the victim agent) to perform poorly in a multi-agent environment. In existing studies, adversarial policies are trained directly on experiences of interacting with the victim agent. This approach has a key shortcoming: knowledge derived from historical interactions may not generalize properly to unexplored policy regions of the victim agent, making the trained adversarial policy significantly less effective. In this work, we design a new, effective adversarial policy learning algorithm that overcomes this shortcoming. The core idea of our algorithm is to create a new imitator: the imitator learns to imitate the victim agent's policy, while the adversarial policy is trained not only on interactions with the victim agent but also on feedback from the imitator to forecast the victim's intention. By doing so, we can leverage the capability of imitation learning to capture the underlying characteristics of the victim policy based only on sample trajectories of the victim. Our victim imitation learning model differs from prior models in that the environment's dynamics are driven by the adversary's policy and keep changing during adversarial policy training. We provide a provable bound that guarantees a desired imitating policy when the adversary's policy becomes stable. We further strengthen our adversarial policy learning by making our imitator a stronger version of the victim. That is, we incorporate the opposite of the adversary's value function into the imitation objective, leading the imitator not only to learn the victim policy but also to be adversarial to the adversary. Finally, our extensive experiments using four competitive MuJoCo game environments show that our proposed adversarial policy learning algorithm outperforms state-of-the-art algorithms.
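The idea of adding the opposite of the adversary's value function to the imitation objective can be illustrated with a small sketch. This is not the paper's implementation: it assumes a discrete action space, swaps in a plain behavior-cloning loss (negative log-likelihood of the victim's observed actions) for the imitation objective, and treats `adversary_values`, `weight`, and all function names as hypothetical. The adversary's value is assumed to be available per state and per candidate victim action.

```python
import math


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_adversary_value(probs, values):
    """Adversary's value averaged over the imitator's action distribution."""
    return sum(p * v for p, v in zip(probs, values))


def imitator_loss(logits_batch, victim_actions, adversary_values, weight=0.1):
    """Hypothetical imitator loss (to be minimized).

    logits_batch:     imitator logits, one list per sampled state
    victim_actions:   action index the victim actually took in each state
    adversary_values: assumed estimate of the adversary's value for each
                      candidate victim action in each state
    weight:           assumed trade-off coefficient (not from the paper)
    """
    nll = 0.0
    penalty = 0.0
    for logits, action, values in zip(logits_batch, victim_actions, adversary_values):
        probs = softmax(logits)
        # Imitation term: match the victim's observed action.
        nll -= math.log(probs[action])
        # Adversarial term: minimizing the adversary's value is the same as
        # maximizing its opposite.
        penalty += expected_adversary_value(probs, values)
    n = len(victim_actions)
    return nll / n + weight * penalty / n


# One state, two actions, uniform imitator policy.
loss = imitator_loss([[0.0, 0.0]], [0], [[1.0, 3.0]], weight=0.5)
```

With `weight=0` the sketch reduces to pure imitation of the victim; a larger `weight` pushes the imitator toward actions that lower the adversary's value, which is one reading of "a stronger version of the victim".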
Keywords
Reinforcement Learning, Non-zero-sum Multi-agent Competition, Adversarial Policy, Imitation Learning
Discipline
Artificial Intelligence and Robotics | Numerical Analysis and Scientific Computing
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 22nd International Conference on Autonomous Agents and Multiagent Systems, London, England, 2023 May 29 - June 2
Identifier
10.48550/arXiv.2210.16915
Publisher
International Foundation for Autonomous Agents and Multiagent Systems
City or Country
Taipei
Citation
BUI, The Viet; MAI, Tien; and NGUYEN, Thanh H.
Imitating opponent to win: Adversarial policy imitation learning in two-player competitive games. (2023). Proceedings of the 22nd International Conference on Autonomous Agents and Multiagent Systems, London, England, 2023 May 29 - June 2.
Available at: https://ink.library.smu.edu.sg/sis_research/8332
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.48550/arXiv.2210.16915
Included in
Artificial Intelligence and Robotics Commons, Numerical Analysis and Scientific Computing Commons