Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
8-2024
Abstract
Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions spanning multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques. Extensive experiments are conducted to benchmark the MT-Mind2Web dataset and to validate the effectiveness of the proposed method.
Discipline
Databases and Information Systems | Programming Languages and Compilers
Research Areas
Data Science and Engineering
Publication
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 2024 August 11-16
Volume
1
First Page
8795
Last Page
8812
Publisher
ACL
City or Country
USA
Citation
DENG, Yang; ZHANG, Xuan; ZHANG, Wenxuan; YUAN, Yifei; NG, See-Kiong; and CHUA, Tat-Seng.
On the multi-turn instruction following for conversational web agents. (2024). Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 2024 August 11-16. 1, 8795-8812.
Available at: https://ink.library.smu.edu.sg/sis_research/9236
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.