Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
7-2025
Abstract
Recent progress in Meta-Black-Box-Optimization (MetaBBO) has demonstrated that using reinforcement learning (RL) to learn a meta-level policy for dynamic algorithm configuration (DAC) over an optimization task distribution can significantly enhance the performance of the low-level BBO algorithm. However, the online learning paradigms used in existing works make the efficiency of MetaBBO problematic. To address this, we propose an offline learning-based MetaBBO framework, termed Q-Mamba, to attain both effectiveness and efficiency in MetaBBO. Specifically, we first transform the DAC task into a long-sequence decision process. This allows us to further introduce an effective Q-function decomposition mechanism that reduces the learning difficulty within the intricate algorithm configuration space. Under this setting, we propose three novel designs to meta-learn a DAC policy from offline data. First, we propose a novel collection strategy for constructing an offline DAC experience dataset with balanced exploration and exploitation. Second, we establish a decomposition-based Q-loss that incorporates conservative Q-learning to promote stable learning from the offline dataset. Third, to further improve offline learning efficiency, we equip our work with a Mamba architecture, which improves the effectiveness and efficiency of long-sequence learning through its selective state space model and hardware-aware parallel scan, respectively. Through extensive benchmarking, we observe that Q-Mamba achieves competitive or even superior performance compared with prior online/offline baselines, while being significantly more efficient to train than existing online baselines. We provide the source code of Q-Mamba at this https URL.
Discipline
Artificial Intelligence and Robotics
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Sustainability
Publication
Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada, July 13-19
First Page
1
Last Page
20
Identifier
10.48550/arXiv.2505.02010
City or Country
Vancouver, Canada
Citation
MA, Zeyuan; CAO, Zhiguang; JIANG, Zhou; GUO, Hongshu; and GONG, Yue-Jiao.
Meta-black-box-optimization through offline Q-function learning. (2025). Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada, July 13-19. 1-20.
Available at: https://ink.library.smu.edu.sg/sis_research/10563
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.48550/arXiv.2505.02010