Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
1-2026
Abstract
Despite the rapid advances in Visual Language Models (VLMs), these models struggle to recognize culture-specific food items. While VLMs are effective in recognizing popular cultural dishes, their performance is suboptimal for dishes that are unique but not widely known internationally. Specifically, VLMs often generate either generic labels or hallucinated names for dishes that are localized to a particular culture. As a result, retrieval-augmented generation (RAG), which retrieves relevant recipes as references for VLMs, emerges as a promising approach. Nevertheless, recipe retrieval, which is itself imperfect, could mislead VLMs into generating inaccurate or culturally inappropriate dish names. This paper presents a comparative study evaluating RAG-based food recognition against conventional approaches, including neural network-based recognition and standalone recipe retrieval. We propose an optimized hybrid framework that integrates the strengths of both VLMs and conventional techniques. The proposed framework achieves the best overall performance in recognizing multicultural dishes and demonstrates robustness in identifying out-of-distribution dishes from a different domain.
Keywords
food recognition, RAG, search re-ranking, VLMs
Discipline
Artificial Intelligence and Robotics | Food Science
Research Areas
Software and Cyber-Physical Systems
Publication
Multimedia Modeling: 32nd International Conference on Multimedia Modeling, MMM 2026, Prague, Czech Republic, January 29-31, Proceedings
First Page
1
Last Page
15
ISBN
9789819569595
Identifier
10.1007/978-981-95-6960-1_29
Publisher
Springer
City or Country
Cham
Citation
GAN, Kian Yu; NGUYEN, Phuong Anh; and NGO, Chong-wah.
Food recognition with visual language models: Search re-ranking or retrieval-augmented generation? (2026). Multimedia Modeling: 32nd International Conference on Multimedia Modeling, MMM 2026, Prague, Czech Republic, January 29-31, Proceedings. 1-15.
Available at: https://ink.library.smu.edu.sg/sis_research/11036
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.