Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

3-2025

Abstract

The growing interest in generating recipes from food images has drawn substantial research attention in recent years. Existing works on recipe generation primarily use a two-stage training method: first predicting ingredients from a food image, then generating instructions from both the image and the ingredients. Large Multi-modal Models (LMMs), which have achieved notable success across a variety of vision and language tasks, open the possibility of generating both ingredients and instructions directly from images. Nevertheless, LMMs still face the common problem of hallucination during recipe generation, leading to suboptimal performance. To tackle this issue, we propose a retrieval-augmented large multimodal model for recipe generation. We first introduce Stochastic Diversified Retrieval Augmentation (SDRA) to retrieve recipes semantically related to the image from an existing datastore as a supplement, integrating them into the prompt to add diverse and rich context to the input image. Additionally, a Self-Consistency Ensemble Voting mechanism is proposed to select the most confident predicted recipe as the final output: it measures the consistency among generated recipe candidates, each of which uses different retrieved recipes as context for generation. Extensive experiments validate the effectiveness of our proposed method, which demonstrates state-of-the-art (SOTA) performance in recipe generation on the Recipe1M dataset.
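
As a rough illustration of the two mechanisms the abstract describes, the following is a minimal Python sketch. The softmax-weighted retrieval sampler, the token-overlap consistency metric, and the generate_recipe LMM call are assumptions introduced for illustration only; the paper's actual SDRA sampling scheme and consistency measure are not specified in this record.

import math
import random

def sample_diverse_retrievals(scored_recipes, m=3, temperature=1.0, seed=None):
    """Stochastically pick m retrieved recipes, softmax-weighted by retrieval
    score, so each generation pass sees a different, diverse context
    (a rough stand-in for the SDRA step described in the abstract)."""
    rng = random.Random(seed)
    pool = [recipe for recipe, _ in scored_recipes]
    weights = [math.exp(score / temperature) for _, score in scored_recipes]
    chosen = []
    for _ in range(min(m, len(pool))):
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(i))
        weights.pop(i)
    return chosen

def jaccard(a, b):
    """Token-overlap similarity; a placeholder consistency metric,
    since the abstract does not pin down how consistency is computed."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def self_consistency_vote(candidates):
    """Return the candidate recipe with the highest average
    consistency against all other candidates."""
    def avg_consistency(c):
        others = [o for o in candidates if o is not c]
        return sum(jaccard(c, o) for o in others) / max(len(others), 1)
    return max(candidates, key=avg_consistency)

# Hypothetical usage; generate_recipe is assumed to wrap the LMM call.
# datastore = [("recipe text ...", 0.91), ("recipe text ...", 0.87), ...]
# candidates = [
#     generate_recipe(image, context=sample_diverse_retrievals(datastore, seed=s))
#     for s in range(5)
# ]
# final_recipe = self_consistency_vote(candidates)

An embedding-based similarity would be a natural substitute for the Jaccard placeholder; the voting step itself only requires some pairwise consistency score over candidates.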

Keywords

Retrieval augmented generation, recipe generation, Large Multi-modal Model

Discipline

Databases and Information Systems

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025, Tucson, Arizona, February 28 - March 4

First Page

1

Last Page

11

City or Country

Tucson, Arizona
