Research Collection School Of Computing and Information Systems

R2GAN: Cross-modal recipe retrieval with generative adversarial network

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2019

Abstract

Representing procedural text such as recipes for cross-modal retrieval is inherently difficult, let alone generating images from recipes for visualization. This paper studies a new version of GAN, named Recipe Retrieval Generative Adversarial Network (R2GAN), to explore the feasibility of generating images from procedural text for the retrieval problem. The motivation for using a GAN is twofold: learning compatible cross-modal features in an adversarial way, and explaining search results by showing the images generated from recipes. The novelty of R2GAN comes from its architecture design: a GAN with one generator and dual discriminators, which makes generating images from recipes feasible. Furthermore, empowered by the generated images, a two-level ranking loss in both the embedding and image spaces is considered. These add-ons not only result in excellent retrieval performance, but also generate close-to-realistic food images useful for explaining the ranking of recipes. On the Recipe1M dataset, R2GAN demonstrates high scalability to data size, outperforms all existing approaches, and generates images that make the search results intuitive for humans to interpret.
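
The abstract mentions a two-level ranking loss over the embedding and image spaces but does not give its form. As a rough illustration only, the sketch below shows a generic margin-based triplet ranking loss on cosine similarity, applied once to embedding-space vectors and once to image-space feature vectors, then summed. The function names, the margin of 0.3, the `weight` parameter, and the choice of cosine similarity are all assumptions made for this example; they are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_ranking_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss: zero once the positive beats the negative by `margin`.

    The margin value is an assumption for illustration.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

def two_level_loss(embed_triplet, image_triplet, weight=1.0, margin=0.3):
    """Hypothetical combination: embedding-level loss plus weighted image-level loss.

    Each triplet is (anchor, positive, negative). How the paper actually
    combines the two levels is not stated in the abstract.
    """
    embed = triplet_ranking_loss(*embed_triplet, margin=margin)
    image = triplet_ranking_loss(*image_triplet, margin=margin)
    return embed + weight * image
```

In this sketch a triplet whose positive is already closer to the anchor than the negative by at least the margin contributes no loss, so only badly ranked pairs would drive training.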

Keywords

Categorization, Image and Video Synthesis, Recognition: Detection, Representation Learning, Retrieval; Vision + Language

Discipline

Data Storage Systems | Graphics and Human Computer Interfaces | OS and Networks

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, California, June 16-21

First Page

11469

Last Page

11478

ISBN

9781728132938

Identifier

10.1109/CVPR.2019.01174

Publisher

IEEE Computer Society

City or Country

Long Beach

Citation

ZHU, Bin; NGO, Chong-wah; CHEN, Jingjing; and HAO, Yanbin. R2GAN: Cross-modal recipe retrieval with generative adversarial network. (2019). Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, California, June 16-21. 11469-11478.
Available at: https://ink.library.smu.edu.sg/sis_research/6456

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Data Storage Systems Commons, Graphics and Human Computer Interfaces Commons, OS and Networks Commons
