Research Collection School Of Computing and Information Systems

Cross-modal food retrieval: Learning a joint embedding of food images and recipes with semantic consistency and attention mechanism

Hao WANG
Doyen SAHOO, Singapore Management University
Chenghao LIU, Singapore Management University
Ke SHU
Achananuparp Palakorn, Singapore Management University
Ee peng LIM, Singapore Management University
Steven HOI, Singapore Management University

Publication Type

Journal Article

Version

acceptedVersion

Publication Date

5-2021

Abstract

Food retrieval is an important task for analyzing food-related information: given a queried food item, we are interested in retrieving relevant information about it, such as its ingredients and cooking instructions. In this paper, we investigate cross-modal retrieval between food images and cooking recipes. The goal is to learn an embedding of images and recipes in a common feature space, such that the embeddings of a corresponding image-recipe pair lie close to one another. Two major challenges in addressing this problem are 1) large intra-variance and small inter-variance across cross-modal food data; and 2) difficulties in obtaining discriminative recipe representations. To address these two problems, we propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities by aligning their output semantic probabilities. In addition, we exploit a self-attention mechanism to improve the embedding of recipes. We evaluate the performance of the proposed method on the large-scale Recipe1M dataset, and show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
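
The two ideas in the abstract, retrieval in a common feature space and alignment of output semantic probabilities, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the use of cosine similarity for ranking, and the symmetric KL divergence as the alignment penalty are all assumptions made for illustration; the abstract only states that the semantic probabilities of the two modalities are aligned.

```python
import math


def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def semantic_consistency_loss(image_logits, recipe_logits):
    """Hypothetical alignment penalty: symmetric KL divergence between the
    semantic class probabilities predicted from each modality (an assumed
    choice, not taken from the source)."""
    p_img = softmax(image_logits)
    p_rec = softmax(recipe_logits)
    return kl_divergence(p_img, p_rec) + kl_divergence(p_rec, p_img)


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, candidate_embeddings):
    """Return the index of the candidate closest to the query in the
    common feature space (assumed ranking rule: highest cosine similarity)."""
    return max(
        range(len(candidate_embeddings)),
        key=lambda i: cosine_similarity(query_embedding, candidate_embeddings[i]),
    )


if __name__ == "__main__":
    # Toy image embedding queried against two toy recipe embeddings.
    image = [0.9, 0.1, 0.0]
    recipes = [[0.0, 1.0, 0.0], [1.0, 0.2, 0.1]]
    print("best recipe index:", retrieve(image, recipes))
    print("consistency loss:", semantic_consistency_loss([2.0, 0.5], [1.5, 0.7]))
```

In this sketch the penalty is zero when both modalities predict the same class distribution and grows as the predictions diverge, which is the regularizing effect the abstract describes.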

Keywords

Correlation, Cross-Modal Retrieval, Data models, Deep Learning, Semantics, Sugar, Task analysis, Training, Vision-and-Language, Visualization

Discipline

Graphics and Human Computer Interfaces | Theory and Algorithms

Research Areas

Data Science and Engineering

Publication

IEEE Transactions on Multimedia

ISSN

1520-9210

Identifier

10.1109/TMM.2021.3083109

Publisher

IEEE

Embargo Period

11-9-2021

Citation

WANG, Hao; SAHOO, Doyen; LIU, Chenghao; SHU, Ke; Palakorn, Achananuparp; LIM, Ee peng; and HOI, Steven. Cross-modal food retrieval: Learning a joint embedding of food images and recipes with semantic consistency and attention mechanism. (2021). IEEE Transactions on Multimedia.
Available at: https://ink.library.smu.edu.sg/sis_research/6249

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/TMM.2021.3083109

Included in

Graphics and Human Computer Interfaces Commons, Theory and Algorithms Commons
