Research Collection School Of Computing and Information Systems

Cross-modal food retrieval: Learning a joint embedding of food images and recipes with semantic consistency and attention mechanism

Publication Type

Journal Article

Version

acceptedVersion

Publication Date

1-2022

Abstract

Food retrieval is an important task for analyzing food-related information, where we are interested in retrieving relevant information about the queried food item, such as ingredients and cooking instructions. In this paper, we investigate cross-modal retrieval between food images and cooking recipes. The goal is to learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another. Two major challenges in addressing this problem are 1) large intra-variance and small inter-variance across cross-modal food data; and 2) difficulties in obtaining discriminative recipe representations. To address these two problems, we propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities by aligning their output semantic probabilities. In addition, we exploit a self-attention mechanism to improve the embedding of recipes. We evaluate the performance of the proposed method on the large-scale Recipe1M dataset, and show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
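
Illustrative sketch (not part of the published record): the abstract describes two ideas, retrieval by proximity in a common feature space and regularization by aligning the semantic probabilities output for each modality. The Python below is a minimal toy version of both under stated assumptions. The function names, the use of cosine similarity for ranking, and the symmetric KL divergence as the alignment term are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def semantic_consistency(image_logits, recipe_logits):
    """Assumed alignment term: symmetric KL between the class probabilities
    predicted from the image embedding and from the recipe embedding."""
    p = softmax(image_logits)
    q = softmax(recipe_logits)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, candidates):
    """Rank candidate embeddings from most to least similar to the query."""
    sims = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: sims[i], reverse=True)


# Hypothetical 3-dimensional embeddings: one image query, three recipes.
image_embedding = [0.9, 0.1, 0.0]
recipe_embeddings = [[0.0, 1.0, 0.0], [1.0, 0.2, 0.0], [0.0, 0.0, 1.0]]
ranking = retrieve(image_embedding, recipe_embeddings)  # recipe 1 ranks first
```

In this toy setup the alignment term is zero when both modalities predict identical class probabilities and grows as the predictions diverge, which is the sense in which it would pull matching image and recipe embeddings toward the same semantics.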

Keywords

Correlation, Cross-Modal Retrieval, Data models, Deep Learning, Semantics, Sugar, Task analysis, Training, Vision-and-Language, Visualization

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems | Graphics and Human Computer Interfaces

Research Areas

Data Science and Engineering

Publication

IEEE Transactions on Multimedia

Volume

24

First Page

2515

Last Page

2525

ISSN

1520-9210

Identifier

10.1109/TMM.2021.3083109

Publisher

Institute of Electrical and Electronics Engineers

Citation

WANG, Hao; SAHOO, Doyen; LIU, Chenghao; SHU, Ke; ACHANANUPARP, Palakorn; LIM, Ee-peng; and HOI, Steven C. H. Cross-modal food retrieval: Learning a joint embedding of food images and recipes with semantic consistency and attention mechanism. (2022). IEEE Transactions on Multimedia. 24, 2515-2525.
Available at: https://ink.library.smu.edu.sg/sis_research/6268

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/TMM.2021.3083109

Included in

Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons
