Research Collection School Of Computing and Information Systems

Deep understanding of cooking procedure for cross-modal recipe retrieval

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

10-2018

Abstract

Finding the right recipe that describes the cooking procedure for a dish from just one picture is inherently a difficult problem. Food preparation is a complex process involving raw ingredients, utensils, and cutting and cooking operations. This process gives clues to the multimedia presentation of a dish (e.g., taste, colour, shape). However, the description of the process is implicit, implying only the cause of the dish's presentation rather than the visual effect that can be vividly observed in a picture. Therefore, unlike other cross-modal retrieval problems in the literature, recipe search requires understanding the textually described procedure in order to predict its possible consequences for visual appearance. In this paper, we approach this problem from the perspective of attention modeling. Specifically, we model the attention of words and sentences in a recipe and align them with its image feature such that both text and visual features share high similarity in a multi-dimensional space. Using a large food dataset, Recipe1M, we empirically demonstrate that understanding the cooking procedure can lead to improvement by a large margin over existing methods, which mostly consider only ingredient information. Furthermore, with attention modeling, we show that language-specific named-entity extraction becomes optional. The result sheds light on the feasibility of performing cross-lingual cross-modal recipe retrieval with off-the-shelf machine translation engines.

Keywords

Cross-modal learning, Hierarchical attention, Recipe retrieval

Discipline

Databases and Information Systems | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

MM '18: Proceedings of the 26th ACM international conference on Multimedia, Seoul, October 22-26

First Page

1020

Last Page

1028

ISBN

9781450356657

Identifier

10.1145/3240508.3240627

Publisher

ACM

City or Country

New York

Citation

CHEN, Jingjing; NGO, Chong-wah; FENG, Fu-Li; and CHUA, Tat-Seng. Deep understanding of cooking procedure for cross-modal recipe retrieval. (2018). MM '18: Proceedings of the 26th ACM international conference on Multimedia, Seoul, October 22-26. 1020-1028.
Available at: https://ink.library.smu.edu.sg/sis_research/6461

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1145/3240508.3240627

Included in

Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons
