Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
10-2020
Abstract
Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-crafted features to extract the workflow graph from recipes due to the lack of large-scale labeled datasets. Moreover, they fail to utilize the cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps are multi-modal, featuring both text instructions and cooking images. We then propose a neural encoder–decoder model that utilizes both visual and textual information to construct the cooking workflow, which achieved a performance gain of more than 20% over existing hand-crafted baselines.
Keywords
cause-and-effect reasoning, cooking workflow, deep learning, food recipes, MM-ReS dataset, multi-modal fusion
Discipline
Databases and Information Systems | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 28th ACM International Conference on Multimedia, MM 2020, Seattle, October 12–16
First Page
1132
Last Page
1141
ISBN
9781450379885
Identifier
10.1145/3394171.3413765
Publisher
Association for Computing Machinery, Inc
City or Country
Virtual Conference
Citation
PAN, Liangming; CHEN, Jingjing; WU, Jianlong; LIU, Shaoteng; NGO, Chong-wah; KAN, Min-Yen; JIANG, Yugang; and CHUA, Tat-Seng.
Multi-modal cooking workflow construction for food recipes. (2020). Proceedings of the 28th ACM International Conference on Multimedia, MM 2020, Seattle, October 12–16. 1132-1141.
Available at: https://ink.library.smu.edu.sg/sis_research/6464
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Included in
Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons