Publication Type

Conference Proceeding Article

Publication Date

10-2017

Abstract

Food is rich in visible (e.g., colour, shape) and procedural (e.g., cutting, cooking) attributes. Proper leveraging of these attributes, particularly the interplay among ingredients, cutting and cooking methods, for health-related applications has not been previously explored. This paper investigates cross-modal retrieval of recipes, specifically retrieving a text-based recipe given a food picture as the query. As a similar ingredient composition can result in wildly different dishes depending on the cooking and cutting procedures, the difficulty of retrieval originates from fine-grained recognition of rich attributes from pictures. With a multi-task deep learning model, this paper provides insights into the feasibility of predicting ingredient, cutting and cooking attributes for food recognition and recipe retrieval. In addition, localization of ingredient regions is possible even when region-level training examples are not provided. Experimental results validate the merit of rich attributes when compared with recently proposed ingredient-only retrieval techniques.

Keywords

Cooking and cutting recognition, Cross-modal retrieval, Ingredient recognition, Recipe retrieval

Discipline

Databases and Information Systems | Data Storage Systems | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the 25th ACM International Conference on Multimedia, MM 2017, Mountain View, California, October 23–27

First Page

1771

Last Page

1779

ISBN

9781450349062

Identifier

10.1145/3123266.3123428

Publisher

Association for Computing Machinery, Inc

City or Country

Mountain View
