Research Collection School Of Computing and Information Systems

Cross-modal recipe retrieval with stacked attention model

Publication Type

Journal Article

Version

publishedVersion

Publication Date

11-2018

Abstract

Taking a picture of delicious food and sharing it on social media has become a popular trend. The ability to recommend recipes alongside such pictures would benefit users who want to cook a particular dish, but this feature is not yet available. The challenge of recipe retrieval comes from two aspects. First, current food recognition technology scales only to a few hundred categories, which is far from practical for recognizing tens of thousands of food categories. Second, even a single food category can have recipe variants that differ in ingredient composition. Finding the best-match recipe requires knowledge of ingredients, which is a fine-grained recognition problem. In this paper, we consider the problem from the viewpoint of cross-modality analysis. Given a large number of image and recipe pairs acquired from the Internet, a joint space is learnt to locally capture the ingredient correspondence between images and recipes. Because learning happens at the region level for images and at the ingredient level for recipes, the model can generalize recognition to unseen food categories. Furthermore, the embedded multi-modal ingredient feature sheds light on the retrieval of best-match recipes. On an in-house dataset, our model doubles the retrieval performance of DeViSE, a popular cross-modality model that does not consider region information during learning.

Keywords

Recipe retrieval, Cross-modal retrieval, Multi-modality embedding

Discipline

Computer Sciences | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

Multimedia Tools and Applications

Volume

Issue

First Page

29457

Last Page

29473

ISSN

1380-7501

Identifier

10.1007/s11042-018-5964-y

Publisher

Springer (part of Springer Nature): Springer Open Choice Hybrid Journals

Citation

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Graphics and Human Computer Interfaces Commons
