Research Collection School Of Computing and Information Systems

OVFoodSeg : Elevating open-vocabulary food image segmentation via image-informed textual representation

Publication Type

Conference Proceeding Article

Publication Date

1-2024

Abstract

In the realm of food computing, segmenting ingredients from images poses substantial challenges due to the large intra-class variance among the same ingredients, the emergence of new ingredients, and the high annotation costs associated with large food segmentation datasets. Existing approaches primarily operate in a closed-vocabulary setting with static text embeddings. These methods often fall short in handling ingredients effectively, particularly new and diverse ones. In response to these limitations, we introduce OVFoodSeg, a framework that adopts an open-vocabulary setting and enhances text embeddings with visual context. By integrating vision-language models (VLMs), our approach enriches text embeddings with image-specific information through two innovative modules, namely an image-to-text learner, FoodLearner, and an Image-Informed Text Encoder. The training process of OVFoodSeg is divided into two stages: the pre-training of FoodLearner and the subsequent learning phase for segmentation. The pre-training phase equips FoodLearner with the capability to align visual information with corresponding textual representations that are specifically related to food, while the second phase adapts both FoodLearner and the Image-Informed Text Encoder for the segmentation task. By addressing the deficiencies of previous models, OVFoodSeg demonstrates a significant improvement, achieving a 4.9% increase in mean Intersection over Union (mIoU) on the FoodSeg103 dataset, setting a new milestone for food image segmentation.
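
For readers unfamiliar with the metric cited in the abstract, the following is a minimal sketch of how mean Intersection over Union can be computed from a predicted and a ground-truth label map. It is an illustration only, not the evaluation code used for OVFoodSeg: the function name `mean_iou`, the nested-list input format, and the choice to average only over classes that appear in either map are assumptions made for this example, and benchmark protocols such as the one for FoodSeg103 may accumulate counts over the whole dataset instead.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean IoU over classes present in the prediction or the ground truth.

    `pred` and `target` are 2-D label maps of equal shape whose entries are
    class indices in the range [0, num_classes). (Hypothetical helper.)
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts toward both intersection and union of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Skip classes absent from both maps (assumption of this sketch).
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps with three classes.
pred = [[0, 0, 1], [1, 2, 2]]
target = [[0, 1, 1], [1, 2, 0]]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so `score` is 0.5.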

Keywords

Food image segmentation, Text embeddings, Vision language model, Image segmentation, Visualization, Computer vision, Adaptation models, Machine learning

Discipline

Artificial Intelligence and Robotics | Computer Sciences

Research Areas

Intelligent Systems and Optimization; Data Science and Engineering

Publication

Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) : Seattle, WA, USA, June 16-22

First Page

4144

Last Page

4153

ISBN

9798350353006

Identifier

10.1109/CVPR52733.2024.00397

Publisher

IEEE Computer Society

City or Country

Seattle, USA

Citation

WU, Xiongwei; YU, Sicheng; LIM, Ee-Peng; and NGO, Chong-wah. OVFoodSeg : Elevating open-vocabulary food image segmentation via image-informed textual representation. (2024). Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) : Seattle, WA, USA, June 16-22. 4144-4153.
Available at: https://ink.library.smu.edu.sg/sis_research/9861

Additional URL

https://doi.org/10.1109/CVPR52733.2024.00397

