Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

6-2024

Abstract

Multi-view representation learning aims to derive robust representations that are both view-consistent and view-specific from diverse data sources. This paper presents an in-depth analysis of existing approaches in this domain, highlighting a commonly overlooked aspect: the redundancy between view-consistent and view-specific representations. To this end, we propose an innovative framework for multi-view representation learning, which incorporates a technique we term 'distilled disentangling'. Our method introduces the concept of masked cross-view prediction, enabling the extraction of compact, high-quality view-consistent representations from various sources without incurring extra computational overhead. Additionally, we develop a distilled disentangling module that efficiently filters out consistency-related information from multi-view representations, resulting in purer view-specific representations. This approach significantly reduces redundancy between view-consistent and view-specific representations, enhancing the overall efficiency of the learning process. Our empirical evaluations reveal that higher mask ratios substantially improve the quality of view-consistent representations. Moreover, we find that reducing the dimensionality of view-consistent representations relative to that of view-specific representations further refines the quality of the combined representations.
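As a rough illustration of what a high mask ratio means in practice, the sketch below zeroes out 80% of the positions in a toy feature vector for one view. It is a minimal example under stated assumptions, not the authors' implementation: the function name `random_mask`, the zero-fill strategy, the 0.8 ratio, and the toy data are all hypothetical, and the cross-view predictor that would reconstruct the masked content from other views is omitted.

```python
import random


def random_mask(features, mask_ratio, rng):
    """Return (masked_features, masked_indices) for one sample.

    A fixed fraction of positions, chosen uniformly at random, is set to 0.0.
    Zero-filling is an assumption made for this sketch only.
    """
    num_masked = round(mask_ratio * len(features))
    masked_indices = set(rng.sample(range(len(features)), num_masked))
    masked = [0.0 if i in masked_indices else v for i, v in enumerate(features)]
    return masked, masked_indices


rng = random.Random(0)
# Toy feature vector standing in for one view of a sample (hypothetical data).
view_a = [rng.gauss(0.0, 1.0) for _ in range(10)]
# A high mask ratio hides most of the view from the encoder.
masked_a, masked_idx = random_mask(view_a, mask_ratio=0.8, rng=rng)
```

With 10 features and a ratio of 0.8, exactly 8 positions are hidden and 2 remain visible, so any model trained on such inputs has to rely heavily on information shared with the other views.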

Keywords

Representation learning, Computer vision, Filters, Codes, Soft sensors, Redundancy, Pattern recognition, Multi-view representation learning

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) : Seattle, WA, USA, June 16-22

First Page

26774

Last Page

26783

Identifier

10.1109/CVPR52733.2024.02528

Publisher

IEEE

City or Country

Seattle, USA

Citation

KE, Guanzhou; WANG, Bo; WANG, Xiaoli; and HE, Shengfeng. Rethinking multi-view representation learning via distilled disentangling. (2024). Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) : Seattle, WA, USA, June 16-22. 26774-26783.
Available at: https://ink.library.smu.edu.sg/sis_research/9777

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/CVPR52733.2024.02528

Included in

Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons

Research Collection School Of Computing and Information Systems

Rethinking multi-view representation learning via distilled disentangling
