Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
11-2023
Abstract
Multi-view (or multi-modal) representation learning aims to understand the relationships between different view representations. Existing methods disentangle multi-view representations into consistent and view-specific representations by introducing strong inductive biases, which can limit their generalization ability. In this paper, we propose a novel multi-view representation disentangling method that aims to go beyond inductive biases, ensuring both interpretability and generalizability of the resulting representations. Our method is based on the observation that discovering multi-view consistency in advance can determine the disentangling information boundary, leading to a decoupled learning objective. We also find that consistency can be readily extracted by maximizing transformation invariance and clustering consistency between views. These observations drive us to propose a two-stage framework. In the first stage, we obtain multi-view consistency by training a consistent encoder to produce semantically consistent representations across views as well as their corresponding pseudo-labels. In the second stage, we disentangle specificity from comprehensive representations by minimizing the upper bound of mutual information between consistent and comprehensive representations. Finally, we reconstruct the original data by concatenating pseudo-labels and view-specific representations. Our experiments on four multi-view datasets demonstrate that the proposed method outperforms 12 comparison methods in terms of clustering and classification performance. The visualization results also show that the extracted consistency and specificity are compact and interpretable.
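The final reconstruction step described in the abstract, decoding the original data from a pseudo-label concatenated with a view-specific representation, can be sketched in a minimal toy form. This is not the authors' implementation; the dimensions, the one-hot pseudo-label encoding, and the linear "decoder" are all illustrative assumptions standing in for the learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clusters = 4   # number of pseudo-label classes (assumed)
spec_dim = 8     # view-specific representation size (assumed)
data_dim = 16    # original view dimensionality (assumed)

def one_hot(label, k):
    """Encode a pseudo-label as a one-hot vector of length k."""
    v = np.zeros(k)
    v[label] = 1.0
    return v

# Toy view-specific representation and pseudo-label for one sample.
spec = rng.standard_normal(spec_dim)
pseudo_label = 2

# Decoder input: consistency (pseudo-label) concatenated with specificity.
decoder_in = np.concatenate([one_hot(pseudo_label, n_clusters), spec])

# A random linear map standing in for the trained reconstruction decoder.
W = rng.standard_normal((data_dim, n_clusters + spec_dim))
reconstruction = W @ decoder_in

print(decoder_in.shape, reconstruction.shape)  # (12,) (16,)
```

Keeping the pseudo-label (shared semantics) and the view-specific vector as separate, concatenated inputs is what makes the decomposition inspectable: each factor can be varied independently to see its effect on the reconstruction.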
Keywords
Multi-view representation learning, Disentangled representation, Consistency and specificity
Discipline
Databases and Information Systems | Graphics and Human Computer Interfaces
Research Areas
Information Systems and Management
Publication
MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, Canada, October 29 - November 3, 2023
First Page
2582
Last Page
2590
ISBN
9798400701085
Identifier
10.1145/3581783.3611794
Publisher
ACM
City or Country
New York
Citation
KE, Guanzhou; YU, Yang; CHAO, Guoqing; WANG, Xiaoli; XU, Chenyang; and HE, Shengfeng.
Disentangling multi-view representations beyond inductive bias. (2023). MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, October 29 - November 3. 2582-2590.
Available at: https://ink.library.smu.edu.sg/sis_research/8420
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3581783.3611794
Included in
Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons