Research Collection School Of Computing and Information Systems

Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T salient object detection

Hao TANG, Nanjing University of Science and Technology
Zechao LI, Nanjing University of Science and Technology
Dong ZHANG, Hong Kong University of Science and Technology
Shengfeng HE, Singapore Management University
Jinhui TANG, Nanjing University of Science and Technology

Publication Type

Journal Article

Version

acceptedVersion

Publication Date

3-2025

Abstract

RGB-Thermal Salient Object Detection (RGB-T SOD) aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. A key challenge lies in bridging the inherent disparities between RGB and Thermal modalities for effective saliency map prediction. Traditional encoder-decoder architectures, while designed for cross-modality feature interactions, may not have adequately considered the robustness against noise originating from defective modalities, thereby leading to suboptimal performance in complex scenarios. Inspired by hierarchical human visual systems, we propose ConTriNet, a robust Confluent Triple-Flow Network employing a "Divide-and-Conquer" strategy. This framework utilizes a unified encoder with specialized decoders, each addressing different subtasks of exploring modality-specific and modality-complementary information for RGB-T SOD, thereby enhancing the final saliency map prediction. Specifically, ConTriNet comprises three flows: two modality-specific flows explore cues from RGB and Thermal modalities, and a third modality-complementary flow integrates cues from both modalities. ConTriNet presents several notable advantages. It incorporates a Modality-induced Feature Modulator (MFM) in the modality-shared union encoder to minimize inter-modality discrepancies and mitigate the impact of defective samples. Additionally, a foundational Residual Atrous Spatial Pyramid Module (RASPM) in the separated flows enlarges the receptive field, allowing for the capture of multi-scale contextual information. Furthermore, a Modality-aware Dynamic Aggregation Module (MDAM) in the modality-complementary flow dynamically aggregates saliency-related cues from both modality-specific flows. Leveraging the proposed parallel triple-flow framework, we further refine saliency maps derived from different flows through a flow-cooperative fusion strategy, yielding a high-quality, full-resolution saliency map for the final prediction.
To evaluate the robustness and stability of our approach, we collect a comprehensive RGB-T SOD benchmark, VT-IMAG, covering various real-world challenging scenarios. Extensive experiments on public benchmarks and our VT-IMAG dataset demonstrate that ConTriNet consistently outperforms state-of-the-art competitors in both common and challenging scenarios, even when dealing with incomplete modality data.
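The abstract notes that a residual atrous spatial pyramid enlarges the receptive field to capture multi-scale context. The following is a minimal, illustrative sketch of that general idea only, not the authors' RASPM: it works on a 1D signal in pure Python, and the function names, the dilation rates (1, 2, 4), the all-ones kernel, and the choice to average the branches before adding the residual are all assumptions made for the example.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated (atrous) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Same-length 1D atrous convolution (cross-correlation), zero padded."""
    n, k = len(signal), len(kernel)
    center = (k - 1) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(k):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - center) * dilation
            if 0 <= idx < n:
                acc += kernel[j] * signal[idx]
        out.append(acc)
    return out


def residual_atrous_pyramid(signal: Sequence[float], kernel: Sequence[float],
                            dilations: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Hypothetical fusion: average parallel dilated branches, add the input back."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    fused = [sum(vals) / len(branches) for vals in zip(*branches)]
    return [x + f for x, f in zip(signal, fused)]


# A unit impulse at index 4 of a length-9 signal.
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
out = residual_atrous_pyramid(impulse, kernel=[1.0, 1.0, 1.0])

# With dilation 4, a 3-tap kernel spans 9 input samples.
span = effective_kernel_size(3, 4)
```

In this toy setting the impulse response reaches both ends of the signal (indices 0 and 8) through the dilation-4 branch, while the residual path keeps the original impulse intact at index 4. A 3-tap kernel without dilation would only reach indices 3 to 5.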

Keywords

Encoder-Decoder, multi-modal fusion, RGB-thermal, salient object detection

Discipline

Artificial Intelligence and Robotics | Numerical Analysis and Scientific Computing

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

IEEE Transactions on Pattern Analysis and Machine Intelligence

Volume

47

Issue

3

First Page

1958

Last Page

1974

ISSN

0162-8828

Identifier

10.1109/TPAMI.2024.3511621

Publisher

Institute of Electrical and Electronics Engineers

Citation

TANG, Hao; LI, Zechao; ZHANG, Dong; HE, Shengfeng; and TANG, Jinhui. Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T salient object detection. (2025). IEEE Transactions on Pattern Analysis and Machine Intelligence. 47, (3), 1958-1974.
Available at: https://ink.library.smu.edu.sg/sis_research/9905

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/TPAMI.2024.3511621

Included in

Artificial Intelligence and Robotics Commons, Numerical Analysis and Scientific Computing Commons
