Research Collection School Of Computing and Information Systems

TranSiam: Aggregating multi-modal visual features with locality for medical image segmentation

Xuejian LI, Central South University
Shiqiang MA, Chinese Academy of Sciences
Junhai XU, Tianjin University
Jijun TANG, Chinese Academy of Sciences
Shengfeng HE, Singapore Management University
Fei GUO, Central South University

Publication Type

Journal Article

Version

acceptedVersion

Publication Date

3-2024

Abstract

Automatic segmentation of medical images plays an important role in the diagnosis of diseases. On single-modal data, convolutional neural networks (CNNs) have demonstrated satisfactory performance. However, multi-modal data contains more information than single-modal data and can be used to improve the segmentation accuracy of regions of interest by analyzing both spatial and temporal information. In this study, we propose a dual-path segmentation model for multi-modal medical images, named TranSiam. Because the modalities differ significantly from one another, TranSiam employs two parallel CNNs to extract the features specific to each modality. In our method, the two parallel CNNs extract detailed, local information in the low-level layer, and the Transformer layer extracts global information in the high-level layer. Finally, we fuse the features of the different modalities via a locality-aware aggregation block (LAA block) to establish the association between the features of different modalities. The LAA block locates the region of interest and suppresses the influence of invalid regions on multi-modal feature fusion. TranSiam uses LAA blocks at each layer of the encoder to fully fuse multi-modal information at different scales. Extensive experiments on several multi-modal datasets show that TranSiam achieves satisfying results.

Keywords

Feature-level fusion, Local attention mechanism, Medical image segmentation, Multi-modal fusion

Discipline

Graphics and Human Computer Interfaces | Health Information Technology

Research Areas

Data Science and Engineering

Publication

Expert Systems with Applications

Volume

237

First Page

1

Last Page

11

ISSN

0957-4174

Identifier

10.1016/j.eswa.2023.121574

Publisher

Elsevier

Citation

LI, Xuejian; MA, Shiqiang; XU, Junhai; TANG, Jijun; HE, Shengfeng; and GUO, Fei. TranSiam: Aggregating multi-modal visual features with locality for medical image segmentation. (2024). Expert Systems with Applications. 237, 1-11.
Available at: https://ink.library.smu.edu.sg/sis_research/8222

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1016/j.eswa.2023.121574

Included in

Graphics and Human Computer Interfaces Commons, Health Information Technology Commons
