Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

7-2023

Abstract

Neural networks for visual content understanding have recently evolved from convolutional ones to transformers. The former (CNN) relies on small-windowed kernels to capture regional clues, demonstrating solid local expressiveness. In contrast, the latter (transformer) establishes long-range global connections between localities for holistic learning. Inspired by this complementary nature, there is growing interest in designing hybrid models that utilize both techniques. However, current hybrids merely use convolutions as simple approximations of linear projection, or juxtapose a convolution branch with attention, without considering the relative importance of local and global modeling. To tackle this, we propose a new hybrid named Adaptive Split-Fusion Transformer (ASF-former), which treats the convolutional and attention branches differently with adaptive weights. Specifically, an ASF-former encoder splits the feature channels equally in half to fit dual-path inputs. The outputs of the two paths are then fused with weights calculated from visual cues. For efficiency, we also design a compact convolutional path. Extensive experiments on standard benchmarks show that ASF-former outperforms its CNN, transformer, and hybrid counterparts in terms of accuracy (83.9% on ImageNet-1K) under comparable conditions (12.9G MACs / 56.7M Params, without large-scale pre-training). The code is available at: https://github.com/szx503045266/ASF-former.
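The abstract outlines the encoder's split-fusion mechanism: channels are split in half, routed through a convolutional path and an attention path, and the two outputs are fused with adaptive weights computed from visual cues. Below is a minimal, hypothetical PyTorch sketch of that idea for illustration only; the module structure, layer choices, and names (e.g. `SplitFusionBlock`, the depth-wise conv path, the pooled-feature gate) are assumptions, not the authors' implementation, which is available at the GitHub link above.

```python
# Minimal sketch (not the official ASF-former code) of the split-fusion idea
# described in the abstract: split channels in half, process with a conv path
# and an attention path, then fuse with adaptive weights from pooled cues.
import torch
import torch.nn as nn


class SplitFusionBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        half = dim // 2
        # Convolutional path: a depth-wise conv as a hypothetical stand-in
        # for the paper's compact convolutional path (local modeling).
        self.conv_path = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=3, padding=1, groups=half),
            nn.BatchNorm2d(half),
            nn.GELU(),
        )
        # Attention path: standard multi-head self-attention (global modeling).
        self.norm = nn.LayerNorm(half)
        self.attn = nn.MultiheadAttention(half, num_heads, batch_first=True)
        # Gate predicting one fusion weight per path from pooled features.
        self.gate = nn.Sequential(nn.Linear(dim, 2), nn.Softmax(dim=-1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) token sequence; assumes a square H x W token grid.
        b, n, c = x.shape
        h = w = int(n ** 0.5)
        x_conv, x_attn = x.chunk(2, dim=-1)  # split channels in half

        # Local (convolutional) branch on the first half of the channels.
        y_conv = x_conv.transpose(1, 2).reshape(b, c // 2, h, w)
        y_conv = self.conv_path(y_conv).flatten(2).transpose(1, 2)

        # Global (attention) branch on the second half of the channels.
        q = self.norm(x_attn)
        y_attn, _ = self.attn(q, q, q)

        # Adaptive fusion: weights computed from globally pooled visual cues.
        weights = self.gate(x.mean(dim=1))          # (B, 2)
        y_conv = weights[:, 0].view(b, 1, 1) * y_conv
        y_attn = weights[:, 1].view(b, 1, 1) * y_attn
        return torch.cat([y_conv, y_attn], dim=-1) + x  # residual connection


if __name__ == "__main__":
    block = SplitFusionBlock(dim=96, num_heads=4)
    tokens = torch.randn(2, 14 * 14, 96)  # batch of 2, 14x14 token grid
    print(block(tokens).shape)            # torch.Size([2, 196, 96])
```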

Keywords

CNN; Gating; Hybrid; Transformer; Visual understanding

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the 2023 IEEE International Conference on Multimedia and Expo, Brisbane, Australia, July 10-14

First Page

1169

Last Page

1174

ISBN

9781665468916

Identifier

10.1109/ICME55011.2023.00204

Publisher

IEEE

City or Country

New Jersey

Additional URL

https://doi.org/10.1109/ICME55011.2023.00204