Attention-driven pseudo-label self-training for weakly supervised video anomaly detection

Publication Type

Journal Article

Publication Date

9-2026

Abstract

Recently, two-stage self-training methods based on generating pseudo-labels for weakly supervised video anomaly detection (WSVAD) have achieved notable progress. However, the generated pseudo-labels often suffer from incompleteness and noise, which hampers further performance gains. To improve both pseudo-label generation and self-training performance, and inspired by the human attention mechanism, we introduce a novel dual-branch framework for WSVAD that synchronizes pseudo-label generation and self-training. The first branch introduces a video snippet separation and fusion (VSSF) module based on self-attention and cross-attention mechanisms. A video classification module then follows the VSSF module to classify the fused video feature representations, thereby further enhancing the distinction between anomalous and normal snippets. Building on this, we design an attention-driven pseudo-label generation (PLG) module equipped with a denoising strategy. This module infers accurate and comprehensive snippet-level pseudo-labels from the separation process, guided by a compactness-separation loss and a distributional dissimilarity loss. In the second branch, we design a multi-scale temporal feature interaction learning module, which captures rich temporal dependencies among video snippets to enhance their discriminability. The second branch then synchronously receives the latest pseudo-labels from the first branch for snippet classifier learning, which minimizes the impact of noisy snippets and thereby improves self-training performance. Extensive experiments on three benchmark datasets demonstrate that our method consistently surpasses existing two- and multi-stage self-training frameworks and achieves results competitive with or superior to recent one-stage approaches, highlighting the effectiveness of our proposed framework. Our code is available at https://github.com/Beyond-Zw/ADPLG-VAD.
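To make the pseudo-label generation step concrete, the sketch below shows one generic way snippet-level pseudo-labels can be derived from per-snippet attention scores under weak (video-level) supervision, including a simple smoothing-based denoising step. This is a minimal illustration of the general idea, not the paper's actual PLG module; the function name, smoothing scheme, and threshold are all assumptions for demonstration.

```python
import numpy as np

def generate_pseudo_labels(scores, video_label, threshold=0.5, smooth_k=3):
    """Hypothetical snippet-level pseudo-label generation (illustration only).

    scores: per-snippet anomaly attention scores in [0, 1].
    video_label: 1 if the video carries a weak anomalous label, else 0.
    """
    scores = np.asarray(scores, dtype=float)
    if video_label == 0:
        # Normal videos: every snippet is labeled normal.
        return np.zeros(len(scores), dtype=int)
    # Simple denoising: moving-average smoothing before thresholding,
    # so isolated score spikes do not become noisy pseudo-labels.
    kernel = np.ones(smooth_k) / smooth_k
    smoothed = np.convolve(scores, kernel, mode="same")
    # Min-max normalize within the video, then threshold.
    rng = smoothed.max() - smoothed.min()
    norm = (smoothed - smoothed.min()) / (rng + 1e-8)
    return (norm >= threshold).astype(int)
```

In a two-branch setup like the one described above, labels produced this way would be passed to the second branch's snippet classifier at each round, so that classifier training always uses the latest (denoised) pseudo-labels.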

Keywords

Attention mechanism, Pseudo-label generation, Self-training, Video anomaly detection, Weak supervision

Discipline

Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

Pattern Recognition

Volume

177

ISSN

0031-3203

Identifier

10.1016/j.patcog.2026.113349

Publisher

Elsevier

Additional URL

https://doi.org/10.1016/j.patcog.2026.113349
