Single-frame supervision for temporal video anomaly grounding

Publication Type

Journal Article

Publication Date

3-2026

Abstract

Conventional video anomaly detection approaches struggle to meet the increasingly fine-grained analysis requirements of real-world applications, which has established Temporal Video Anomaly Grounding (TVAG) as a pivotal research frontier in anomalous video understanding. To address the scarcity of precise temporal annotations, this work develops a single-frame supervised framework, Glance-guided Cross-modal Proposal Generation (GCPG), which achieves competitive grounding performance, surpassing some fully supervised methods on specific metrics, while substantially reducing annotation costs. The framework consists of a Cross-Modal Collaborative Pseudo-Glance Localization module (PGL) and a Glance-Guided Gaussian Proposal Optimization module (GPO). PGL employs a semantic-aware dual-branch mechanism that jointly performs cross-modal feature fusion classification and textual semantic verification to generate reliable pseudo-frame supervision, forming the foundation for cross-modal alignment learning. GPO enhances proposal quality by reconstructing Gaussian mask composition weights based on glance-keyword alignment and distributional consistency. Comprehensive experiments and ablation analyses on two challenging TVAG benchmarks validate the efficacy of the single-frame supervised approach.

Keywords

Cross-modal fusion, Video anomaly detection, Video anomaly grounding

Discipline

Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

Neurocomputing

Volume

668

Issue

1

First Page

1

Last Page

15

ISSN

0925-2312

Identifier

10.1016/j.neucom.2025.132346

Publisher

Elsevier

Additional URL

https://doi.org/10.1016/j.neucom.2025.132346

