Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
2-2024
Abstract
The recent contrastive language-image pre-training (CLIP) model has shown great success in a wide range of image-level tasks, revealing a remarkable ability to learn powerful visual representations with rich semantics. An open and worthwhile problem is how to efficiently adapt such a strong model to the video domain and design a robust video anomaly detector. In this work, we propose VadCLIP, a new paradigm for weakly supervised video anomaly detection (WSVAD) that leverages the frozen CLIP model directly, without any pre-training or fine-tuning. Unlike current works that directly feed extracted features into a weakly supervised classifier for frame-level binary classification, VadCLIP makes full use of the fine-grained associations between vision and language afforded by CLIP and involves a dual-branch design. One branch simply utilizes visual features for coarse-grained binary classification, while the other fully leverages fine-grained language-image alignment. With the benefit of the dual branches, VadCLIP achieves both coarse-grained and fine-grained video anomaly detection by transferring pre-trained knowledge from CLIP to the WSVAD task. We conduct extensive experiments on two commonly used benchmarks, demonstrating that VadCLIP achieves the best performance on both coarse-grained and fine-grained WSVAD, surpassing the state-of-the-art methods by a large margin. Specifically, VadCLIP achieves 84.51% AP and 88.02% AUC on XD-Violence and UCF-Crime, respectively. Code and features are released at https://github.com/nwpu-zxr/VadCLIP.
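As a rough illustration of the dual-branch idea the abstract describes (not the authors' released implementation; see the linked GitHub repository for that), the sketch below assumes precomputed frozen CLIP frame features and class-prompt embeddings. The module name `DualBranchWSVAD`, the feature dimension, and the class count are hypothetical choices for illustration only.

```python
import torch
import torch.nn as nn

class DualBranchWSVAD(nn.Module):
    """Hypothetical sketch: one coarse binary branch plus one
    fine-grained language-image alignment branch, both operating
    on frozen CLIP features (assumed precomputed)."""

    def __init__(self, feat_dim: int = 512, num_classes: int = 14):
        super().__init__()
        # Coarse-grained branch: frame-level binary anomaly score.
        self.binary_head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, 1),
        )
        # Learnable temperature, in the spirit of CLIP's logit scale.
        self.logit_scale = nn.Parameter(torch.tensor(1.0))

    def forward(self, frame_feats: torch.Tensor, text_feats: torch.Tensor):
        # frame_feats: (T, D) frozen CLIP frame features for one video.
        # text_feats:  (C, D) frozen CLIP embeddings of class prompts.
        binary_scores = torch.sigmoid(self.binary_head(frame_feats))  # (T, 1)
        # Fine-grained branch: cosine similarity between frames and prompts.
        f = frame_feats / frame_feats.norm(dim=-1, keepdim=True)
        t = text_feats / text_feats.norm(dim=-1, keepdim=True)
        align_logits = self.logit_scale.exp() * f @ t.T  # (T, C)
        return binary_scores, align_logits

# Toy usage with random stand-ins for precomputed CLIP features.
model = DualBranchWSVAD()
frames = torch.randn(64, 512)   # 64 frames, 512-d features
prompts = torch.randn(14, 512)  # 14 class prompts (e.g., "fighting")
scores, logits = model(frames, prompts)
print(scores.shape, logits.shape)  # torch.Size([64, 1]) torch.Size([64, 14])
```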
Keywords
CV: Video Understanding & Activity Analysis, CV: Image and Video Retrieval, CV: Language and Vision, CV: Multi-modal Vision, CV: Scene Analysis & Understanding
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the 38th AAAI Conference on Artificial Intelligence, AAAI 2024, Vancouver, Canada, February 20-27
Volume
38
First Page
6074
Last Page
6082
Identifier
10.1609/aaai.v38i6.28423
Publisher
AAAI Press
City or Country
Vancouver
Citation
WU, Peng; ZHOU, Xuerong; PANG, Guansong; ZHOU, Lingru; YAN, Qingsen; WANG, Peng; and ZHANG, Yanning.
VadCLIP: Adapting vision-language models for weakly supervised video anomaly detection. (2024). Proceedings of the 38th AAAI Conference on Artificial Intelligence, AAAI 2024, Vancouver, Canada, February 20-27. 38, 6074-6082.
Available at: https://ink.library.smu.edu.sg/sis_research/9873
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1609/aaai.v38i6.28423
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons