Research Collection School Of Computing and Information Systems

DeepSonar: Towards effective and robust detection of AI-synthesized fake voices

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

10-2020

Abstract

With the recent advances in voice synthesis, AI-synthesized fake voices are indistinguishable to human ears and are widely applied to produce realistic and natural DeepFakes, posing real threats to our society. However, effective and robust detectors for synthesized fake voices are still in their infancy and are not ready to fully tackle this emerging threat. In this paper, we devise a novel approach, named DeepSonar, based on monitoring the neuron behaviors of a speaker recognition (SR) system, i.e., a deep neural network (DNN), to discern AI-synthesized fake voices. Layer-wise neuron behaviors provide an important insight for meticulously catching the differences among inputs, and are widely employed for building safe, robust, and interpretable DNNs. In this work, we leverage the power of layer-wise neuron activation patterns with the conjecture that they can capture the subtle differences between real and AI-synthesized fake voices, thereby providing a cleaner signal to classifiers than raw inputs. Experiments are conducted on three datasets (including commercial products from Google, Baidu, etc.) containing both English and Chinese languages to corroborate the high detection rates (98.1% average accuracy) and low false alarm rates (about 2% error rate) of DeepSonar in discerning fake voices. Furthermore, extensive experimental results also demonstrate its robustness against manipulation attacks (e.g., voice conversion and additive real-world noises). Our work further offers a new insight into adopting neuron behaviors for effective and robust AI-aided multimedia fake forensics as an inside-out approach, instead of being motivated and swayed by the various artifacts introduced in synthesizing fakes.
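The idea of using layer-wise neuron activation patterns as classifier input can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the SR network, the activation criterion, or the classifier, so everything below is an assumption made for illustration. A small random fully connected network stands in for the SR model, and the hypothetical helper `layer_wise_features` summarizes each layer by the fraction of neurons whose activation exceeds that layer's mean.

```python
import random


def relu(x):
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def make_layer(n_in, n_out, rng):
    """Random weight matrix (n_out rows, n_in columns); a stand-in for a trained layer."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def forward_with_activations(layers, x):
    """Run the toy network and record the activations of every layer."""
    activations = []
    current = x
    for weights in layers:
        current = [relu(sum(w * v for w, v in zip(row, current))) for row in weights]
        activations.append(current)
    return activations


def layer_wise_features(activations):
    """One feature per layer: fraction of neurons above the layer's mean activation.

    The thresholding rule is an assumption chosen for illustration only.
    """
    features = []
    for layer in activations:
        threshold = sum(layer) / len(layer)
        active = sum(1 for a in layer if a > threshold)
        features.append(active / len(layer))
    return features


rng = random.Random(0)
# Hypothetical 3-layer network standing in for a speaker recognition DNN.
toy_model = [make_layer(8, 16, rng), make_layer(16, 16, rng), make_layer(16, 4, rng)]
# Hypothetical 8-dimensional input standing in for acoustic features of a voice clip.
toy_input = [rng.uniform(-1.0, 1.0) for _ in range(8)]

features = layer_wise_features(forward_with_activations(toy_model, toy_input))
print(features)
```

In a setup of this kind, the resulting per-layer feature vector, rather than the raw audio, would be passed to a separate binary classifier that labels the input as real or fake.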

Keywords

DeepFake, fake voice, neuron behavior

Discipline

OS and Networks | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Proceedings of the 28th ACM International Conference on Multimedia, MM 2020, Seattle, October 12–16

First Page

1207

Last Page

1216

ISBN

9781450379885

Identifier

10.1145/3394171.3413716

Publisher

Association for Computing Machinery

City or Country

Virtual Conference

Citation

WANG, Run; JUEFEI-XU, Felix; HUANG, Yihao; GUO, Qing; XIE, Xiaofei; MA, Lei; and LIU, Yang. DeepSonar: Towards effective and robust detection of AI-synthesized fake voices. (2020). Proceedings of the 28th ACM International Conference on Multimedia, MM 2020, Seattle, October 12–16. 1207-1216.
Available at: https://ink.library.smu.edu.sg/sis_research/7082

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

OS and Networks Commons, Software Engineering Commons
