Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
10-2024
Abstract
Artificial Intelligence Generated Content (AIGC) has garnered considerable attention for its impressive performance, with Large Language Models (LLMs) such as ChatGPT emerging as leading AIGC models that produce high-quality responses across various applications, including software development and maintenance. Despite this potential, the misuse of LLMs, especially in security- and safety-critical domains such as academic integrity and answering questions on Stack Overflow, raises significant concerns. Numerous AIGC detectors have been developed and evaluated on natural language data; however, their performance on code-related content generated by LLMs remains unexplored. To fill this gap, in this paper we present an empirical study evaluating existing AIGC detectors in the software domain. We select three state-of-the-art LLMs, i.e., GPT-3.5, WizardCoder, and CodeLlama, for machine-content generation. We further construct a comprehensive dataset of 2.23M code-related samples for each model, encompassing popular software activities such as Q&A (150K), code summarization (1M), and code generation (1.1M). We evaluate thirteen AIGC detectors, comprising six commercial and seven open-source solutions, on this dataset. Our results indicate that AIGC detectors perform worse on code-related data than on natural language data. Fine-tuning can enhance detector performance, especially for content within the same domain, but generalization remains a challenge.
Keywords
AIGC Detection, Code Generation, Large Language Model
Discipline
Artificial Intelligence and Robotics
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024): Sacramento, CA, USA, October 27 - November 1
First Page
844
Last Page
856
Identifier
10.1145/3691620.3695468
Publisher
Association for Computing Machinery
City or Country
USA
Citation
WANG, Jian; LIU, Shangqing; XIE, Xiaofei; and LI, Yi.
An empirical study to evaluate AIGC detectors on code content. (2024). Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024): Sacramento, CA, USA, October 27 - November 1. 844-856.
Available at: https://ink.library.smu.edu.sg/sis_research/9724
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3691620.3695468