Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
9-2024
Abstract
Fuzz drivers are essential for library API fuzzing. However, automatically generating fuzz drivers is a complex task, as it demands the creation of high-quality, correct, and robust API usage code. An LLM-based (Large Language Model) approach for generating fuzz drivers is a promising area of research. Unlike traditional program analysis-based generators, this text-based approach is more generalized and capable of harnessing a variety of API usage information, resulting in code that is friendly for human readers. However, there is still a lack of understanding regarding the fundamental issues in this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely-used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens). Additionally, we compared the LLM-generated drivers against those utilized in industry, conducting extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that: 1) while LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles towards practical applications; 2) LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three featured design choices of prompt strategies can be beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process; 3) while LLM-generated drivers can yield fuzzing outcomes that are on par with those used in the industry, there are substantial opportunities for enhancement, such as extending contained API usage or integrating semantic oracles to facilitate logical bug detection.
Keywords
Fuzz driver generation, Fuzz testing, Large language model
Discipline
Artificial Intelligence and Robotics | Software Engineering
Research Areas
Data Science and Engineering; Information Systems and Management
Publication
Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria, 2024 September 16–20
First Page
1223
Last Page
1225
ISBN
9798400706127
Identifier
10.1145/3650212.3680355
Publisher
ACM
City or Country
New York
Citation
ZHANG, Cen; ZHENG, Yaowen; BAI, Mingqiang; LI, Yeting; MA, Wei; and XIE, Xiaofei.
How effective are they? Exploring large language model based fuzz driver generation. (2024). Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria, 2024 September 16–20. 1223-1225.
Available at: https://ink.library.smu.edu.sg/sis_research/9508
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3650212.3680355