Cross-level requirements tracing based on large language models

Publication Type

Journal Article

Publication Date

7-2025

Abstract

Cross-level requirements traceability, which links high-level requirements (HLRs) to low-level requirements (LLRs), is essential for maintaining relationships and consistency in software development. However, manually creating requirement links demands a deep understanding of the project and is a complex, laborious process. Existing machine learning and deep learning methods often fail to fully capture semantic information, leading to low accuracy and unstable performance. This paper presents the first approach to cross-level requirements tracing based on large language models (LLMs) and introduces data augmentation strategies (synonym replacement, machine translation, and noise introduction) to enhance model robustness. We compare three fine-tuning strategies (LoRA, P-Tuning, and Prompt-Tuning) on LLaMA models of different scales (1.1B, 7B, and 13B). The fine-tuned LLMs perform strongly across six single-project datasets, three cross-project datasets within the same domain, and one cross-domain dataset, and experimental results show that they outperform traditional information retrieval, machine learning, and deep learning methods on these datasets. Furthermore, we compare GPT and DeepSeek LLMs under different prompt templates, revealing their high sensitivity to prompt design and relatively poor result stability. Our approach outperforms GPT-4o and DeepSeek-r1 by 16.27% and 16.8% in F-measure on the cross-domain dataset, and achieves a maximum improvement of 13.8% over a baseline method that relies on prompt engineering.
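
The abstract describes fine-tuning LLaMA-family models with parameter-efficient methods such as LoRA to decide whether an HLR and an LLR are linked. The Python snippet below is a minimal illustrative sketch under assumed details, not the authors' implementation: it frames tracing as binary classification of (HLR, LLR) pairs and attaches LoRA adapters via Hugging Face PEFT. The model id, hyperparameters, label assignment, and example requirements are placeholders.

# Illustrative only: trace-link prediction as binary classification of
# (HLR, LLR) pairs with LoRA adapters from Hugging Face PEFT. The model id,
# hyperparameters, label meaning, and example requirements are placeholders.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder ~1.1B LLaMA-family model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=2  # label 1 = trace link exists, label 0 = no link
)

# LoRA freezes the base model and trains small low-rank adapters on the
# attention projections; r, alpha, dropout, and target modules are illustrative.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters and classifier head train

# Score one candidate pair; actual training would iterate over many labeled
# pairs, including augmented variants, with a standard Trainer loop.
hlr = "The system shall encrypt all data exchanged between client and server."
llr = "The communication module opens every socket with TLS 1.3 enabled."
inputs = tokenizer(hlr, llr, truncation=True, return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 2): scores for no-link vs. link

P-Tuning and Prompt-Tuning, the other two strategies compared in the paper, would slot into the same workflow by swapping LoraConfig for PEFT's PromptEncoderConfig or PromptTuningConfig, respectively.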

Keywords

Feature Extraction, Semantics, Deep Learning, Information Retrieval, Data Augmentation, Software, Vectors, Training, Large Language Models, Accuracy, Requirements Tracing, Fine Tuning, Software Requirements, Language Model, Machine Learning, Superior Performance, Machine Learning Methods, Poor Stability, Baseline Methods, Traditional Machine Learning, Machine Translation, Traditional Machine Learning Methods, Fine Tuning Strategy, Model Performance, Positive Samples, Machine Learning Models, Deep Learning Models, Precision And Recall, Words In Sentences, Data Augmentation Techniques, High Recall, Original Text, Recall Rate, Vector Space Model, CNN Model, Video Summarization, Jensen Shannon Divergence, Fine Tuning Method, Latent Dirichlet Allocation

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Publication

IEEE Transactions on Software Engineering

Volume

51

Issue

7

First Page

2044

Last Page

2066

ISSN

0098-5589

Identifier

10.1109/TSE.2025.3572094

Publisher

Institute of Electrical and Electronics Engineers
