Publication Type

Journal Article

Version

acceptedVersion

Publication Date

3-2026

Abstract

Neural networks (NNs) have rapidly advanced, demonstrating exceptional performance across various fields and leading to a surge in open-source NN projects. The complexity and rapid growth of these projects pose significant challenges for maintenance within the open-source community. Given that NN structure code is the core asset of NN projects, understanding its reuse in the open-source community is essential for effective maintenance, such as reducing redundancy and identifying potential intellectual property violations. While prior studies have examined code reuse in open-source projects, they have two key limitations: they do not specifically address NN structure code, and they rely on manually selected small-scale datasets that do not sufficiently represent the broader open-source ecosystem. To address these limitations, this study explores reuse patterns in a large-scale dataset of 20,000 open-source projects on GitHub, focusing specifically on NN structure code. Specifically, we categorize NN structure reuse into three types: (1) exact reuse with no changes; (2) shallow reuse with minor adjustments such as renaming variables or adjusting parameters; and (3) conceptual reuse with significant modifications that still retain the same layer sequence. We then propose a detection framework, NNReuse, to identify these reuse types and conduct an empirical evaluation of their prevalence and characteristics. As a practical application, we also use NNReuse to assess potential license conflicts. Extensive experiments show that 55.6% of projects and 54.17% of NN structures exhibit structural similarities that are consistent with potential NN structure reuse in open-source projects. Among these, exact reuse is particularly common and introduces significant redundancy, with an estimated storage optimization potential of up to 34.49%. Reuse primarily occurs at a high level, with 43.3% involving the reuse of overall network architecture. Additionally, in projects with license protection, as much as 64.3% may present potential license conflicts, highlighting the importance of strengthened license compliance and proactive IP risk mitigation in the open-source community.
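
The three reuse types named in the abstract can be illustrated with a small sketch. This is not the NNReuse framework, whose detection method is not described here. The Layer record, the classify_reuse function, and the rule used to separate shallow from conceptual reuse (whether each layer keeps the same set of parameter names) are all assumptions made for illustration only.

```python
from typing import NamedTuple


class Layer(NamedTuple):
    """Hypothetical record for one layer of an NN structure."""
    name: str     # variable name used in the source code
    kind: str     # layer type, e.g. "Conv2d"
    params: dict  # constructor arguments


def classify_reuse(a, b):
    """Classify how structure b relates to structure a (illustrative rule only)."""
    # All three reuse types in the abstract keep the layer sequence.
    if [layer.kind for layer in a] != [layer.kind for layer in b]:
        return "none"
    # Exact reuse: names, layer types, and parameters are all identical.
    if a == b:
        return "exact"
    # Assumed rule: only renamed variables or changed parameter values.
    if all(x.params.keys() == y.params.keys() for x, y in zip(a, b)):
        return "shallow"
    # Same layer sequence, but the layers themselves were modified further.
    return "conceptual"


base = [Layer("conv1", "Conv2d", {"out": 16}), Layer("fc", "Linear", {"out": 10})]
renamed = [Layer("c1", "Conv2d", {"out": 32}), Layer("head", "Linear", {"out": 10})]
modified = [Layer("conv1", "Conv2d", {"out": 16, "dilation": 2}),
            Layer("fc", "Linear", {"out": 10})]

print(classify_reuse(base, list(base)))  # exact
print(classify_reuse(base, renamed))     # shallow
print(classify_reuse(base, modified))    # conceptual
```

A real detector would have to extract layer sequences from project source code first and would likely use a more tolerant similarity measure than the strict equality checks assumed here.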

Keywords

code reuse, neural network, open-source community maintenance

Discipline

Software Engineering

Publication

Journal of Software: Evolution and Process

Volume

38

Issue

3

First Page

1

Last Page

17

ISSN

2047-7473

Identifier

10.1002/smr.70090

Publisher

Wiley

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1002/smr.70090
