Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

5-2023

Abstract

Although information theory has found success across disciplines, the literature on its applications to software evolution is limited. We are still missing artifacts that leverage the data and tooling available to measure how the information content of a project can be a proxy for its complexity. In this work, we explore two definitions of entropy, one structural and one textual, and apply them to the commit histories of 25 open source projects. We produce evidence that the two are generally highly correlated. We also observe that they display weak and unstable correlations with other complexity metrics. Our preliminary investigation of outliers shows an unexpectedly high frequency of events where there is considerable change in the information content of the project, suggesting that such outliers may inform a definition of surprisal.

Keywords

entropy, information theory, software engineering

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Proceedings of the 2nd Workshop on Natural Language-based Software Engineering, 2023 May 20

First Page

48

Last Page

55

ISBN

9798350301786

Identifier

10.1109/NLBSE59153.2023.00017

Publisher

IEEE

City or Country

Los Alamitos, CA

Copyright Owner and License

Authors
