Publication Type

Journal Article

Version

acceptedVersion

Publication Date

2-2025

Abstract

Context: Software development creates and relies on a large volume of information, yet that sheer volume can make it hard for developers to keep an overview of everything a team and external actors contribute to a project. We posit that unexpected or “surprising” events could serve as important signposts amidst this information overload. These unexpected events may indicate underlying anomalies or emergent situations that require immediate attention. To explore this premise, our study uses the concept of ‘surprisal’ from information theory to identify and quantify unusual occurrences in the issues and pull requests of popular open-source software repositories.

Objective: Drawing on a previously published research protocol, our study investigates whether the ‘surprisal’ of issues correlates with their perceived importance or difficulty within software repositories.

Results: We analysed approximately two million issues and pull requests gathered from 1,270 repositories, and examined their ‘surprisal’ in relation to several metrics indicative of difficulty and perceived importance. Our results indicate only a weak correlation. This outcome underscores the need for further research into more effective strategies for helping developers prioritise issues.

Keywords

GitHub issues, n-gram, Self-information

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Empirical Software Engineering

Volume

30

Issue

1

First Page

1

Last Page

34

ISSN

1382-3256

Identifier

10.1007/s10664-024-10587-w

Publisher

Springer

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1007/s10664-024-10587-w
