Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

7-2024

Abstract

Communities on GitHub often use issue labels to triage issues, assigning them priority ratings based on how urgently they should be addressed. The labels used are determined by the repository contributors and are not standardised by GitHub. This makes priority-related reasoning across repositories difficult for both researchers and contributors. Previous work shows interest in how issues are labelled and what the consequences of those labels are. For instance, some previous work has used clustering models and natural language processing to categorise labels without a particular emphasis on priority. With this publication, we introduce a unique data set of 812 manually categorised labels pertaining to priority, normalised and ranked as low-, medium-, or high-priority. To provide an example of how this data set could be used, we have created a tool for GitHub contributors that creates a list of the highest-priority issues from the repositories to which they contribute. We have released the data set and the tool on Zenodo for anyone to use, in the hope that this will help the open source community address high-priority issues more effectively and inspire other uses.
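
The idea of normalising repository-specific labels to low, medium, or high priority and then listing the highest-priority issues can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the entries in `LABEL_PRIORITY` are hypothetical and are not taken from the published data set, and the functions `issue_priority` and `highest_priority_issues` are not the authors' tool or its API.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt of a label-to-priority mapping. These entries are
# illustrative only and are NOT taken from the published data set.
LABEL_PRIORITY: Dict[str, str] = {
    "p0": "high",
    "priority: critical": "high",
    "priority: medium": "medium",
    "low priority": "low",
}

# Numeric rank for each normalised level, so levels can be compared.
RANK: Dict[str, int] = {"low": 1, "medium": 2, "high": 3}


def issue_priority(labels: List[str]) -> Optional[str]:
    """Return the highest normalised priority among an issue's labels.

    Labels are matched case-insensitively; labels that are not in the
    mapping are ignored. Returns None if no label carries a priority.
    """
    levels = []
    for label in labels:
        key = label.strip().lower()
        if key in LABEL_PRIORITY:
            levels.append(LABEL_PRIORITY[key])
    if not levels:
        return None
    return max(levels, key=lambda level: RANK[level])


def highest_priority_issues(issues: List[dict]) -> List[dict]:
    """Return the issues whose normalised priority is 'high', in input order."""
    return [i for i in issues if issue_priority(i["labels"]) == "high"]


# Hypothetical issues from two repositories with different label schemes.
issues = [
    {"repo": "a/x", "title": "Crash on start", "labels": ["bug", "P0"]},
    {"repo": "b/y", "title": "Typo in docs", "labels": ["Low Priority"]},
    {"repo": "b/y", "title": "Data loss", "labels": ["Priority: Critical"]},
    {"repo": "a/x", "title": "Question", "labels": ["question"]},
]

for issue in highest_priority_issues(issues):
    print(issue["repo"], issue["title"])
```

In this sketch, an issue with several priority labels takes the highest of them, and issues without any recognised priority label are left out; both are design choices of the example rather than behaviour described in the abstract.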

Keywords

data sets, GitHub issues, task priority

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

PROMISE 2024: Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering, Porto de Galinhas, Brazil, July 16

First Page

52

Last Page

55

ISBN

9798400706752

Identifier

10.1145/3663533.3664041

Publisher

ACM

City or Country

New York

Additional URL

https://doi.org/10.1145/3663533.3664041
