Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

5-2022

Abstract

Background. In information theory, surprisal measures how unexpected an event is. Statistical language models provide a probabilistic approximation of natural languages, and because surprisal is defined in terms of the probability of an event occurring, it is possible to determine the surprisal associated with English sentences. Issues and pull requests in software repository issue trackers give insight into the development process and are likely to contain its surprising events.

Objective. Prior works have identified that unusual events in software repositories are of interest to developers, and have used simple code-metrics-based methods to detect them. In this study, we will propose a new method for detecting unusual events in software repositories using surprisal. Once surprising issues and pull requests can be found, we intend to analyse them further to determine whether they actually hold importance in a repository or pose a significant challenge to address. If bad surprises can be found early, before they cause further trouble, effort, cost and time may plausibly be saved as a result.
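To illustrate the idea of scoring text by surprisal, the sketch below computes the surprisal (self-information) of each token, -log2 P(w_i | w_{i-1}), under a bigram language model and averages it over an issue title. Everything specific here is an assumption for illustration, not the authors' method: the toy corpus of issue titles is hypothetical, and the bigram order, add-one smoothing, whitespace tokenisation and per-token averaging are all choices made for this sketch.

```python
import math
from collections import Counter

# Hypothetical training corpus: issue titles from one repository (assumption).
CORPUS = [
    "fix typo in readme",
    "fix typo in docs",
    "update readme",
    "fix broken link in docs",
]


def tokenize(text):
    """Lower-case whitespace tokenisation with sentence boundary markers."""
    return ["<s>"] + text.lower().split() + ["</s>"]


def train_bigram(sentences):
    """Count bigrams and how often each token appears as a bigram context."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        for prev, curr in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, curr)] += 1
    return contexts, bigrams, vocab


def mean_surprisal(text, contexts, bigrams, vocab):
    """Average per-token surprisal -log2 P(w_i | w_{i-1}), in bits."""
    # One extra vocabulary slot stands in for every unseen word (assumption).
    v = len(vocab) + 1
    tokens = tokenize(text)
    bits = []
    for prev, curr in zip(tokens, tokens[1:]):
        # Add-one (Laplace) smoothing keeps unseen bigrams at non-zero probability.
        p = (bigrams[(prev, curr)] + 1) / (contexts[prev] + v)
        bits.append(-math.log2(p))
    return sum(bits) / len(bits)


model = train_bigram(CORPUS)
new_issues = ["fix typo in readme", "database corrupted after upgrade"]
# Rank incoming issues from most to least surprising.
ranked = sorted(new_issues, key=lambda t: mean_surprisal(t, *model), reverse=True)
for title in ranked:
    print(f"{mean_surprisal(title, *model):.2f} bits  {title}")
```

Averaging per token is one assumed way to compare titles of different lengths; summing the bits instead would tend to rank longer text as more surprising regardless of its content.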

Keywords

self-information, n-gram, GitHub issues

Discipline

Software Engineering

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

MSR '22: 19th International Conference on Mining Software Repositories, Pittsburgh, Pennsylvania, May 23-24

First Page

1

Last Page

8

Publisher

ACM

City or Country

New York
