Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

5-2019

Abstract

Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Like other software artifacts, code on SO evolves over time, for example when bugs are fixed or APIs are updated to the most recent version. To be able to analyze how code and the surrounding text on SO evolve, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text and code blocks. It connects code snippets from SO posts to other platforms by aggregating URLs from surrounding text blocks and comments, and by collecting references from GitHub files to SO posts. Our vision is that researchers will use SOTorrent to investigate and understand the evolution and maintenance of code on SO and its relation to other platforms such as GitHub.

Keywords

Code snippets, GitHub, Open dataset, Software evolution, Stack Overflow

Discipline

Programming Languages and Compilers | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Proceedings of the 16th International Conference on Mining Software Repositories, Montreal, Canada, 2019 May 26-27

First Page

191

Last Page

194

ISBN

9781728134123

Identifier

10.1109/MSR.2019.00038

Publisher

IEEE Computer Society

City or Country

Piscataway, NJ

Additional URL

https://doi.org/10.1109/MSR.2019.00038
