Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

5-2020

Abstract

Software developers use a mix of source code and natural language text to communicate with each other: Stack Overflow and developer mailing lists abound with this mixed text. Tagging this mixed text is essential for making progress on two seminal software engineering problems: traceability, and reuse via precise extraction of code snippets from mixed text. In this paper, we borrow code-switching techniques from Natural Language Processing and adapt them to mixed text to solve two problems: language identification and token tagging. Our technique, POSIT, simultaneously provides abstract syntax tree (AST) tags for source code tokens and part-of-speech (PoS) tags for natural language words, and predicts the source language of each token in mixed text. To realize POSIT, we trained a biLSTM network with a Conditional Random Field output layer using abstract syntax tree tags from the CLANG compiler and part-of-speech tags from the standard Stanford part-of-speech tagger. POSIT improves the state of the art on language identification by 10.6% and on PoS/AST tagging by 23.7% in accuracy.
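A Conditional Random Field output layer, as named in the abstract, chooses the best-scoring tag sequence for a whole sentence instead of tagging each token in isolation, so a tag can depend on its neighbours. The sketch below shows linear-chain CRF decoding (the Viterbi algorithm) in plain Python for illustration only. It is not POSIT's implementation: the function name viterbi, the tag names VB, NN and IDENT, and every score are hypothetical, and in POSIT the per-token scores would come from the trained biLSTM rather than being hard-coded.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    tokens: Sequence[str],
    tags: Sequence[str],
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[i][t]: hypothetical per-token score for tag t at position i
    (the role a biLSTM plays). transitions[(a, b)]: score for tag b
    following tag a; missing pairs score 0.0.
    """
    if not tokens:
        return []
    # best[i][t]: best total score of any tag path ending in tag t at position i.
    best: List[Dict[str, float]] = [{t: emissions[0][t] for t in tags}]
    # back[i][t]: previous tag on that best path (back-pointer).
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for cur in tags:
            prev, score = max(
                ((p, best[i - 1][p] + transitions.get((p, cur), 0.0)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][cur] = score + emissions[i][cur]
            back[i][cur] = prev
    # Start from the best final tag and follow back-pointers to the front.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    tags = ["VB", "NN", "IDENT"]  # hypothetical English and code tags
    tokens = ["call", "printf"]
    emissions = [
        {"VB": 2.0, "NN": 1.0, "IDENT": 0.0},
        {"VB": 0.0, "NN": 1.0, "IDENT": 1.0},  # "printf" is ambiguous alone
    ]
    # A hypothetical bonus for a code identifier following an English verb.
    transitions = {("VB", "IDENT"): 0.5}
    print(viterbi(tokens, tags, emissions, transitions))
```

With the transition bonus, "printf" is tagged IDENT; with no transition scores, the tie between NN and IDENT is resolved by tag order and the sequence becomes VB, NN. This illustrates why a sequence-level output layer can help on mixed text: context from neighbouring tokens can tip an otherwise ambiguous token toward the code or the natural-language reading.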

Keywords

Code-switching, Language identification, Mixed-code, Part-of-speech tagging

Discipline

Programming Languages and Compilers | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Proceedings of the 42nd International Conference on Software Engineering, Seoul, South Korea, 2020, May 23-29

First Page

1348

Last Page

1358

ISBN

9781450371216

Identifier

10.1145/3377811.3380440

Publisher

ACM

City or Country

New York

Citation

PÂRȚACHI, Profir-Petru; DASH, Santanu; TREUDE, Christoph; and BARR, Earl T. POSIT: Simultaneously tagging natural and programming languages. (2020). Proceedings of the 42nd International Conference on Software Engineering, Seoul, South Korea, 2020, May 23-29. 1348-1358.
Available at: https://ink.library.smu.edu.sg/sis_research/8907

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1145/3377811.3380440

Included in

Programming Languages and Compilers Commons, Software Engineering Commons

Research Collection School Of Computing and Information Systems

POSIT: Simultaneously tagging natural and programming languages
