Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
2-2018
Abstract
Towards the vision of translating code that implements an algorithm from one programming language into another, this paper proposes an approach for automated program classification using bilateral tree-based convolutional neural networks (BiTBCNNs). The approach is layered on top of two tree-based convolutional neural networks (TBCNNs), each of which recognizes the algorithm implemented by code written in one programming language. A combination layer on top of the two networks captures the similarities and differences between code in different programming languages. The BiTBCNNs are trained on source code written in different languages but known to implement the same algorithms and/or functionalities. For a preliminary evaluation, we use 3591 Java and 3534 C++ code snippets implementing 6 algorithms, systematically crawled from GitHub. We obtained over 90% accuracy on the cross-language binary classification task of telling whether two given code snippets implement the same algorithm. For the algorithm classification task, i.e., predicting which of the six algorithm labels an arbitrary C++ code snippet implements, we achieved over 80% precision.
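The following is a minimal, untrained Python sketch of the bilateral idea described above. It is not the authors' implementation. It assumes each AST node already carries an embedding vector and uses a single simplified tree-convolution layer, loosely following the original TBCNN design. Each convolution window covers a node and its direct children, and the left and right child weights are interpolated by sibling position. Max pooling over all nodes then yields one vector per snippet. The two language-specific encodings are combined by a logistic unit over the features [|u - v|, u * v]. That combination, the class names (`Node`, `TreeConvEncoder`, `BilateralClassifier`) and all dimensions are hypothetical choices made for illustration, and training is omitted.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class Node:
    """An AST node carrying an (assumed, pre-computed) embedding vector."""
    vec: Vector
    children: List["Node"] = field(default_factory=list)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TreeConvEncoder:
    """Simplified TBCNN-style encoder: one tree-convolution layer whose window is
    a node plus its direct children, followed by max pooling over all nodes."""

    def __init__(self, dim_in: int, dim_out: int, rng: random.Random):
        self.w_top = rand_matrix(dim_out, dim_in, rng)
        self.w_left = rand_matrix(dim_out, dim_in, rng)
        self.w_right = rand_matrix(dim_out, dim_in, rng)
        self.bias = [0.0] * dim_out

    def conv(self, node: Node) -> Vector:
        # The window's top node is weighted entirely by w_top.
        acc = matvec(self.w_top, node.vec)
        n = len(node.children)
        for i, child in enumerate(node.children):
            # Interpolate left/right weights by sibling position
            # (a lone child gets an even 0.5/0.5 split).
            eta_r = 0.5 if n == 1 else i / (n - 1)
            eta_l = 1.0 - eta_r
            left = matvec(self.w_left, child.vec)
            right = matvec(self.w_right, child.vec)
            acc = [a + eta_l * l + eta_r * r for a, l, r in zip(acc, left, right)]
        return [math.tanh(a + b) for a, b in zip(acc, self.bias)]

    def encode(self, root: Node) -> Vector:
        # Convolve every node, then take the element-wise maximum (max pooling).
        outputs, stack = [], [root]
        while stack:
            node = stack.pop()
            outputs.append(self.conv(node))
            stack.extend(node.children)
        return [max(col) for col in zip(*outputs)]


class BilateralClassifier:
    """Two language-specific encoders plus a combination layer that scores
    whether two snippets implement the same algorithm."""

    def __init__(self, dim_in: int, dim_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.java = TreeConvEncoder(dim_in, dim_out, rng)
        self.cpp = TreeConvEncoder(dim_in, dim_out, rng)
        # Assumed combination: [|u - v|, u * v] fed to a single logistic unit.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * dim_out)]
        self.b_out = 0.0

    def same_algorithm_prob(self, java_tree: Node, cpp_tree: Node) -> float:
        u = self.java.encode(java_tree)
        v = self.cpp.encode(cpp_tree)
        feats = [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]
        z = sum(w * f for w, f in zip(self.w_out, feats)) + self.b_out
        return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    java_ast = Node([0.1, 0.2, 0.3], [Node([0.0, 0.5, 0.1]), Node([0.3, 0.1, 0.0])])
    cpp_ast = Node([0.2, 0.1, 0.3], [Node([0.1, 0.4, 0.2])])
    model = BilateralClassifier(dim_in=3, dim_out=4)
    print(model.same_algorithm_prob(java_ast, cpp_ast))
```

Because the weights here are random, the printed probability carries no meaning until both encoders and the combination layer are trained jointly on snippet pairs labelled as implementing the same or different algorithms. Keeping a separate encoder per language, as in this sketch, lets each side specialize in its own language's syntax, while the shared combination layer is where cross-language similarity would be learned.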
Discipline
Software Engineering | Theory and Algorithms
Research Areas
Software and Cyber-Physical Systems
Publication
AAAI Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence: NLP for Software Engineering (NL4SE) 2018, New Orleans, February 2-7
First Page
758
Last Page
761
Publisher
AAAI Press
City or Country
Palo Alto, CA
Citation
BUI, Duy Quoc Nghi; JIANG, Lingxiao; and YU, Yijun.
Cross-language learning for program classification using bilateral tree-based convolutional neural networks. (2018). AAAI Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence: NLP for Software Engineering (NL4SE) 2018, New Orleans, February 2-7. 758-761.
Available at: https://ink.library.smu.edu.sg/sis_research/4307
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.