Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

2-2021

Abstract

Recently, program learning techniques have been proposed to process source code based on syntactical structures (e.g., Abstract Syntax Trees) and/or semantic information (e.g., Dependency Graphs). Graphs may capture various viewpoints of code semantics better than trees. However, constructing graph inputs from code requires static semantic analysis, which may be inaccurate and can introduce noise during learning. Syntax trees, by contrast, are precisely defined by the language grammar and are easier to construct and process than graphs. We propose a new tree-based learning technique, named TreeCaps, which fuses capsule networks with tree-based convolutional neural networks. TreeCaps achieves higher learning accuracy than existing graph-based techniques while relying only on trees. It introduces novel variable-to-static routing algorithms into the capsule networks to compensate for the limitations of previous routing algorithms. Beyond accuracy, we find that TreeCaps is the most robust model against semantic-preserving program transformations, which change code syntax without modifying its semantics. In evaluations on a large number of Java and C/C++ programs, TreeCaps models outperform prior deep learning models of program source code in both accuracy and robustness on program comprehension tasks such as code functionality classification and function name prediction. The implementation of TreeCaps is publicly available at https://github.com/bdqnghi/treecaps.

Discipline

OS and Networks | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual Conference, February 2-9, 2021

First Page

1

Last Page

9

Publisher

AAAI

City or Country

Virtual Conference

Copyright Owner and License

Authors

Additional URL

https://www.aaai.org/AAAI21Papers/AAAI-9746.BuiNDQ.pdf