Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2023
Abstract
Compiler provenance is important for investigating the source-level indicators of binary code, such as the development environment, source compiler, and optimization settings. Compiler provenance analysis has important security applications in malware and vulnerability analysis, yet extracting useful artifacts from binaries is very challenging because high-level language constructs are missing. Previous work has applied machine-learning techniques to predict the source compiler of binaries; however, most of this work has focused on binaries compiled on the Linux operating system. We highlight the need to explore Windows compilers and the complex binaries produced by their latest versions. To this end, we construct a large dataset of real-world binaries compiled with four major compilers on Windows under the four most common optimization settings. The complexity of the optimized programs leads us to identify specific patterns in the binaries that are indicative of the source compiler and optimization level. To address these observations, we propose an improved model that builds on the state of the art and incorporates streamlined alignment padding features into the existing model. Our improved model thus uses the attention mechanism to learn alignment instructions from the binary code of portable executables and libraries. We conduct extensive experiments on a dataset of 296,169 unique and complex binaries generated from C/C++ applications. Our findings demonstrate that the proposed model significantly outperforms the state of the art in accurately predicting the source compiler and optimization flag for complex compiled code.
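To make "alignment padding" concrete: compilers often fill the gap between the end of one function and the next aligned function start with filler bytes, and the filler differs between toolchains (for example, MSVC commonly emits int3 (0xCC) bytes, while GCC and Clang typically emit single- or multi-byte NOP instructions). The following is a minimal Python sketch, not the BinAlign feature extractor, that tallies padding bytes in the gaps between functions. It assumes function boundaries are already known; the `boundaries` input and the `padding_profile` name are hypothetical. It only distinguishes int3, the single-byte NOP (0x90), and everything else, so multi-byte NOP encodings such as 0F 1F 00 are counted as "other".

```python
from collections import Counter

# Illustrative padding byte values on x86/x86-64 (an assumption for this sketch,
# not the feature set used by BinAlign).
INT3 = 0xCC  # one-byte breakpoint instruction, a common MSVC gap filler
NOP = 0x90   # one-byte no-op instruction


def padding_profile(code: bytes, boundaries: list[tuple[int, int]]) -> Counter:
    """Tally filler bytes found between consecutive functions.

    `boundaries` is a hypothetical list of (start, end) byte offsets, one per
    function, sorted by start offset; `end` is exclusive.
    """
    profile = Counter()
    # Pair each function with its successor and inspect the gap between them.
    for (_, end), (next_start, _) in zip(boundaries, boundaries[1:]):
        for b in code[end:next_start]:
            if b == INT3:
                profile["int3"] += 1
            elif b == NOP:
                profile["nop"] += 1
            else:
                # Includes multi-byte NOP encodings, which this sketch does not decode.
                profile["other"] += 1
    return profile


if __name__ == "__main__":
    # Three toy "functions" (push rbp; ret) separated by two padding gaps.
    code = bytes([0x55, 0xC3, 0xCC, 0xCC, 0x55, 0xC3, 0x90, 0x90, 0x55, 0xC3])
    boundaries = [(0, 2), (4, 6), (8, 10)]
    print(padding_profile(code, boundaries))
```

Running the toy example reports two int3 bytes and two NOP bytes. A per-binary profile like this is one simple way such padding patterns could feed a classifier; the model described in the abstract instead learns alignment instructions with an attention mechanism.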
Keywords
compiler provenance, alignment padding, Windows binaries, binary code similarity
Discipline
Programming Languages and Compilers
Research Areas
Intelligent Systems and Optimization
Publication
The 28th Australasian Conference on Information Security and Privacy (ACISP 2023)
Volume
13915
First Page
609
Last Page
629
ISBN
978-3-031-35485-4
Identifier
10.1007/978-3-031-35486-1_26
City or Country
Brisbane, Australia
Citation
ISMAIL, Maliha; LIN, Yan; HAN, DongGyun; and GAO, Debin.
BinAlign: Alignment Padding Based Compiler Provenance Recovery. (2023). The 28th Australasian Conference on Information Security and Privacy (ACISP 2023). 13915, 609-629.
Available at: https://ink.library.smu.edu.sg/sis_research/8417
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-031-35486-1_26