Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

7-2023

Abstract

Compiler provenance is significant for investigating the source-level indicators of binary code, such as the development environment, source compiler, and optimization settings. Compiler provenance analysis has important security applications in malware and vulnerability analysis, yet extracting useful artifacts from binaries is very challenging because high-level language constructs are missing. Previous works applied machine-learning techniques to predict the source compiler of binaries. However, most of that work targets binaries compiled on the Linux operating system. We highlight the need to explore Windows compilers and the complicated binaries produced by the latest versions of these compilers. We therefore construct a large dataset of real-world binaries compiled with four major compilers on Windows and the four most common optimization settings. The complexity of the optimized programs leads us to identify specific patterns in the binaries that help identify the source compiler and the specific optimization level. Building on these observations, we propose an improved model based on the state of the art that incorporates streamlined alignment padding features into the existing model. Our improved model thus learns alignment instructions from the binary code of portable executables and libraries using the attention mechanism. We conduct extensive experiments on a dataset of 296,169 unique and complex binary code samples generated from C/C++ applications. Our findings demonstrate that the proposed model significantly outperforms the state of the art in accurately predicting the source compiler and optimization flag for complex compiled code.

Keywords

compiler provenance, alignment padding, Windows binaries, binary code similarity

Discipline

Programming Languages and Compilers

Research Areas

Intelligent Systems and Optimization

Publication

The 28th Australasian Conference on Information Security and Privacy (ACISP 2023)

Volume

13915

First Page

609

Last Page

629

ISBN

978-3-031-35485-4

Identifier

10.1007/978-3-031-35486-1_26

City or Country

Brisbane, Australia

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1007/978-3-031-35486-1_26
