Publication Type

Journal Article

Version

acceptedVersion

Publication Date

9-2025

Abstract

With the rapid development of blockchain technology, the widespread adoption of smart contracts, particularly in decentralized finance (DeFi) applications, has introduced significant security challenges such as reentrancy attacks, phishing, and Sybil attacks. To address these issues, we propose a novel model called TrxGNNBERT, which combines a Graph Neural Network (GNN) with the Transformer architecture to handle both graph-structured and textual data effectively. This combination enhances the detection of suspicious transactions and accounts on blockchain platforms such as Ethereum. TrxGNNBERT was pre-trained with a masked language modeling (MLM) objective on a dataset of 60,000 Ethereum transactions by randomly masking the attributes of nodes and edges, thereby capturing deep semantic relationships and structural information. In this work, we constructed transaction subgraphs and used a GNN module to enrich the embedding representations, which were then fed into a Transformer encoder. The experimental results demonstrate that TrxGNNBERT outperforms various baseline models, including DeepWalk, Trans2Vec, Role2Vec, GCN, GAT, GraphSAGE, CodeBERT, GraphCodeBERT, ZipZap, and BERT4ETH, in detecting suspicious transactions and accounts. Specifically, TrxGNNBERT achieved an accuracy of 0.755 and an F1 score of 0.756 on the TrxLarge dataset; an accuracy of 0.903 and an F1 score of 0.894 on the TrxSmall dataset; and an accuracy of 0.790 and an F1 score of 0.781 on the AddrDec dataset. We also explored different pre-training configurations and strategies, comparing the performance of encoder-based and decoder-based Transformer structures. The results indicate that pre-training improves downstream task performance, with encoder-based structures outperforming decoder-based ones. Through ablation studies, we found that node-level information and subgraph structure are critical for achieving optimal performance in transaction classification tasks: when key features were removed, model performance declined considerably, demonstrating the importance of each component of our method. These findings offer valuable insights for future research, suggesting further improvements in node attribute representation and subgraph extraction.
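To make the described architecture concrete, the following is a minimal, illustrative sketch (not the authors' released code) of the pipeline the abstract outlines: node attributes of a transaction subgraph are embedded, one GNN aggregation pass enriches the node embeddings, a Transformer encoder attends over the whole subgraph, and an MLM head predicts randomly masked node attributes. All class names, dimensions, and the masking rate are assumptions for illustration, not values from the paper.

```python
# Minimal sketch of a GNN + Transformer encoder with MLM-style pre-training
# over masked node attributes. Hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class TrxGNNBERTSketch(nn.Module):
    def __init__(self, vocab_size=1000, dim=64, num_heads=4, num_layers=2):
        super().__init__()
        self.node_embed = nn.Embedding(vocab_size, dim)   # node-attribute tokens
        self.gnn_proj = nn.Linear(dim, dim)               # one mean-aggregation GNN layer
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        self.mlm_head = nn.Linear(dim, vocab_size)        # predicts masked attribute tokens

    def forward(self, node_tokens, adj):
        # node_tokens: (batch, nodes) integer attribute ids
        # adj: (batch, nodes, nodes) subgraph adjacency matrix
        h = self.node_embed(node_tokens)
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        h = torch.relu(self.gnn_proj(adj @ h / deg)) + h  # GCN-style pass with residual
        h = self.encoder(h)                               # global attention over the subgraph
        return self.mlm_head(h)                           # logits over the attribute vocabulary

# Toy usage: mask node attributes at an assumed 15% rate and train with
# cross-entropy on the masked positions (the MLM pre-training objective).
if __name__ == "__main__":
    model = TrxGNNBERTSketch()
    tokens = torch.randint(1, 1000, (2, 8))               # 2 subgraphs, 8 nodes each
    adj = (torch.rand(2, 8, 8) > 0.7).float()             # random toy adjacency
    mask = torch.rand_like(tokens, dtype=torch.float) < 0.15
    mask[0, 0] = True                                     # ensure at least one masked position
    inputs = tokens.masked_fill(mask, 0)                  # token 0 reserved as [MASK]
    logits = model(inputs, adj)
    loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
    loss.backward()
    print(float(loss))
```

In the actual model, this masked-attribute prediction serves only as the pre-training task; the pre-trained encoder is then fine-tuned for the downstream suspicious-transaction and account classification tasks evaluated above.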

Keywords

Smart contract transaction, pre-trained model, blockchain security

Discipline

Artificial Intelligence and Robotics | Programming Languages and Compilers

Areas of Excellence

Digital transformation

Publication

IEEE Transactions on Information Forensics and Security

Volume

20

First Page

10051

Last Page

10065

ISSN

1556-6013

Identifier

10.1109/TIFS.2025.3612184

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://doi.org/10.1109/TIFS.2025.3612184
