Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
8-2025
Abstract
Understanding molecular structure and related knowledge is crucial for scientific research. Recent studies integrate molecular graphs with their textual descriptions to enhance molecular representation learning. However, they focus on the whole molecular graph and neglect frequently occurring subgraphs, known as motifs, which are essential for determining molecular properties. Without such fine-grained knowledge, these models struggle to generalize to unseen molecules and tasks that require motif-level insights. To bridge this gap, we propose FineMolTex, a novel Fine-grained Molecular graph-Text pre-training framework to jointly learn coarse-grained molecule-level knowledge and fine-grained motif-level knowledge. Specifically, FineMolTex consists of two pre-training tasks: a contrastive alignment task for coarse-grained matching and a masked multi-modal modeling task for fine-grained matching. In particular, the latter predicts the labels of masked motifs and words, which are selected based on their importance. By leveraging insights from both modalities, FineMolTex is able to understand the fine-grained matching between motifs and words. Finally, we conduct extensive experiments across three downstream tasks, achieving up to 230% improvement in the text-based molecule editing task. Additionally, our case studies reveal that FineMolTex successfully captures fine-grained knowledge, potentially offering valuable insights for drug discovery and catalyst design.
Keywords
Graph Neural Networks, Molecular Graph Pre-training
Discipline
Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
KDD '25: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, Toronto, Canada, August 3-7
Volume
2
First Page
1589
Last Page
1599
Identifier
10.1145/3711896.3736834
Publisher
ACM
City or Country
New York
Citation
LI, Yibo; FANG, Yuan; ZHANG, Mengmei; and SHI, Chuan.
Advancing molecular graph-text pre-training via fine-grained alignment. (2025). KDD '25: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, Toronto, Canada, August 3-7. 2, 1589-1599.
Available at: https://ink.library.smu.edu.sg/sis_research/10775
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3711896.3736834