Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
10-2025
Abstract
Handwritten Mathematical Expression Recognition (HMER) remains a challenging task due to the structural complexity of mathematical notation and the ambiguity of handwritten symbols (e.g., "ρ" vs. "p" or "B" vs. "β"). While stroke-based models offer disambiguation via temporal cues, most existing methods are constrained by coarse modality fusion and a lack of fine-grained cross-modal alignment, further hindered by limited annotated data. We introduce Art for Math (Art4Math), a novel framework that leverages the structural richness of human sketches to enhance HMER through fine-grained, modality-aware learning. Art4Math follows a two-stage training paradigm: Art Grounding (A-Grd) and Math Decoding (M-Dec). In A-Grd, the model is trained to reconstruct masked regions of sketches via joint modeling of visual and stroke-level features, encouraging sensitivity to local structural cues and inter-modality alignment. This Art Grounding cultivates a strong inductive bias for parsing abstract, sparse visual forms. M-Dec then adapts this representation to the HMER domain, enabling more precise symbol disambiguation and structural decoding with limited supervision. Extensive experiments across sketch and handwriting-related tasks, including sketch recognition, retrieval, and HMER, demonstrate that Art4Math significantly outperforms existing self-supervised methods, revealing the overlooked synergy between artistic abstraction and mathematical expression.
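The abstract describes A-Grd only as reconstructing masked regions of sketches. As a rough illustration of what a masked-reconstruction objective computes on the stroke side, the following minimal sketch hides a fraction of points in a stroke sequence, refills them, and scores only the hidden positions. Everything here is an assumption for illustration. This includes the (x, y) point format, the hypothetical helper names mask_points, fill_by_interpolation and masked_mse, and the use of linear interpolation as a stand-in for a learned decoder. It covers a single modality and is not the Art4Math model, whose architecture, mask ratio and loss are not given in this record.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration only; not the Art4Math implementation.
# Assumed stroke-level format: one (x, y) pen position per sampled point.
Point = Tuple[float, float]


def mask_points(points: List[Point], mask_ratio: float,
                seed: int = 0) -> Tuple[List[Optional[Point]], List[int]]:
    """Hide a random fraction of points; hidden slots become None."""
    if not points:
        raise ValueError("empty stroke sequence")
    rng = random.Random(seed)
    # Mask at least one point but always leave at least one visible.
    n_mask = min(len(points) - 1, max(1, round(mask_ratio * len(points))))
    masked_idx = sorted(rng.sample(range(len(points)), n_mask))
    hidden = set(masked_idx)
    visible = [None if i in hidden else p for i, p in enumerate(points)]
    return visible, masked_idx


def fill_by_interpolation(seq: List[Optional[Point]]) -> List[Point]:
    """Toy stand-in for a learned decoder: linearly interpolate each hidden
    point between its nearest visible neighbours (copy at the ends)."""
    known = [i for i, p in enumerate(seq) if p is not None]
    if not known:
        raise ValueError("nothing visible to reconstruct from")
    out: List[Point] = []
    for i, p in enumerate(seq):
        if p is not None:
            out.append(p)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out.append(seq[right])  # hidden prefix: copy first visible point
        elif right is None:
            out.append(seq[left])  # hidden suffix: copy last visible point
        else:
            t = (i - left) / (right - left)
            (x0, y0), (x1, y1) = seq[left], seq[right]
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def masked_mse(pred: List[Point], target: List[Point],
               masked_idx: List[int]) -> float:
    """Mean squared error computed over the masked positions only."""
    if not masked_idx:
        return 0.0
    total = 0.0
    for i in masked_idx:
        dx = pred[i][0] - target[i][0]
        dy = pred[i][1] - target[i][1]
        total += dx * dx + dy * dy
    return total / len(masked_idx)


# Usage: a straight-line "stroke" of 10 points, 30% of them masked.
stroke = [(float(i), 2.0 * i) for i in range(10)]
visible, idx = mask_points(stroke, mask_ratio=0.3, seed=42)
recon = fill_by_interpolation(visible)
print(idx, masked_mse(recon, stroke, idx))
```

Scoring only the masked positions keeps the objective focused on what had to be inferred. In a trained setup, the interpolation step would be replaced by a network conditioned on the visible points and, in a multimodal variant, on image features as well.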
Keywords
Multi-modal Learning, HMER, Sketch Representation Learning
Discipline
Artificial Intelligence and Robotics
Areas of Excellence
Digital transformation
Publication
MM '25: Proceedings of the 33rd ACM International Conference on Multimedia, Dublin, Ireland, October 27-31
First Page
1549
Last Page
1558
Identifier
10.1145/3746027.3755247
Publisher
ACM
City or Country
New York
Citation
ZHOU, Yang; WANG, Jin; ZHANG, Yuxiao; HUANG, Kaixiang; LU, Guodong; YANG, Jingru; and HE, Shengfeng.
Art4Math: Handwritten mathematical expression recognition via multimodal sketch grounding. (2025). MM '25: Proceedings of the 33rd ACM International Conference on Multimedia, Dublin, Ireland, October 27-31. 1549-1558.
Available at: https://ink.library.smu.edu.sg/sis_research/10791
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3746027.3755247