Surgical activity triplet recognition via triplet disentanglement
Publication Type
Conference Proceeding Article
Publication Date
10-2023
Abstract
Including context-aware decision support in the operating room has the potential to improve surgical safety and efficiency by utilizing real-time feedback obtained from surgical workflow analysis. In this task, recognizing each surgical activity in the endoscopic video as a triplet <instrument, verb, target> is crucial, as it helps to ensure actions occur only after an instrument is present. However, recognizing the states of these three components in one shot poses extra learning ambiguities, as the triplet supervision is highly imbalanced (positive when all components are correct). To remedy this issue, we introduce a triplet disentanglement framework for surgical action triplet recognition, which decomposes the learning objectives to reduce learning difficulties. In particular, our network decomposes the recognition of triplets into five complementary and simplified sub-networks. The first sub-network converts the detection into a numerical supplementary task that predicts only the existence/number of the three components, the second focuses on the association between them, and the other three predict the components individually. In this way, triplet recognition is decoupled in a progressive, easy-to-difficult manner. In addition, we propose a hierarchical training schedule as a way to decompose the difficulty of the task further. Rather than explicitly identifying surgical activity, our model first creates several bridges and then progressively identifies the final key task step by step. Our proposed method has been demonstrated to surpass current state-of-the-art approaches on the CholecT45 endoscopic video dataset.
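The five-way decomposition described in the abstract can be illustrated on the label side. The following is a minimal sketch, not the authors' implementation: it assumes the "existence/number" task counts the distinct values of each component in a frame, and the names `Triplet`, `decompose`, and the example labels are hypothetical, introduced only for illustration.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """One <instrument, verb, target> annotation for a video frame."""
    instrument: str
    verb: str
    target: str


def decompose(triplets):
    """Split a frame's triplet labels into five simpler supervision targets.

    Assumption: "number" means the count of distinct values per component.
    """
    instruments = sorted({t.instrument for t in triplets})
    verbs = sorted({t.verb for t in triplets})
    targets = sorted({t.target for t in triplets})
    return {
        # Task 1: numerical supplementary task (existence/number only).
        "counts": {
            "instrument": len(instruments),
            "verb": len(verbs),
            "target": len(targets),
        },
        # Task 2: which components belong together.
        "associations": sorted(set(triplets)),
        # Tasks 3-5: each component predicted individually.
        "instruments": instruments,
        "verbs": verbs,
        "targets": targets,
    }


# Hypothetical frame with two simultaneous activities.
frame = [
    Triplet("grasper", "retract", "gallbladder"),
    Triplet("hook", "dissect", "gallbladder"),
]
labels = decompose(frame)
print(labels["counts"])
```

Under these assumptions, the count target is much simpler than the full triplet label, which matches the easy-to-difficult ordering the abstract describes.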
Keywords
Triplet disentanglement, Surgical activity recognition, Endoscopic videos, Pattern recognition
Discipline
Analytical, Diagnostic and Therapeutic Techniques and Equipment | Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Publication
Proceedings of the 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, Vancouver, Canada, October 8-12, 2023
Volume
14228
First Page
451
Last Page
461
ISBN
9783031439957
Identifier
10.1007/978-3-031-43996-4_43
Publisher
Springer Nature
City or Country
Switzerland
Citation
CHEN, Yiliang; HE, Shengfeng; JIN, Yueming; and QIN, Jing.
Surgical activity triplet recognition via triplet disentanglement. (2023). Proceedings of the 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, Vancouver, Canada, October 8-12, 2023. 14228, 451-461.
Available at: https://ink.library.smu.edu.sg/sis_research/8498
Copyright Owner and License
Authors