Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
9-2022
Abstract
In this study, an OCR system based on deep learning techniques was deployed to digitize scanned agricultural regulatory documents comprising certificates and labels. Recognizing the certificates and labels is challenging because they are scanned images of hard-copy forms, and the layout, text size and language vary between countries (due to diverse regulatory requirements). We evaluated and compared various state-of-the-art deep learning-based text detection and recognition models as well as a packaged OCR library, Tesseract. We then adopted a two-stage approach comprising text detection using Character Region Awareness For Text (CRAFT) followed by recognition using the OCR branch of a multi-lingual text recognition algorithm, E2E-MLT. A sliding-window text matcher is used to enhance the extraction of required information such as trade names, active ingredients and crops. Initial evaluation revealed that the system achieves an accuracy of 91.9% for the recognition of trade names in certificates and labels. The system is currently deployed for use in the Philippines, one of our collaborator's sites.
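The abstract does not describe how the sliding-window text matcher works internally. The following is a minimal Python sketch of one plausible reading, not the authors' implementation: a window the size of a known term slides across the OCR tokens, and each window is scored against the term with a fuzzy similarity ratio so that small recognition errors are tolerated. The function name `best_window_match`, the 0.8 threshold, the use of `difflib.SequenceMatcher` and the sample tokens are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_window_match(tokens, target, threshold=0.8):
    """Slide a window over OCR tokens and return (score, start_index).

    Hypothetical helper: the window is as wide as the target term, and
    start_index is None when no window reaches the threshold.
    """
    target_tokens = target.lower().split()
    width = len(target_tokens)
    target_text = " ".join(target_tokens)

    best_score, best_index = 0.0, None
    # Visit every contiguous run of `width` tokens.
    for start in range(max(len(tokens) - width + 1, 0)):
        window_text = " ".join(t.lower() for t in tokens[start:start + width])
        score = SequenceMatcher(None, window_text, target_text).ratio()
        if score > best_score:
            best_score, best_index = score, start

    if best_score < threshold:
        return best_score, None
    return best_score, best_index


# Illustrative OCR output in which "GROW" was misread as "GR0W".
ocr_tokens = "Trade Name : ACME GR0W 50 WP Active".split()
score, index = best_window_match(ocr_tokens, "Acme Grow 50")
```

In this sketch the matched window starts at the token "ACME" despite the misread character. A production matcher would likely also vary the window width and compare against a dictionary of known trade names, active ingredients and crops.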
Keywords
Deep learning, Text detection, Optical character recognition, Regulatory document
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
Advances in Computational Collective Intelligence: 14th International Conference, ICCI 2022, Hammamet, Tunisia, September 28-30: Proceedings
Volume
1653
First Page
223
Last Page
234
ISBN
9783031162107
Identifier
10.1007/978-3-031-16210-7_18
Publisher
Springer
City or Country
Cham
Citation
FWA, Hua Leong and CHAN, Farn Haur. Deep learning-based text recognition of agricultural regulatory document. (2022). Advances in Computational Collective Intelligence: 14th International Conference, ICCI 2022, Hammamet, Tunisia, September 28-30: Proceedings. 1653, 223-234.
Available at: https://ink.library.smu.edu.sg/sis_research/7334
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-031-16210-7_18
Included in
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons