FAN-MCCD: Fast and Accurate Network for Multi-Scale Chinese Character Detection
Abstract
1. Introduction
- We introduce a fast and accurate multi-scale Chinese character detector that places bounding boxes of different scales and aspect ratios over feature maps from multiple stages to produce character predictions directly, eliminating costly pre-, post-, and intermediate processing steps; the predictions are then passed to non-maximum suppression to yield the final results.
- The simplicity of our end-to-end character-level pipeline demonstrates the effectiveness of multi-scale Chinese character prediction on challenging old documents.
- Without bells and whistles, our proposed system significantly outperforms the state-of-the-art SSD method in both simplicity and accuracy on the Caoshu, Character, and Src-images datasets.
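The pipeline described above can be sketched as follows. This is a minimal illustration of tiling default boxes of several scales and aspect ratios over multi-scale feature maps and filtering predictions with greedy non-maximum suppression; it is not the authors' implementation, and all function names, scales, and ratios here are illustrative assumptions.

```python
import numpy as np

def default_boxes(fmap_sizes, img_size, scales, aspect_ratios):
    """Tile default boxes over each feature map, one scale per pyramid level."""
    boxes = []
    for (fh, fw), s in zip(fmap_sizes, scales):
        for i in range(fh):
            for j in range(fw):
                cx, cy = (j + 0.5) / fw, (i + 0.5) / fh  # normalised box centre
                for ar in aspect_ratios:
                    w, h = s * np.sqrt(ar), s / np.sqrt(ar)
                    boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.clip(np.array(boxes), 0.0, 1.0) * img_size

def iou(box, boxes):
    """IoU between one box and an array of boxes, all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: keep highest-scoring boxes, drop overlaps above iou_thr."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thr]
    return keep
```

In a single-shot detector of this kind, each default box carries a class score and offset regressions; NMS then removes duplicate detections of the same character.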
2. Related Work
3. Methodology
3.1. Proposed Feature Extractor
3.2. Default Boxes and IOU
3.3. Proposed Multi-Box Loss
3.4. Matching Technique
3.5. Online Hard Example Mining (OHEM)
3.6. Augmentation Sorts
3.7. Training
4. Experiment
4.1. Implementation Details
4.2. Benchmark Datasets
4.3. Comparison with State-of-the-Art SSD and Other Algorithms
4.4. Effectiveness of Different Layers
4.5. Effectiveness of the Positive Anchor Number
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Clanuwat, T.; Lamb, A.; Kitamoto, A. KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 20–25 September 2019; pp. 607–614.
- Clinchant, S.; Déjean, H.; Meunier, J.; Lang, E.M.; Kleber, F. Comparing Machine Learning Approaches for Table Recognition in Historical Register Books. In Proceedings of the 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna, Austria, 24–27 April 2018; pp. 133–138.
- Panichkriangkrai, C.; Li, L.; Hachimura, K. Character segmentation and retrieval for learning support system of Japanese historical books. In Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing, Washington, DC, USA, 24 August 2013; pp. 118–122.
- He, S.; Sammara, P.; Burgers, J.; Schomaker, L. Towards Style-Based Dating of Historical Documents. In Proceedings of the 2014 14th International Conference on Frontiers in Handwriting Recognition, Hersonissos, Greece, 1–4 September 2014; pp. 265–270.
- Weinman, J.; Chen, Z.; Gafford, B.; Gifford, N.; Lamsal, A.; Niehus-Staab, L. Deep Neural Networks for Text Detection and Recognition in Historical Maps. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 20–25 September 2019; pp. 902–909.
- Beery, S.; Wu, G.; Rathod, V.; Votel, R.; Huang, J. Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–18 June 2020; pp. 13075–13085.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Zhang, Y.; Zhang, H.; Jain, A.K. Automatic caption localization in compressed video. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 385–392.
- Sin, B.-K.; Kim, S.-K.; Cho, B.-J. Locating characters in scene images using frequency features. In Proceedings of the 2002 International Conference on Pattern Recognition, Quebec City, QC, Canada, 11–15 August 2002; pp. 489–492.
- Yan, J.; Li, J.; Gao, X. Chinese text location under complex background using Gabor filter and SVM. Neurocomputing 2011, 74, 2998–3008.
- Huang, W.; Lin, Z.; Yang, J.; Wang, J. Text localization in natural images using stroke feature transform and text covariance descriptors. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 1241–1248.
- Epshtein, B.; Ofek, E.; Wexler, Y. Detecting text in natural scenes with stroke width transform. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2963–2970.
- Liao, M.; Shi, B.; Bai, X.; Wang, X.; Liu, W. TextBoxes: A fast text detector with a single deep neural network. In Proceedings of the AAAI, San Francisco, CA, USA, 4–9 February 2017; pp. 4161–4167.
- Liao, M.; Shi, B.; Bai, X. TextBoxes++: A single-shot oriented scene text detector. IEEE Trans. Image Process. 2018, 27, 3676–3690.
- Zhou, X.; Yao, C.; Wen, H.; Wang, Y.; Zhou, S.; He, W.; Liang, J. EAST: An efficient and accurate scene text detector. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5551–5560.
- Wang, W.; Xie, E.; Li, X.; Hou, W.; Lu, T.; Yu, G.; Shao, S.H. Shape Robust Text Detection with Progressive Scale Expansion Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9336–9345.
- Tian, Z.; Huang, W.; He, T.; He, P.; Qiao, Y. Detecting text in natural image with connectionist text proposal network. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 56–72.
- Liu, F.; Chen, C.; Gu, D.; Zheng, J. FTPN: Scene Text Detection With Feature Pyramid Based Text Proposal Network. IEEE Access 2019, 7, 44219–44228.
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Phan, T.V.; Zhu, B.; Nakagawa, M. Development of Nom character segmentation for collecting patterns from historical document pages. In Proceedings of the 2011 Workshop on Historical Document Imaging and Processing, Beijing, China, 16–17 September 2011; pp. 133–139.
- Liu, C.L.; Kim, I.J.; Kim, J. Model-based stroke extraction and matching for handwritten Chinese character recognition. Pattern Recognit. 2001, 34, 2339–2352.
- Qu, X.; Xu, N.; Wang, W.; Lu, K. Similar handwritten Chinese character recognition based on adaptive discriminative locality alignment. In Proceedings of the 2015 14th IAPR International Conference on Machine Vision Applications (MVA), Tokyo, Japan, 18–22 May 2015; pp. 130–133.
- de Stefano, C.; Fontanella, F.; Marrocco, C.; di Freca, A.S. A GA-based feature selection approach with an application to handwritten character recognition. Pattern Recognit. Lett. 2014, 35, 130–141.
- Yang, H.; Jin, L.; Huang, W.; Yang, Z.; Lai, S.; Sun, J. Dense and Tight Detection of Chinese Characters in Historical Documents: Datasets and a Recognition Guided Detector. IEEE Access 2018, 6, 30174–30183.
- Ly, N.T.; Nguyen, C.T.; Nakagawa, M. An attention-based row-column encoder-decoder model for text recognition in Japanese historical documents. Pattern Recognit. Lett. 2020, 136, 134–141.
- Ziran, Z.; Pic, X.; Innocenti, S.U.; Mugnai, D.; Marinai, S. Text alignment in early printed books combining deep learning and dynamic programming. Pattern Recognit. Lett. 2020, 133, 109–115.
- Cilia, N.D.; Stefano, C.D.; Fontanella, F.; Marrocco, C.; Molinara, M.; di Freca, A.S. An end-to-end deep learning system for medieval writer identification. Pattern Recognit. Lett. 2020, 129, 137–143.
- Capobianco, S.; Scommegna, L.; Marinai, S. Historical Handwritten Document Segmentation by Using a Weighted Loss. In Artificial Neural Networks in Pattern Recognition. ANNPR 2018; Pancioni, L., Schwenker, F., Trentin, E., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11081.
- Droby, A.; Barakat, B.K.; Madi, B.; Alaasam, R.; El-Sana, J. Unsupervised Deep Learning for Handwritten Page Segmentation. In Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 240–245.
- Valy, D.; Verleysen, M.; Chhun, S. Data Augmentation and Text Recognition on Khmer Historical Manuscripts. In Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 73–78.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Volume 9351, pp. 234–241.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. Available online: https://arxiv.org/abs/1409.1556 (accessed on 28 October 2021).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Li, Z.; Jin, L.; Lai, S.; Zhu, Y. Improving Attention-Based Handwritten Mathematical Expression Recognition with Scale Augmentation and Drop Attention. In Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 175–180.
- Ryu, J.; Kim, S. Chinese Character Boxes: Single Shot Detector Network for Chinese Character Detection. Appl. Sci. 2019, 9, 315.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified real-time object detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
| Approach | IOU | DT | FPPC | F-Score |
|---|---|---|---|---|
| CCB_SSD [37] | 0.5 | 61.22% | 5.20 | 60.41% |
| | 0.6 | 61.12% | 6.12 | 60.33% |
| | 0.7 | 60.32% | 8.11 | 59.20% |
| FAN-MCCD (Ours) | 0.5 | 98.84% | 0.71 | 98.64% |
| | 0.6 | 98.45% | 0.72 | 97.55% |
| | 0.7 | 97.45% | 0.75 | 96.33% |
| RGD | 0.5 | 98.32% | 5.00 | 97.60% |
| | 0.6 | 80.12% | 6.72 | 96.82% |
| | 0.7 | 97.30% | 7.69 | 94.82% |
| YOLO [38] | – | – | – | – |
| Approach | IOU | DT | FPPC | F-Score |
|---|---|---|---|---|
| SSD | 0.5 | 61.22% | 5.20 | 60.41% |
| | 0.6 | 61.12% | 6.12 | 60.33% |
| | 0.7 | 60.32% | 8.11 | 59.20% |
| FAN-MCCD (Ours) | 0.5 | 98.84% | 0.71 | 97.64% |
| | 0.6 | 98.45% | 0.72 | 97.55% |
| | 0.7 | 97.45% | 0.75 | 96.33% |
| Dataset | {P2, P3, P4} | {P3, P4, P5} | {P2, P3, P4, P5} | {P2, P3, P4, P5, P6} |
|---|---|---|---|---|
| Caoshu | 97.10% | 96.65% | 98.13% | 98.10% |
| Character | 96.00% | 96.99% | 98.79% | 98.70% |
| Src-images | 98.72% | 97.53% | 98.80% | 98.83% |
| Merged dataset | 97.80% | 97.69% | 98.84% | 98.82% |
| IOU Threshold | SSD Detector | FAN-MCCD (Ours) |
|---|---|---|
| 0.5 | 20.21 | 20.21 |
| 0.6 | 6.07 | 5.05 |
| 0.7 | 3.01 | 4.06 |
| 0.8 | 2.62 | 3.01 |
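Detection metrics of the kind reported in the tables above can be sketched as follows: each detection is greedily matched to an unmatched ground-truth box, counting as a true positive when IoU meets the threshold. Note that this is a hedged illustration, not the paper's exact evaluation protocol: DT is interpreted here as recall (detection rate) and FPPC as false positives per ground-truth character, and all function names are assumptions.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(x2 - x1, 0) * max(y2 - y1, 0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def evaluate(dets, gts, iou_thr=0.5):
    """Greedily match detections to ground truth at the given IoU threshold
    and return (detection rate, false positives per character, F-score)."""
    matched, tp = set(), 0
    for d in dets:
        best, best_iou = None, iou_thr
        for k, g in enumerate(gts):
            if k in matched:
                continue
            v = box_iou(d, g)
            if v >= best_iou:
                best, best_iou = k, v
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(dets) - tp
    recall = tp / max(len(gts), 1)       # DT, interpreted as detection rate
    precision = tp / max(len(dets), 1)
    fppc = fp / max(len(gts), 1)         # false positives per character
    f_score = 2 * precision * recall / max(precision + recall, 1e-9)
    return recall, fppc, f_score
```

Raising the IoU threshold makes matching stricter, which is why DT and F-score drop and FPPC rises at 0.6 and 0.7 in the tables.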
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alnaasan, M.; Kim, S. FAN-MCCD: Fast and Accurate Network for Multi-Scale Chinese Character Detection. Sensors 2021, 21, 7289. https://doi.org/10.3390/s21217289