Article

DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images

1 Laboratoire Bordelais de Recherche en Informatique UMR 5800, Université de Bordeaux, CNRS, Bordeaux INP, 33400 Talence, France
2 Laboratoire Informatique, Image et Interaction (L3i), Université de La Rochelle, 17000 La Rochelle, France
3 LIPADE Laboratory, Paris Descartes University, 45, rue des Saints-Pères, 75270 Paris, CEDEX 6, France
* Author to whom correspondence should be addressed.
These authors contributed equally to this work. Other authors: Kieu Van-Cuong worked on the degradation models and Antoine Billy worked on synthetic document reconstruction.
Received: 30 October 2017 / Revised: 29 November 2017 / Accepted: 5 December 2017 / Published: 11 December 2017
(This article belongs to the Special Issue Document Image Processing)
Most digital libraries that provide user-friendly interfaces, enabling quick and intuitive access to their resources, rely on Document Image Analysis and Recognition (DIAR) methods. Such DIAR methods need ground-truthed document images to be evaluated, compared and, in some cases, trained. Especially with the advent of deep learning-based approaches, the required size of annotated document datasets seems to be ever-growing. Manually annotating real documents has many drawbacks, which often results in small reliably annotated datasets. In order to circumvent those drawbacks and enable the generation of massive ground-truthed data with high variability, we present DocCreator, a multi-platform, open-source software tool able to create many synthetic document images with controlled ground truth. DocCreator has been used in various experiments, demonstrating the value of using such synthetic images to enrich the training stage of DIAR tools.
Keywords: synthetic image generation; document degradation models; performance evaluation; data augmentation for retraining and fine-tuning; DIAR

Figure 1

MDPI and ACS Style

Journet, N.; Visani, M.; Mansencal, B.; Van-Cuong, K.; Billy, A. DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images. J. Imaging 2017, 3, 62. https://doi.org/10.3390/jimaging3040062

Chicago/Turabian Style

Journet, Nicholas, Muriel Visani, Boris Mansencal, Kieu Van-Cuong, and Antoine Billy. 2017. "DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images." Journal of Imaging 3, no. 4: 62. https://doi.org/10.3390/jimaging3040062
