Advances in Deep Neural Networks for Visual Pattern Recognition

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Computer Vision and Pattern Recognition".

Deadline for manuscript submissions: closed (30 June 2022) | Viewed by 10733

Special Issue Editors


Prof. Dr. Thilo Stadelmann
Guest Editor
School of Engineering, Zurich University of Applied Sciences ZHAW, 8400 Winterthur, Switzerland
Interests: artificial intelligence; deep learning; pattern recognition; reinforcement learning; speaker recognition

Dr. Frank-Peter Schilling
Guest Editor
School of Engineering, Zurich University of Applied Sciences ZHAW, 8400 Winterthur, Switzerland
Interests: artificial intelligence; deep learning; pattern recognition; reinforcement learning

Special Issue Information

Dear Colleagues,

Deep neural networks have been the standard for pattern recognition in computer vision since the ImageNet competition in 2012. Great advances have been made since then, both methodologically and in terms of successful applications. However, with every passing year of alleged breakthroughs, we become more and more aware of the many remaining unknowns, almost to the point of admitting: "We know that we know nothing" (yet).

Methodologically, for example, evidence is growing that the long-standing image recognition paradigm of episodic classification of IID samples is stagnating, and that active vision approaches are necessary to increase recognition scores by another order of magnitude (Gori, "What’s Wrong with Computer Vision?", 2018). Theoretically, it is still not well understood why deep neural networks are so efficient at learning generalizable functions (Tishby, "Deep learning and the information bottleneck principle", 2015). This has led to a current trend of empirically discovered design principles for neural networks (Kaplan et al., "Scaling laws for neural language models", 2020). Practically, many real-world applications suffer from unstable performance of learned models, raising issues of robustness, interpretability, and deployability, not to mention issues with small training sets (related to sample complexity) (Stadelmann et al., "Deep Learning in the Wild", 2018).

In this Special Issue of the Journal of Imaging, we request contributions that cover all three aspects: methodological, theoretical, and practical work addressing current issues in visual pattern recognition with novel insights and scientifically founded evaluations.

Prof. Dr. Thilo Stadelmann
Dr. Frank-Peter Schilling
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • supervised, semi-supervised, and unsupervised deep learning
  • deep reinforcement learning and active vision
  • principles and best practices for neural network architecture design
  • generative models for pattern recognition
  • interpretability and explainability of neural networks
  • robustness and generalization of neural networks (e.g., confidence, sample efficiency, out-of-distribution performance)
  • meta-learning, AutoML
  • image classification and segmentation
  • object detection
  • document analysis, e.g., handwriting recognition
  • biometrics
  • industrial applications, such as predictive maintenance and automatic quality control
  • medical image processing, digital histopathology

Published Papers (4 papers)


Research

12 pages, 1430 KiB  
Article
Attention Guided Feature Encoding for Scene Text Recognition
by Ehtesham Hassan and Lekshmi V. L.
J. Imaging 2022, 8(10), 276; https://doi.org/10.3390/jimaging8100276 - 08 Oct 2022
Viewed by 1362
Abstract
Real-life scene images exhibit a wide range of variation in text appearance, including complex shapes, varying sizes, and fancy font properties. Consequently, text recognition from scene images remains a challenging problem in computer vision research. We present a scene text recognition methodology based on a novel feature-enhanced convolutional recurrent neural network architecture. Our work addresses scene text recognition as a sequence-to-sequence modeling problem, proposing a novel deep encoder–decoder network. The encoder is designed around a hierarchy of convolutional blocks equipped with spatial attention, followed by bidirectional long short-term memory layers. In contrast to existing methods for scene text recognition, which incorporate temporal attention on the decoder side of the architecture, our convolutional architecture uses a novel spatial attention design to guide feature extraction toward textual details in scene text images. Experiments and analysis demonstrate that our approach learns robust text-specific feature sequences for input images, as the convolutional feature extractor is tuned to capture a broader spatial text context. Extensive experiments on the ICDAR2013, ICDAR2015, IIIT5K, and SVT datasets demonstrate an improvement over many important state-of-the-art methods.
(This article belongs to the Special Issue Advances in Deep Neural Networks for Visual Pattern Recognition)
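
To make the encoder design above concrete, here is a minimal PyTorch sketch of the pattern the abstract describes: convolutional blocks interleaved with spatial attention, collapsed into a horizontal feature sequence for a bidirectional LSTM. All module names, layer sizes, and the attention formulation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a spatial-attention CNN encoder feeding a BiLSTM.
# Names and sizes are illustrative only, not the paper's architecture.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Predicts a per-pixel attention map and reweights the feature map."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):                     # x: (B, C, H, W)
        return x * self.attn(x)               # element-wise reweighting

class AttnConvBiLSTMEncoder(nn.Module):
    def __init__(self, in_ch=1, hidden=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            SpatialAttention(64),
            nn.MaxPool2d(2),                   # halve H and W
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            SpatialAttention(128),
            nn.AdaptiveAvgPool2d((1, None)),   # collapse height -> sequence
        )
        self.rnn = nn.LSTM(128, hidden, bidirectional=True, batch_first=True)

    def forward(self, img):                    # img: (B, 1, H, W)
        f = self.features(img)                 # (B, 128, 1, W')
        seq = f.squeeze(2).permute(0, 2, 1)    # (B, W', 128)
        out, _ = self.rnn(seq)                 # (B, W', 2 * hidden)
        return out                             # per-column text features

# usage: feats = AttnConvBiLSTMEncoder()(torch.randn(4, 1, 32, 128))
```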

15 pages, 32829 KiB  
Article
Unsupervised Domain Adaptation for Vertebrae Detection and Identification in 3D CT Volumes Using a Domain Sanity Loss
by Pascal Sager, Sebastian Salzmann, Felice Burn and Thilo Stadelmann
J. Imaging 2022, 8(8), 222; https://doi.org/10.3390/jimaging8080222 - 19 Aug 2022
Cited by 3 | Viewed by 1745
Abstract
A variety of medical computer vision applications analyze 2D slices of computed tomography (CT) scans, where axial slices from the body trunk region are usually identified based on their position relative to the spine. A limitation of such systems is that either the correct slices must be extracted manually, or vertebra labels are required for each CT scan to develop an automated extraction system. In this paper, we propose an unsupervised domain adaptation (UDA) approach for vertebrae detection and identification based on a novel Domain Sanity Loss (DSL) function. With UDA, the knowledge a model has learned on a publicly available (source) data set can be transferred to the target domain without using target labels, where the target domain is defined by the specific setup (CT modality, study protocols, applied pre- and post-processing) at the point of use (e.g., a specific clinic with its specific CT study protocols). With our approach, a model is trained on the source and target data sets in parallel. The model optimizes a supervised loss for labeled samples from the source domain and the DSL function, based on domain-specific "sanity checks", for samples from the unlabeled target domain. Without using labels from the target domain, we are able to identify vertebra centroids with an accuracy of 72.8%. By adding only ten target labels during training, the accuracy increases to 89.2%, which is on par with the current state of the art for fully supervised learning while using about 20 times fewer labels. Thus, our model can be used to extract 2D slices from 3D CT scans on arbitrary data sets fully automatically, without requiring an extensive labeling effort, contributing to the clinical adoption of medical imaging by hospitals.
(This article belongs to the Special Issue Advances in Deep Neural Networks for Visual Pattern Recognition)
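
The training scheme lends itself to a compact illustration: one batch from each domain, a supervised loss on labeled source samples, and a sanity-style penalty on unlabeled target predictions. The sketch below is a hedged approximation; the ordering check stands in for the paper's actual Domain Sanity Loss terms, and all function names are hypothetical.

```python
# Sketch of joint source/target training with a sanity-style loss on
# unlabeled target predictions. The concrete check below is an
# illustrative stand-in, not the paper's actual DSL terms.
import torch.nn.functional as F

def domain_sanity_loss(pred_centroids):
    """Penalize anatomically implausible predictions, e.g. vertebra
    centroids that are not ordered top-to-bottom along the spine.
    pred_centroids: (B, num_vertebrae, 3) coordinates."""
    z = pred_centroids[..., 2]                      # axial coordinate
    order_violation = F.relu(z[:, 1:] - z[:, :-1])  # should decrease
    return order_violation.mean()

def train_step(model, opt, src_imgs, src_labels, tgt_imgs, lam=0.1):
    """One joint step: supervised on source, sanity-penalized on target."""
    opt.zero_grad()
    sup = F.mse_loss(model(src_imgs), src_labels)   # labeled source domain
    dsl = domain_sanity_loss(model(tgt_imgs))       # unlabeled target domain
    loss = sup + lam * dsl
    loss.backward()
    opt.step()
    return loss.item()
```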

16 pages, 1555 KiB  
Article
Weakly Supervised Polyp Segmentation in Colonoscopy Images Using Deep Neural Networks
by Siwei Chen, Gregor Urban and Pierre Baldi
J. Imaging 2022, 8(5), 121; https://doi.org/10.3390/jimaging8050121 - 22 Apr 2022
Cited by 6 | Viewed by 3037
Abstract
Colorectal cancer (CRC) is a leading cause of mortality worldwide, and preventive screening modalities such as colonoscopy have been shown to noticeably decrease CRC incidence and mortality. Improving colonoscopy quality remains a challenging task due to limiting factors including the training levels of colonoscopists and the variability in polyp sizes, morphologies, and locations. Deep learning methods have led to state-of-the-art systems for the identification of polyps in colonoscopy videos. In this study, we show that deep learning can also be applied to the segmentation of polyps in real time, and that the underlying models can be trained using mostly weakly labeled data, in the form of bounding box annotations that do not contain precise contour information. A novel dataset, Polyp-Box-Seg, of 4070 colonoscopy images with polyps from over 2000 patients is collected, and a subset of 1300 images is manually annotated with segmentation masks. A series of models is trained to evaluate various strategies that utilize bounding box annotations for segmentation tasks. A model trained on the 1300 polyp images with segmentation masks achieves a Dice coefficient of 81.52%, which improves significantly to 85.53% when using a weakly supervised strategy leveraging bounding box images. The Polyp-Box-Seg dataset, together with a real-time video demonstration of the segmentation system, is publicly available.
(This article belongs to the Special Issue Advances in Deep Neural Networks for Visual Pattern Recognition)
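
As a rough illustration of how bounding box annotations can supervise a segmentation model, the sketch below rasterizes boxes into coarse pseudo-masks and applies a pixel-wise loss. This reflects the general class of strategies the abstract evaluates, not the authors' specific recipe; all function names are hypothetical.

```python
# Sketch: turning bounding-box annotations into coarse segmentation
# targets for a weakly supervised loss. Illustrative only; the paper
# compares several box-based strategies.
import torch
import torch.nn.functional as F

def boxes_to_mask(boxes, h, w):
    """Rasterize (x1, y1, x2, y2) pixel boxes into a binary pseudo-mask."""
    mask = torch.zeros(h, w)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = 1.0               # fill each box region
    return mask

def weak_seg_loss(logits, boxes):
    """BCE of the predicted mask against the filled-box pseudo-mask."""
    h, w = logits.shape[-2:]
    target = boxes_to_mask(boxes, h, w).to(logits.device)
    return F.binary_cross_entropy_with_logits(logits.squeeze(), target)

# usage: loss = weak_seg_loss(torch.randn(1, 1, 256, 256), [(40, 60, 120, 140)])
```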

14 pages, 10260 KiB  
Article
Metal Artifact Reduction in Spectral X-ray CT Using Spectral Deep Learning
by Matteo Busi, Christian Kehl, Jeppe R. Frisvad and Ulrik L. Olsen
J. Imaging 2022, 8(3), 77; https://doi.org/10.3390/jimaging8030077 - 17 Mar 2022
Cited by 6 | Viewed by 3328
Abstract
Spectral X-ray computed tomography (SCT) is an emerging method for non-destructive imaging of the inner structure of materials. Compared with conventional X-ray CT, this technique provides spectral photon energy resolution in a finite number of energy channels, adding a new dimension to the reconstructed volumes and images. While this mitigates energy-dependent distortions such as beam hardening, metal artifacts due to photon starvation effects are still present, especially in low-energy channels where the attenuation coefficients are higher. We present a correction method for metal artifact reduction in SCT based on spectral deep learning. The correction efficiently reduces streaking artifacts in all measured energy channels. We show that the additional information in the energy domain is valuable for restoring the quality of low-energy reconstructions affected by metal artifacts. The correction method is parameter-free and takes only around 15 ms per energy channel, satisfying the near-real-time requirements of industrial scanners.
(This article belongs to the Special Issue Advances in Deep Neural Networks for Visual Pattern Recognition)
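
One way to picture a spectral correction network of this kind is as a channel-to-channel residual CNN that ingests all energy bins at once, letting cleaner high-energy channels inform the restoration of artifact-heavy low-energy ones. The sketch below is an assumption-laden illustration; the class name, layer sizes, and bin count are hypothetical, not the paper's architecture.

```python
# Sketch: a correction CNN that sees all spectral energy bins at once,
# so cross-channel information can help restore low-energy channels.
# Architecture and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SpectralArtifactCorrector(nn.Module):
    def __init__(self, energy_bins=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(energy_bins, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, energy_bins, 3, padding=1),
        )

    def forward(self, slices):             # slices: (B, bins, H, W)
        return slices + self.net(slices)   # residual correction per bin

# usage: corrected = SpectralArtifactCorrector()(torch.randn(1, 64, 256, 256))
```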
