Intelligent Systems Applications to Multiple Domains Based on Innovative Signal and Image Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 August 2022) | Viewed by 33697

Special Issue Editors

National Research Council, Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, 70126 Bari, Italy
Interests: computer vision; image processing; pattern recognition; machine learning; deep learning

Guest Editor
Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing (STIIMA CNR), National Research Council, Bari, Italy
Interests: machine learning; deep learning; pattern recognition; image processing; computer vision

Guest Editor
Department of Computer Science, University of Bari, Bari, Italy
Interests: machine learning; deep learning; pattern recognition; image processing; computer vision

Special Issue Information

Dear Colleagues,

Intelligent systems are now widely applied in multiple domains (e.g., Industry 4.0, smart healthcare, smart agriculture, and marine biology). Their role in dynamic contexts is becoming crucial, with signal processing acting as the starting point for many applications, including smart image/video processing, intelligent parameter monitoring, and high-level evaluation of complex events in Industry 4.0, agriculture, medicine, life science, environmental analysis, and other fields.

Regardless of the application scenario, the common goal of bringing innovation through artificial intelligence and innovative processing requires a strong multidisciplinary effort in the research and development of intelligent systems.

The main purpose of this Special Issue is to collect innovative contributions in the field of signal and image processing (e.g., computer vision systems, new algorithms, or machine/deep learning applications), ranging from new methodologies to innovative approaches in different domains. Particular emphasis should be given to the application of deep learning techniques to solve common issues known in the literature, as well as to innovative best practices.

Topics of interest include, but are not limited to:

  • Industry 4.0
  • Marine biology
  • Environmental analysis
  • Smart agriculture
  • Life science
  • Medicine

Dr. Vito Renò
Dr. Rosalia Maglietta
Prof. Dr. Giovanni Dimauro
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • machine learning
  • deep learning
  • pattern recognition

Published Papers (14 papers)

Editorial

2 pages, 192 KiB  
Editorial
Special Issue on Intelligent Systems Applications to Multiple Domains Based on Innovative Signal and Image Processing
by Vito Renò, Rosalia Maglietta and Giovanni Dimauro
Appl. Sci. 2023, 13(7), 4373; https://0-doi-org.brum.beds.ac.uk/10.3390/app13074373 - 30 Mar 2023
Viewed by 731
Abstract
Nowadays, intelligent systems are largely applied in multiple domains (e [...] Full article

Research

23 pages, 1982 KiB  
Article
A Hybrid U-Lossian Deep Learning Network for Screening and Evaluating Parkinson’s Disease
by Rytis Maskeliūnas, Robertas Damaševičius, Audrius Kulikajevas, Evaldas Padervinskis, Kipras Pribuišis and Virgilijus Uloza
Appl. Sci. 2022, 12(22), 11601; https://0-doi-org.brum.beds.ac.uk/10.3390/app122211601 - 15 Nov 2022
Cited by 17 | Viewed by 4345
Abstract
Speech impairment analysis and processing technologies have evolved substantially in recent years, and the use of voice as a biomarker has gained popularity. We have developed an approach for clinical speech signal processing to demonstrate the promise of deep learning-driven voice analysis as a screening tool for Parkinson’s Disease (PD), the world’s second most prevalent neurodegenerative disease. Detecting Parkinson’s disease symptoms typically involves an evaluation by a movement disorder expert, which can be difficult to obtain and can yield varied findings. A vocal digital biomarker might supplement the time-consuming traditional manual examination by recognizing and evaluating symptoms that characterize voice quality and level of deterioration. We present a deep learning-based, custom U-lossian model for PD assessment and recognition. The study’s goal was to discover anomalies in the PD-affected voice and develop an automated screening method that can discriminate between the voices of PD patients and healthy volunteers while also providing a voice quality score. The classification accuracy was evaluated on two speech corpora (Italian PVS and our own Lithuanian PD voice dataset), and we found the results to be medically appropriate, with values of 0.8964 and 0.7949, confirming the proposed model’s high generalizability.

19 pages, 4838 KiB  
Article
ODIN IVR-Interactive Solution for Emergency Calls Handling
by Bogdan-Costel Mocanu, Ion-Dorinel Filip, Remus-Dan Ungureanu, Catalin Negru, Mihai Dascalu, Stefan-Adrian Toma, Titus-Constantin Balan, Ion Bica and Florin Pop
Appl. Sci. 2022, 12(21), 10844; https://0-doi-org.brum.beds.ac.uk/10.3390/app122110844 - 26 Oct 2022
Cited by 7 | Viewed by 2467
Abstract
Human interaction in natural language with computer systems has been a prime focus of research, and the field of conversational agents (including chatbots and Interactive Voice Response (IVR) systems) has evolved significantly since 2009, with a major boost in 2016, especially for industrial solutions. Emergency systems are crucial elements of today’s societies that can benefit from the advantages of intelligent human–computer interaction systems. In this paper, we present two solutions for human-to-computer emergency systems with critical deadlines that use a multi-layer FreeSwitch IVR solution and the Botpress chatbot platform. We are the first in Romania to design and implement such a solution, which was evaluated in terms of performance and resource management concerning Quality of Service (QoS). Additionally, we assessed our Proof of Concept (PoC) with real data as part of the system for real-time Romanian transcription of speech and recognition of emotional states within emergency calls. Based on our feasibility research, we concluded that the telephony IVR best fits the requirements and specifications of the national 112 system, with the presented PoC ready to be integrated into the Romanian emergency system.

19 pages, 3606 KiB  
Article
An Artificial Intelligence-Based Algorithm for the Assessment of Substitution Voicing
by Virgilijus Uloza, Rytis Maskeliunas, Kipras Pribuisis, Saulius Vaitkus, Audrius Kulikajevas and Robertas Damasevicius
Appl. Sci. 2022, 12(19), 9748; https://0-doi-org.brum.beds.ac.uk/10.3390/app12199748 - 28 Sep 2022
Cited by 6 | Viewed by 1600
Abstract
The purpose of this research was to develop an artificial intelligence-based method for evaluating substitution voicing (SV) and speech following laryngeal oncosurgery. Convolutional neural networks were used to analyze spoken audio sources. A Mel-frequency spectrogram was employed as input to the deep neural network architecture. The program was trained using a collection of 309 digitized speech recordings. The acoustic substitution voicing index (ASVI) model was elaborated using regression analysis. This model was then tested with speech samples that were unknown to the algorithm, and the results were compared to the auditory-perceptual SV evaluation provided by the medical professionals. A statistically significant, strong correlation with rs = 0.863 (p = 0.001) was observed between the ASVI and the SV evaluation performed by the trained laryngologists. The one-way ANOVA showed statistically significant ASVI differences in control, cordectomy, partial laryngectomy, and total laryngectomy patient groups (p < 0.001). The elaborated lightweight ASVI algorithm reached rapid response rates of 3.56 ms. The ASVI provides a fast and efficient option for assessing SV and speech in patients after laryngeal oncosurgery. The ASVI results are comparable to the auditory-perceptual SV evaluation performed by medical professionals.

28 pages, 8186 KiB  
Article
Improved Procedure for Multi-Focus Images Using Image Fusion with qshiftN DTCWT and MPCA in Laplacian Pyramid Domain
by Chinnem Rama Mohan, Kuldeep Chouhan, Ranjeet Kumar Rout, Kshira Sagar Sahoo, Noor Zaman Jhanjhi, Ashraf Osman Ibrahim and Abdelzahir Abdelmaboud
Appl. Sci. 2022, 12(19), 9495; https://0-doi-org.brum.beds.ac.uk/10.3390/app12199495 - 22 Sep 2022
Cited by 6 | Viewed by 1745
Abstract
Multi-focus image fusion (MIF) uses fusion rules to combine two or more images of the same scene with various focus values into a fully focused image. An all-in-focus image refers to a fully focused image that is more informative and useful for visual perception. A fused image with high quality is essential for maintaining shift-invariant and directional selectivity characteristics of the image. Traditional wavelet-based fusion methods, in turn, create ringing distortions in the fused image due to a lack of directional selectivity and shift-invariance. In this paper, a classical MIF system based on quarter shift dual-tree complex wavelet transform (qshiftN DTCWT) and modified principal component analysis (MPCA) in the Laplacian pyramid (LP) domain is proposed to extract the focused image from multiple source images. In the proposed fusion approach, the LP first decomposes the multi-focus source images into low-frequency (LF) components and high-frequency (HF) components. Then, qshiftN DTCWT is used to fuse low- and high-frequency components to produce a fused image. Finally, to improve the effectiveness of the qshiftN DTCWT and LP-based method, the MPCA algorithm is utilized to generate an all-in-focus image. Due to its directionality and shift-invariance, this transform can provide high-quality information in a fused image. Experimental results demonstrate that the proposed method outperforms many state-of-the-art techniques in terms of visual and quantitative evaluations.

13 pages, 2183 KiB  
Article
Constructing Condition Monitoring Model of Harmonic Drive
by Jong-Yih Kuo, Chao-Yang Hsu, Ping-Feng Wang, Hui-Chi Lin and Zhen-Gang Nie
Appl. Sci. 2022, 12(19), 9415; https://0-doi-org.brum.beds.ac.uk/10.3390/app12199415 - 20 Sep 2022
Cited by 4 | Viewed by 1170
Abstract
The harmonic drive is an essential industrial component. In industry, the efficient and accurate determination of machine faults has always been a significant problem to be solved. Therefore, this research proposes an anomaly detection model which can detect whether the harmonic drive has a gear-failure problem through the sound recorded by a microphone. The factory manager can thus detect the fault at an early stage and reduce the losses caused by the fault in the machine. In this research, multi-layer discrete wavelet transform was used to de-noise the sound samples, the Log Mel spectrogram was used for feature extraction, and finally, these data were entered into the EfficientNetV2 network. To assess the model performance, this research used the DCASE 2022 dataset for model evaluation, and the area under the receiver operating characteristic curve (AUC) was estimated to be 5% higher than that of the DCASE 2022 baseline model. The model achieved 0.93 AUC for harmonic drive anomaly detection.

10 pages, 805 KiB  
Article
Learning Analytics: Analysis of Methods for Online Assessment
by Vito Renò, Ettore Stella, Cosimo Patruno, Alessandro Capurso, Giovanni Dimauro and Rosalia Maglietta
Appl. Sci. 2022, 12(18), 9296; https://0-doi-org.brum.beds.ac.uk/10.3390/app12189296 - 16 Sep 2022
Cited by 3 | Viewed by 1662
Abstract
Assessment is a fundamental part of teaching and learning. With the advent of online learning platforms, the concept of assessment has changed. In the classical teaching methodology, the assessment is performed by an assessor, while in an online learning environment, the assessment can also take place automatically. The main purpose of this paper is to carry out a study on Learning Analytics, focusing in particular on the study and development of methodologies useful for the evaluation of learners. The goal of this work is to define an effective learning model that uses Educational Data to predict the outcome of a learning process. Supervised statistical learning techniques were studied and developed for the analysis of the OULAD benchmark dataset. The evaluation of the learning process of learners was performed by making binary predictions about passing or failing a course and using features related to the learner’s intermediate performance as well as the interactions with the e-learning platform. The Random Forest classification algorithm and other ensemble strategies were used to perform the task. The performance of the models trained on the OULAD dataset was excellent, showing an accuracy of 95% in predicting the students’ learning assessment.

16 pages, 6318 KiB  
Article
Solar Irradiance Forecasting with Transformer Model
by Jiří Pospíchal, Martin Kubovčík and Iveta Dirgová Luptáková
Appl. Sci. 2022, 12(17), 8852; https://0-doi-org.brum.beds.ac.uk/10.3390/app12178852 - 02 Sep 2022
Cited by 6 | Viewed by 2452
Abstract
Solar energy is one of the most popular sources of renewable energy today. It is therefore essential to be able to predict solar power generation and adapt energy needs to these predictions. This paper uses the Transformer deep neural network model, in which the attention mechanism is typically applied in NLP or vision problems. Here, it is extended by combining features based on their spatiotemporal properties in solar irradiance prediction. Predictions can be made for arbitrarily long time horizons because each prediction is always 1 day ahead: it can be appended at the end of the input data along the timestep axis, and the first timestep, representing the oldest one, removed. A maximum worst-case mean absolute percentage error of 3.45% for the one-day-ahead prediction was obtained, which gave better results than the directly competing methods.
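The rolling scheme described in this abstract (append each one-day-ahead prediction to the input window and drop the oldest timestep) can be sketched in plain Python. This is only an illustration under stated assumptions: `rolling_forecast`, `mape`, and the stand-in predictor `mean_of_window` are hypothetical names, and the Transformer model from the paper is not reproduced here.

```python
from typing import Callable, List, Sequence


def rolling_forecast(
    history: Sequence[float],
    predict_next: Callable[[Sequence[float]], float],
    horizon: int,
) -> List[float]:
    """Roll a one-step-ahead predictor forward for `horizon` steps."""
    window = list(history)
    forecasts: List[float] = []
    for _ in range(horizon):
        next_value = predict_next(window)
        forecasts.append(next_value)
        # Append the prediction at the end of the timestep axis
        # and remove the first (oldest) timestep.
        window = window[1:] + [next_value]
    return forecasts


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (assumes no zero in `actual`)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def mean_of_window(window: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: predicts the window mean."""
    return sum(window) / len(window)


forecasts = rolling_forecast([2.0, 4.0, 6.0], mean_of_window, horizon=2)
error = mape([100.0, 200.0], [90.0, 220.0])
print(forecasts, error)
```

Any callable that maps a window to the next value can replace `mean_of_window`; the window length stays fixed because one timestep is removed for each one appended.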

21 pages, 5342 KiB  
Article
Real-Time Semantic Understanding and Segmentation of Urban Scenes for Vehicle Visual Sensors by Optimized DCNN Algorithm
by Yanyi Li, Jian Shi and Yuping Li
Appl. Sci. 2022, 12(15), 7811; https://0-doi-org.brum.beds.ac.uk/10.3390/app12157811 - 03 Aug 2022
Cited by 8 | Viewed by 1586
Abstract
The modern urban environment is becoming more and more complex. In helping us identify surrounding objects, vehicle vision sensors rely more on the semantic segmentation ability of deep learning networks. The performance of a semantic segmentation network is essential. This factor will directly affect the comprehensive level of driving assistance technology in road environment perception. However, the existing semantic segmentation network has a redundant structure, many parameters, and low operational efficiency. Therefore, to reduce the complexity of the network and the number of parameters and thus improve network efficiency, a method for efficient image semantic segmentation using a Deep Convolutional Neural Network (DCNN) is studied in depth, based on deep learning (DL) theory. First, the theoretical basis of the convolutional neural network (CNN) is briefly introduced, and the real-time semantic segmentation technology of urban scenes based on DCNN is presented in detail. Second, the atrous convolution algorithm and the multi-scale parallel atrous spatial pyramid model are introduced. On this basis, an Efficient Symmetric Network (ESNet), a real-time semantic segmentation model for autonomous driving scenarios, is proposed. The experimental results show that: (1) On the Cityscapes dataset, the ESNet structure achieves 70.7% segmentation accuracy for the 19 semantic categories set, and 87.4% for the seven large grouping categories. Compared with other algorithms, the accuracy has increased to varying degrees. (2) On the CamVid dataset, compared with multiple lightweight real-time image segmentation networks, the ESNet model has around 1.2 M parameters, its highest frame rate is around 90 FPS, and its highest mIoU value is around 70%. In seven semantic categories, the segmentation accuracy of the ESNet model is the highest at around 98%. From this, we found that the ESNet significantly improves segmentation accuracy while maintaining faster forward inference speed. Overall, the research not only provides technical support for the development of real-time semantic understanding and segmentation of DCNN algorithms but also contributes to the development of artificial intelligence technology.
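As a rough illustration of the atrous (dilated) convolution mentioned in this abstract, the following one-dimensional sketch in plain Python spaces the kernel taps `rate` samples apart and then runs several rates in parallel, which is the basic idea behind a multi-scale atrous pyramid. The function name `atrous_conv1d` and the toy signal are assumptions made for illustration; this is not the ESNet architecture, and like most deep learning frameworks it computes a cross-correlation (the kernel is not flipped).

```python
from typing import Dict, List, Sequence


def atrous_conv1d(
    signal: Sequence[float], kernel: Sequence[float], rate: int
) -> List[float]:
    """1D atrous convolution without padding ("valid" outputs only)."""
    # Distance between the first and last kernel tap once dilated.
    span = (len(kernel) - 1) * rate
    output: List[float] = []
    for start in range(len(signal) - span):
        # Each kernel weight reads a sample `rate` positions after the previous one.
        output.append(
            sum(w * signal[start + k * rate] for k, w in enumerate(kernel))
        )
    return output


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 0.0, -1.0]

# Parallel branches with different dilation rates see different context sizes
# while using the same number of weights.
pyramid: Dict[int, List[float]] = {
    rate: atrous_conv1d(signal, kernel, rate) for rate in (1, 2, 3)
}
print(pyramid)
```

A larger rate widens the receptive field without adding parameters, which is why the technique is attractive for lightweight real-time segmentation networks.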

16 pages, 29275 KiB  
Article
An Automatic Foreign Matter Detection and Sorting System for PVC Powder
by Ssu-Han Chen, Jer-Huan Jang, Yu-Ru Chang, Chih-Hsiang Kang, Hung-Yi Chen, Kevin Fong-Rey Liu, Fong-Lin Lee, Yang-Shen Hsueh and Meng-Jey Youh
Appl. Sci. 2022, 12(12), 6276; https://0-doi-org.brum.beds.ac.uk/10.3390/app12126276 - 20 Jun 2022
Cited by 2 | Viewed by 2426
Abstract
In the present study, an automatic defect detection system has been assembled and introduced for polyvinyl chloride (PVC) powder. The average diameter of PVC powder is approximately 100 μm. The system hardware includes a powder delivery device, a sieving device, a circular platform, an image capture device, and a recycling device. A defect detection algorithm based on YOLOv4 was developed using CSPDarkNet53 as the backbone for feature extraction, spatial pyramid pooling (SPP) and path aggregation network (PAN) as the neck, and Yoloblock as the head. An auto-annotation algorithm was developed based on a digital image processing algorithm to save time in feature engineering. Several hyper-parameters have been employed to improve the efficiency of detection in the process of training YOLOv4. The Taguchi method was utilized to optimize the performance of detection, in which the mean average precision (mAP) is the response. Results show that our optimized YOLOv4 has a test mAP of 0.9385, compared to 0.8653 and 0.7999 for naïve YOLOv4 and Faster RCNN, respectively. Additionally, with the optimized YOLOv4, there is no false alarm for images without any foreign matter.

10 pages, 2387 KiB  
Article
Prediction of Lithium-Ion Battery Capacity by Functional Principal Component Analysis of Monitoring Data
by MD Shoriat Ullah and Kangwon Seo
Appl. Sci. 2022, 12(9), 4296; https://0-doi-org.brum.beds.ac.uk/10.3390/app12094296 - 24 Apr 2022
Cited by 7 | Viewed by 2175
Abstract
The lithium-ion (Li-ion) battery is a promising energy storage technology for electronics, automobiles, and smart grids. Extensive research was conducted in the past to improve the prediction of the remaining capacity of the Li-ion battery. A robust prediction model would improve the battery performance and reliability for forthcoming usage. In the development of a data-driven capacity prediction model of Li-ion batteries, most past studies employed capacity degradation data; however, very few tried using other performance monitoring variables, such as temperature, voltage, and current data, to estimate and predict the battery capacity. In this study, we aimed to develop a data-driven model for predicting the capacity of Li-ion batteries adopting functional principal component analysis (fPCA) applied to functional monitoring data of temperature, voltage, and current observations. The proposed method is demonstrated using the battery monitoring data available in the NASA Ames Prognostics Center of Excellence repository. The main contribution of the study is the development of an empirical data-driven model to diagnose the state-of-health (SOH) of Li-ion batteries based on the health monitoring data utilizing fPCA and LASSO regression. The study obtained encouraging battery capacity prediction performance by explaining overall variation through eigenfunctions of available monitored discharge parameters of Li-ion batteries. The result of capacity prediction obtained a root mean square error (RMSE) of 0.009. The proposed data-driven approach performs well for predicting the capacity by employing functional performance measures over the life span of a Li-ion battery.

15 pages, 1078 KiB  
Article
FF-PCA-LDA: Intelligent Feature Fusion Based PCA-LDA Classification System for Plant Leaf Diseases
by Safdar Ali, Mehdi Hassan, Jin Young Kim, Muhammad Imran Farid, Muhammad Sanaullah and Hareem Mufti
Appl. Sci. 2022, 12(7), 3514; https://0-doi-org.brum.beds.ac.uk/10.3390/app12073514 - 30 Mar 2022
Cited by 21 | Viewed by 2572
Abstract
Crop leaf disease management and control have a significant impact on improving yield and quality to fulfill consumer needs. For smart agriculture, an intelligent leaf disease identification system is essential for efficient crop health monitoring. In this view, a novel approach is proposed for crop disease identification using feature fusion and PCA-LDA classification (FF-PCA-LDA). Handcrafted hybrid and deep features are extracted from RGB images. TL-ResNet50 is used to extract the deep features. A fused feature vector is obtained by combining handcrafted hybrid and deep features. After fusing the image features, PCA is employed to select the most discriminant features for LDA model development. Potato crop leaf disease identification is used as a case study for the validation of the approach. The developed system is experimentally validated on a potato crop leaf benchmark dataset. It offers high accuracy of 98.20% on an unseen dataset which was not used during the model training process. Performance comparison of the proposed technique with other approaches shows its superiority. Owing to its better discrimination and learning ability, the proposed approach does not require a leaf segmentation step. The developed approach may be used as an automated tool for crop monitoring and management control, and can be extended to other crop types.

16 pages, 6667 KiB  
Article
Focal Dice Loss-Based V-Net for Liver Segments Classification
by Berardino Prencipe, Nicola Altini, Giacomo Donato Cascarano, Antonio Brunetti, Andrea Guerriero and Vitoantonio Bevilacqua
Appl. Sci. 2022, 12(7), 3247; https://0-doi-org.brum.beds.ac.uk/10.3390/app12073247 - 23 Mar 2022
Cited by 11 | Viewed by 3241
Abstract
Liver segmentation is a crucial step in surgical planning from computed tomography scans. The possibility to obtain a precise delineation of the liver boundaries with the exploitation of automatic techniques can help the radiologists, reducing the annotation time and providing more objective and repeatable results. Subsequent phases typically involve liver vessels’ segmentation and liver segments’ classification. It is especially important to recognize different segments, since each has its own vascularization, and so, hepatic segmentectomies can be performed during surgery, avoiding the unnecessary removal of healthy liver parenchyma. In this work, we focused on the liver segments’ classification task. We exploited a 2.5D Convolutional Neural Network (CNN), namely V-Net, trained with the multi-class focal Dice loss. The idea of focal loss was originally conceived for the cross-entropy loss function, aiming at focusing on “hard” samples, avoiding the gradient being overwhelmed by a large number of false negatives. In this paper, we introduce two novel focal Dice formulations, one based on the concept of individual voxel’s probability and another related to the Dice formulation for sets. By applying multi-class focal Dice loss to the aforementioned task, we were able to obtain respectable results, with an average Dice coefficient among classes of 82.91%. Moreover, the knowledge of anatomic segments’ configurations allowed the application of a set of rules during the post-processing phase, slightly improving the final segmentation results, obtaining an average Dice coefficient of 83.38%. The average accuracy was close to 99%. The best model turned out to be the one with the focal Dice formulation based on sets. We conducted the Wilcoxon signed-rank test to check if these results were statistically significant, confirming their relevance.
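For readers unfamiliar with the metric reported in this abstract, the standard Dice coefficient for two sets, 2|A ∩ B| / (|A| + |B|), and its average over classes can be sketched as follows. This shows only the plain evaluation metric; the two focal Dice loss formulations introduced in the paper are not reproduced. The function names, the toy label lists, and the convention of returning 1.0 for two empty sets are assumptions made for illustration.

```python
from typing import Iterable, List, Sequence, Set


def dice_coefficient(predicted: Set[int], reference: Set[int]) -> float:
    """Standard Dice coefficient between two sets of voxel indices."""
    if not predicted and not reference:
        return 1.0  # assumed convention: two empty sets agree perfectly
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))


def mean_dice(
    predicted_labels: Sequence[int],
    reference_labels: Sequence[int],
    classes: Iterable[int],
) -> float:
    """Average the per-class Dice coefficient over the given classes."""
    scores: List[float] = []
    for cls in classes:
        pred_voxels = {i for i, lab in enumerate(predicted_labels) if lab == cls}
        ref_voxels = {i for i, lab in enumerate(reference_labels) if lab == cls}
        scores.append(dice_coefficient(pred_voxels, ref_voxels))
    return sum(scores) / len(scores)


# Toy example: six voxels, two segment classes.
predicted = [1, 1, 2, 2, 2, 1]
reference = [1, 1, 1, 2, 2, 2]
score = mean_dice(predicted, reference, classes=[1, 2])
print(score)
```

In the toy example each class overlaps on two of three voxels, so both per-class scores and their mean equal 2/3.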

12 pages, 4711 KiB  
Article
Gastrointestinal Disease Classification in Endoscopic Images Using Attention-Guided Convolutional Neural Networks
by Zenebe Markos Lonseko, Prince Ebenezer Adjei, Wenju Du, Chengsi Luo, Dingcan Hu, Linlin Zhu, Tao Gan and Nini Rao
Appl. Sci. 2021, 11(23), 11136; https://0-doi-org.brum.beds.ac.uk/10.3390/app112311136 - 24 Nov 2021
Cited by 15 | Viewed by 3233
Abstract
Gastrointestinal (GI) diseases constitute a leading problem in the human digestive system. Consequently, several studies have explored automatic classification of GI diseases as a means of minimizing the burden on clinicians and improving patient outcomes, for both diagnostic and treatment purposes. The challenge in using deep learning-based (DL) approaches, specifically a convolutional neural network (CNN), is that spatial information is not fully utilized due to the inherent mechanism of CNNs. This paper proposes the application of spatial factors in improving classification performance. Specifically, we propose a deep CNN-based spatial attention mechanism for the classification of GI diseases, implemented with encoder–decoder layers. To overcome the data imbalance problem, we adapt data-augmentation techniques. A total of 12,147 multi-sited, multi-diseased GI images, drawn from publicly available and private sources, were used to validate the proposed approach. Furthermore, a five-fold cross-validation approach was adopted to minimize inconsistencies in intra- and inter-class variability and to ensure that results were robustly assessed. Our results, compared with other state-of-the-art models in terms of mean accuracy (ResNet50 = 90.28, GoogLeNet = 91.38, DenseNets = 91.60, and baseline = 92.84), demonstrated better outcomes (Precision = 92.8, Recall = 92.7, F1-score = 92.8, and Accuracy = 93.19). We also implemented t-distributed stochastic neighbor embedding (t-SNE) and confusion matrix analysis techniques for better visualization and performance validation. Overall, the results showed that the attention mechanism improved the automatic classification of multi-sited GI disease images. We validated clinical tests based on the proposed method by overcoming previous limitations, with the goal of improving automatic classification accuracy in future work.