Applications of Machine Learning to Image, Video, Text and Bioinformatic Analysis

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (10 May 2023) | Viewed by 35245

Special Issue Editor


Guest Editor
School of Computer Science, University of Sydney, Sydney, New South Wales 2006, Australia
Interests: light field image processing; machine learning; multimedia processing; text analysis; bioinformatics

Special Issue Information

Dear Colleagues,

With the rapid development of artificial intelligence and the Internet, image processing has found wide application in areas such as face recognition and video tracking. As the volume of multimedia data generated in the form of images and video continues to grow, the analysis of images, text, and video, together with bioinformatics, has become a hot topic in artificial intelligence. Machine learning has proven its importance in these areas and can significantly improve the accuracy of image processing, text analysis, video analysis, and bioinformatic analysis.

This Special Issue aims to provide a comprehensive appraisal of machine learning technology's innovative applications in image processing, video, text and bioinformatic analysis. Topics of interest include but are not limited to the following:

  • Light Field Image Processing;
  • Deep Learning;
  • Machine Learning;
  • Image Processing;
  • Text Analysis;
  • Natural Language Processing;
  • Video Analysis;
  • Object Detection;
  • Computer Vision;
  • Artificial Intelligence;
  • Image Classification;
  • Neural Network;
  • Bioinformatics.

Dr. Vera Yuk Ying Chung
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (12 papers)


Research


16 pages, 8010 KiB  
Article
Crack Severity Classification from Timber Cross-Sectional Images Using Convolutional Neural Network
by Shigeru Kato, Naoki Wada, Kazuki Shiogai, Takashi Tamaki, Tomomichi Kagawa, Renon Toyosaki and Hajime Nobuhara
Appl. Sci. 2023, 13(3), 1280; https://doi.org/10.3390/app13031280 - 18 Jan 2023
Cited by 3 | Viewed by 1333
Abstract
Cedar and cypress used for wooden construction have high moisture content after harvesting. To be used as building materials, they must undergo high-temperature drying. However, this process causes internal cracks that are invisible on the outer surface. These defects are serious because they reduce the strength of the timber, i.e., the buckling strength and joint durability. Therefore, the severity of internal cracks should be evaluated. A square timber is cut at an arbitrary position and assessed based on the length, thickness, and shape of the cracks in the cross-section; however, this process is time-consuming and labor-intensive. Therefore, we used a convolutional neural network (CNN) to automatically evaluate the severity of cracks from cross-sectional timber images. Previously, we used silver-painted images of cross-sections so that the cracks were easier to observe; however, this task was burdensome. Hence, in this study, we attempted to classify crack severity from unpainted images using ResNet (Residual Neural Network). First, ResNet50 was trained with supervised data to classify the crack severity level. The classification accuracy was then evaluated using test images (not used for training) and reached 86.67%. In conclusion, we confirmed that the proposed CNN can evaluate cross-sectional cracks in place of human inspectors.

19 pages, 3515 KiB  
Article
MCA-YOLOV5-Light: A Faster, Stronger and Lighter Algorithm for Helmet-Wearing Detection
by Cheng Sun, Shiwen Zhang, Peiqi Qu, Xingjin Wu, Peng Feng, Zhanya Tao, Jin Zhang and Ying Wang
Appl. Sci. 2022, 12(19), 9697; https://doi.org/10.3390/app12199697 - 27 Sep 2022
Cited by 9 | Viewed by 2451
Abstract
Wearing a safety helmet when entering a construction site is an essential measure for workers to prevent head injuries caused by falling objects and collisions. To keep workers safe, whether they are wearing helmets must be detected in real time for on-site monitoring. This paper proposes a lightweight helmet-wearing detection algorithm based on YOLOv5 that is faster and more robust for helmet detection in natural construction scenarios. The MCA attention mechanism is embedded in the backbone network to help the network extract more productive information, reduce the missed-detection rate of small helmet objects, and improve detection accuracy. A channel-pruning strategy is then applied to compress the MCA-YOLOv5 model, reducing the large-scale model to an ultra-small one suitable for real-time detection on embedded or mobile devices. Experimental results on a public dataset show that the model parameter volume is reduced by 87.2% and the detection speed is increased by 53.5%, while MCA-YOLOv5-light reduces the mAP only slightly.
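The channel-pruning step described in the abstract can be sketched in general terms. A common criterion (assumed here for illustration; the paper's exact procedure may differ) ranks each convolutional channel by the magnitude of its batch-normalization scaling factor and keeps only the strongest ones:

```python
import numpy as np

def prune_channels(gamma, prune_ratio):
    """Keep the channels whose |gamma| (BN scale factor) is largest.

    gamma       : 1-D array of batch-norm scaling factors, one per channel
    prune_ratio : fraction of channels to remove
    Returns the sorted indices of the channels to keep.
    """
    n_keep = max(1, int(round(len(gamma) * (1.0 - prune_ratio))))
    # Channels with the largest |gamma| are assumed to matter most.
    keep = np.argsort(-np.abs(gamma))[:n_keep]
    return np.sort(keep)

# Toy example: 8 channels, three clearly dominant scale factors.
gamma = np.array([0.01, 0.9, 0.02, 0.7, 0.03, 0.05, 0.8, 0.04])
print(prune_channels(gamma, prune_ratio=0.5))  # keeps half the channels
```

In a real pipeline, the kept indices would be used to slice the convolution weights into a thinner network, followed by fine-tuning to recover accuracy.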

11 pages, 9982 KiB  
Communication
Automatic Classification of Crack Severity from Cross-Section Image of Timber Using Simple Convolutional Neural Network
by Shigeru Kato, Naoki Wada, Kazuki Shiogai, Takashi Tamaki, Tomomichi Kagawa, Renon Toyosaki and Hajime Nobuhara
Appl. Sci. 2022, 12(16), 8250; https://doi.org/10.3390/app12168250 - 18 Aug 2022
Cited by 2 | Viewed by 1189
Abstract
Cedar and other timbers used for construction generally undergo a high-temperature drying process after being harvested to maintain their quality. However, internal cracks occur during this process. This is an issue because it deteriorates the structural performance, such as buckling strength and joint durability of the timber. Since preventing these internal cracks is difficult, their severity must be examined manually. Currently, the length, thickness, and area of the cracks on a cross-sectional surface of square timber are measured using calipers. However, this process is time-consuming and labor-intensive. Therefore, we employed a convolutional neural network (CNN), widely used in artificial intelligence applications, to automatically evaluate the severity of cracks from cross-sectional images of timber. A novel CNN was constructed and experimentally evaluated in this study. The average classification accuracy was 85.67%.

15 pages, 2933 KiB  
Article
Face Recognition Based on Deep Learning and FPGA for Ethnicity Identification
by Ahmed Jawad A. AlBdairi, Zhu Xiao, Ahmed Alkhayyat, Amjad J. Humaidi, Mohammed A. Fadhel, Bahaa Hussein Taher, Laith Alzubaidi, José Santamaría and Omran Al-Shamma
Appl. Sci. 2022, 12(5), 2605; https://doi.org/10.3390/app12052605 - 2 Mar 2022
Cited by 16 | Viewed by 5142
Abstract
In the last decade, there has been a surge of interest in addressing complex Computer Vision (CV) problems in the field of face recognition (FR). One of the most difficult is the accurate determination of a person's ethnicity. In this regard, a new classification method using Machine Learning (ML) tools is proposed in this paper. Specifically, a new Deep Learning (DL) approach based on a Deep Convolutional Neural Network (DCNN) model is developed that reliably determines the ethnicity of people from their facial features. However, specialized high-performance computing (HPC) hardware is necessary to build a workable DCNN-based FR system because of the low computational power of current central processing units (CPUs). Therefore, the use of field-programmable gate arrays (FPGAs), which have recently improved network efficiency in terms of power usage and execution time, was considered in this work. The performance of the new DCNN-based FR method using FPGAs was compared against that using graphics processing units (GPUs). The experiments used an image dataset composed of 3141 photographs of citizens from three distinct countries. To our knowledge, this is the first image collection gathered specifically to address the ethnicity identification problem, and the dataset was made publicly available as a novel contribution of this work. Finally, the experimental results proved the high performance of the proposed DCNN model on FPGAs, achieving an accuracy of 96.9 percent and an F1 score of 94.6 percent while using a reasonable amount of energy and hardware resources.

14 pages, 926 KiB  
Article
An Intelligent Radiomic Approach for Lung Cancer Screening
by Guillermo Torres, Sonia Baeza, Carles Sanchez, Ignasi Guasch, Antoni Rosell and Debora Gil
Appl. Sci. 2022, 12(3), 1568; https://doi.org/10.3390/app12031568 - 31 Jan 2022
Cited by 3 | Viewed by 2477
Abstract
The efficiency of lung cancer screening for reducing mortality is hindered by the high rate of false positives. Artificial intelligence applied to radiomics could help to discard benign cases early in the analysis of CT scans. The limited amount of available data, and the fact that benign cases are a minority, constitute a main challenge for the successful use of state-of-the-art methods (such as deep learning), which can be biased, over-fitted, and lack clinical reproducibility. We present a hybrid approach combining the potential of radiomic features to characterize nodules in CT scans with the generalization ability of feed-forward networks. To obtain maximal reproducibility with minimal training data, we propose an embedding of nodules based on the statistical significance of radiomic features for malignancy detection. This representation space of lesions is the input to a feed-forward network whose architecture and hyperparameters are optimized using our own metrics of the diagnostic power of the whole system. On an independent set of patients, the best model achieves 100% sensitivity and 83% specificity (AUC = 0.94) for malignancy detection.
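The idea of embedding nodules by the statistical significance of radiomic features can be illustrated with a toy feature-selection sketch. The use of Welch's t statistic as a significance proxy and the threshold value below are assumptions for illustration, not the authors' exact test:

```python
import numpy as np

def welch_t(a, b):
    """Welch's two-sample t statistic, used as a significance proxy."""
    va, vb = a.var(ddof=1), b.var(ddof=1)
    return (a.mean() - b.mean()) / np.sqrt(va / len(a) + vb / len(b))

def select_features(class_a, class_b, t_min=2.0):
    """Return indices of feature columns whose |t| exceeds t_min."""
    t = np.array([abs(welch_t(class_a[:, j], class_b[:, j]))
                  for j in range(class_a.shape[1])])
    return np.where(t >= t_min)[0]

# Synthetic radiomic features: only feature 1 separates the classes.
rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(40, 3))
malign = rng.normal(0.0, 1.0, size=(40, 3))
malign[:, 1] += 3.0
print(select_features(benign, malign))
```

The selected columns would then form the low-dimensional embedding fed to the feed-forward classifier.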

15 pages, 1655 KiB  
Article
Improving Graph-Based Movie Recommender System Using Cinematic Experience
by CheonSol Lee, DongHee Han, Keejun Han and Mun Yi
Appl. Sci. 2022, 12(3), 1493; https://doi.org/10.3390/app12031493 - 29 Jan 2022
Cited by 16 | Viewed by 4947
Abstract
With the advent of many movie content platforms, users face a flood of content and consequent difficulties in selecting appropriate movie titles. Although much research has been conducted on developing effective recommender systems that provide personalized recommendations based on customers' past preferences and behaviors, little attention has been paid to leveraging users' sentiments and emotions together. In this study, we built a new graph-based movie recommender system that utilizes sentiment and emotion information along with user ratings, and evaluated its performance in comparison to well-known conventional models and state-of-the-art graph-based models. The sentiment and emotion information was extracted using fine-tuned BERT. We used a Kaggle dataset created by crawling movie metadata and review data from the Rotten Tomatoes website and Amazon product data. The results show that the proposed IGMC-based models coupled with emotion and sentiment are superior to the compared models. The findings highlight the significance of using sentiment and emotion information in movie recommendation.

16 pages, 3570 KiB  
Article
Emotion Estimation Method Based on Emoticon Image Features and Distributed Representations of Sentences
by Akira Fujisawa, Kazuyuki Matsumoto, Minoru Yoshida and Kenji Kita
Appl. Sci. 2022, 12(3), 1256; https://doi.org/10.3390/app12031256 - 25 Jan 2022
Cited by 2 | Viewed by 2346
Abstract
This paper proposes an emotion recognition method for tweets containing emoticons that uses both emoticon image features and language features. Some existing methods register emoticons and their facial-expression categories in a dictionary, while others recognize emoticon facial expressions based on the various elements of the emoticons. However, highly accurate emotion recognition cannot be performed unless the recognition is based on a combination of the features of sentences and emoticons. Therefore, we propose a model that recognizes emotions by extracting the shape features of emoticons from their image data and feeding it a feature vector that combines these image features with features extracted from the text of the tweets. Evaluation experiments confirm that the proposed method achieves high accuracy and is more effective than methods that use text features only.

16 pages, 303 KiB  
Article
Automatic Construction of Fine-Grained Paraphrase Corpora System Using Language Inference Model
by Ying Zhou, Xiaokang Hu and Vera Chung
Appl. Sci. 2022, 12(1), 499; https://doi.org/10.3390/app12010499 - 5 Jan 2022
Cited by 1 | Viewed by 1677
Abstract
Paraphrase detection and generation are important natural language processing (NLP) tasks. Yet the term paraphrase is broad enough to include many fine-grained relations. This leads to different tolerance levels of semantic divergence in the positive paraphrase class among publicly available paraphrase datasets. Such variation can affect the generalisability of paraphrase classification models. It may also impact the predictability of paraphrase generation models. This paper presents a new system that automatically constructs corpora of fine-grained paraphrase relations using language inference models. The fine-grained sentence-level paraphrase relations are defined based on their word- and phrase-level counterparts. We demonstrate that the fine-grained labels from our proposed system make it possible to generate paraphrases at a desired semantic level. The new labels could also contribute to general sentence-embedding techniques.
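A common way to derive fine-grained paraphrase labels from a language inference (NLI) model is to test entailment in both directions; this scheme is illustrative and not necessarily the authors' exact label definitions. In the sketch below, the NLI scorer is injected as a callable so any backend can be plugged in, and `toy_entail` is a purely illustrative stand-in:

```python
def classify_pair(premise, hypothesis, entail_prob, threshold=0.5):
    """Assign a fine-grained paraphrase label from an NLI scorer.

    entail_prob(a, b) -> probability that a entails b (any NLI backend).
    Bidirectional entailment   -> "paraphrase"
    One-directional entailment -> "forward_entailment" / "reverse_entailment"
    Neither                    -> "non_paraphrase"
    """
    fwd = entail_prob(premise, hypothesis) >= threshold
    rev = entail_prob(hypothesis, premise) >= threshold
    if fwd and rev:
        return "paraphrase"
    if fwd:
        return "forward_entailment"
    if rev:
        return "reverse_entailment"
    return "non_paraphrase"

# Illustrative stand-in scorer: treats token containment as entailment.
def toy_entail(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 if tb <= ta else 0.0

print(classify_pair("a man is playing a guitar outside",
                    "a man is playing a guitar", toy_entail))
```

With a real NLI model, the same logic yields sentence-level labels that can be aggregated into a fine-grained paraphrase corpus.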

15 pages, 3736 KiB  
Article
Behavioral Parameter Field for Human Abnormal Behavior Recognition in Low-Resolution Thermal Imaging Video
by Baodong Wang, Xiaofeng Jiang, Zihao Dong and Jinping Li
Appl. Sci. 2022, 12(1), 402; https://doi.org/10.3390/app12010402 - 31 Dec 2021
Cited by 1 | Viewed by 1486
Abstract
In recent years, thermal imaging cameras have been widely used in the field of intelligent surveillance because of their special imaging characteristics and better privacy-protection properties. However, due to the low resolution and fixed placement of current thermal imaging cameras, it is difficult to effectively identify human behavior using a single detection method based on skeletal keypoints. Therefore, a self-updating learning method, called the behavioral parameter field (BPF), is proposed for fixed thermal imaging camera scenes. This method expresses the regularity of human behavior patterns concisely and directly. Firstly, the detection accuracy of small targets in low-resolution video is improved by optimizing the YOLOv4 network to obtain a human detection model for thermal imaging video. Secondly, the BPF model is designed to learn the normal human behavior features at each position. Finally, based on the learned BPF model, we propose using metric modules, such as cosine similarity and intersection-over-union matching, to classify abnormal human behaviors. In the experimental stage, the living scene of elderly people living alone indoors is used as our case study, and a variety of detection models are compared with the proposed method, verifying the effectiveness and practicability of the proposed behavioral parameter field on a self-collected thermal imaging dataset of the indoor elderly living alone.
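The metric modules mentioned, cosine similarity and intersection-over-union matching, are standard and can be sketched briefly. The abnormality threshold below is an arbitrary illustration, not the paper's value:

```python
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def is_abnormal(behavior_vec, normal_vec, threshold=0.8):
    # Behavior differing strongly from the learned pattern is flagged.
    return cosine_similarity(behavior_vec, normal_vec) < threshold

normal = np.array([1.0, 0.2, 0.1])
print(is_abnormal(np.array([0.9, 0.25, 0.1]), normal))  # close to normal
print(is_abnormal(np.array([0.0, 1.0, 0.9]), normal))   # dissimilar
```

In the BPF setting, the learned per-position behavior features would play the role of `normal_vec`, with IoU matching used to associate detections across frames.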

19 pages, 4653 KiB  
Article
Att-BiL-SL: Attention-Based Bi-LSTM and Sequential LSTM for Describing Video in the Textual Formation
by Shakil Ahmed, A F M Saifuddin Saif, Md Imtiaz Hanif, Md Mostofa Nurannabi Shakil, Md Mostofa Jaman, Md Mazid Ul Haque, Siam Bin Shawkat, Jahid Hasan, Borshan Sarker Sonok, Farzad Rahman and Hasan Muhommod Sabbir
Appl. Sci. 2022, 12(1), 317; https://doi.org/10.3390/app12010317 - 29 Dec 2021
Cited by 10 | Viewed by 5117
Abstract
With the advancement of technology, people around the world are gaining ever easier access to internet-enabled devices, and as a result, video data is growing rapidly. The proliferation of portable devices such as action cameras, mobile cameras, and motion cameras also contributes to this fast growth. Data from these many sources require considerable processing for various usages, and such enormous amounts of video data cannot be fully navigated by end-users. Many recent research works have therefore generated descriptions from images or visual scene recordings to address this issue. This description generation, also known as video captioning, is more complex than single-image captioning, and various advanced neural networks have been used to perform it. In this paper, we propose an attention-based Bi-LSTM and sequential LSTM (Att-BiL-SL) encoder-decoder model for describing video in textual form. The model consists of a two-layer attention-based bi-LSTM and a one-layer sequential LSTM for video captioning, and it extracts both universal and native temporal features from the video frames for smooth sentence generation. The model uses word embedding with a soft attention mechanism and a beam search optimization algorithm to generate qualitative results. The proposed architecture is found to perform better than various existing state-of-the-art models.

22 pages, 8186 KiB  
Article
Memory Model for Morphological Semantics of Visual Stimuli Using Sparse Distributed Representation
by Kyuchang Kang and Changseok Bae
Appl. Sci. 2021, 11(22), 10786; https://doi.org/10.3390/app112210786 - 15 Nov 2021
Cited by 1 | Viewed by 1776
Abstract
Recent achievements in CNN (convolutional neural network) and DNN (deep neural network) research provide many practical applications in the computer vision area. However, these approaches require the construction of huge training datasets for the learning process. This paper seeks a way toward continual learning that does not require such high-cost prior training-data construction, by imitating a biological memory model. We employ SDR (sparse distributed representation) for information processing and as a semantic memory model; SDR is known as a representation model of firing patterns of neurons in the neocortex. This paper proposes a novel memory model that reflects remembrance of the morphological semantics of visual input stimuli. The proposed model considers the memory process and the recall process separately. First, the memory process converts input visual stimuli to sparse distributed representations, preserving the morphological semantics of the input. Next, the recall process compares the sparse distributed representation of a new input visual stimulus with the remembered sparse distributed representations, using superposition of sparse distributed representations to measure similarity. Experimental results using 10,000 images from the MNIST (Modified National Institute of Standards and Technology) and Fashion-MNIST datasets show that the sparse distributed representation of the proposed model efficiently preserves the morphological semantics of the input visual stimuli.
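Measuring recall by the overlap of active bits between sparse distributed representations can be sketched as follows; the random toy encoding here merely stands in for the paper's actual SDR encoding of visual stimuli:

```python
import numpy as np

def overlap(sdr_a, sdr_b):
    """Number of active bits shared by two binary SDRs."""
    return int(np.sum(sdr_a & sdr_b))

def recall(query, memory):
    """Index of the stored SDR with the largest overlap with the query."""
    return int(np.argmax([overlap(query, m) for m in memory]))

# Toy memory: long, sparse binary vectors (8 active bits out of 256).
rng = np.random.default_rng(1)
def random_sdr(size=256, active=8):
    v = np.zeros(size, dtype=np.int64)
    v[rng.choice(size, active, replace=False)] = 1
    return v

memory = [random_sdr() for _ in range(5)]
noisy = memory[2].copy()
noisy[np.flatnonzero(noisy)[0]] = 0  # degrade one active bit
print(recall(noisy, memory))         # the degraded pattern is still found
```

Because sparse random patterns rarely share active bits, even a degraded query still overlaps its source far more than any other stored pattern, which is the robustness property SDR memories rely on.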

Review


33 pages, 3518 KiB  
Review
Analysis of Recent Deep Learning Techniques for Arabic Handwritten-Text OCR and Post-OCR Correction
by Rayyan Najam and Safiullah Faizullah
Appl. Sci. 2023, 13(13), 7568; https://doi.org/10.3390/app13137568 - 27 Jun 2023
Cited by 5 | Viewed by 3410
Abstract
Arabic handwritten-text recognition applies an OCR technique and then a text-correction technique to extract the text within an image correctly. Deep learning is the current paradigm in OCR techniques. However, no study has investigated or critically analyzed the recent deep-learning techniques used for Arabic handwritten OCR and text correction during the period 2020–2023. This analysis fills that noticeable gap in the literature, uncovering recent developments and their limitations for researchers, practitioners, and interested readers. The results reveal that CNN-LSTM-CTC is the most suitable architecture for OCR, ahead of Transformers and GANs, because it is less complex and can capture long textual dependencies. For OCR text correction, applying DL models to artificially generated errors in datasets improved accuracy in many works. In conclusion, Arabic OCR has the potential to further apply text-embedding models to correct the text resulting from OCR, and there is a significant gap in studies investigating this problem. In addition, more high-quality and domain-specific Arabic handwritten OCR datasets are needed. Moreover, we map out a space of future trends in Arabic OCR applications, derived from current limitations in Arabic OCR works and from applications in other languages, involving many possibilities that have not been effectively researched at the time of writing.
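The CTC decoding step of the CNN-LSTM-CTC architecture favored by the survey can be illustrated with a minimal greedy (best-path) decoder; the tiny Arabic alphabet below is an illustrative toy, not a dataset from the survey:

```python
import numpy as np

BLANK = 0  # conventional CTC blank index

def ctc_greedy_decode(frame_scores, id_to_char):
    """Greedy (best-path) CTC decoding.

    frame_scores : (time_steps, n_labels) per-frame label scores.
    Take the argmax per frame, collapse consecutive repeats,
    then drop the blanks.
    """
    path = np.argmax(frame_scores, axis=1)
    out, prev = [], BLANK
    for label in path:
        if label != BLANK and label != prev:
            out.append(id_to_char[label])
        prev = label
    return "".join(out)

# Toy one-hot frame scores spelling the Arabic word باب (b-a-b).
frames = [1, 1, 0, 2, 2, 1]
print(ctc_greedy_decode(np.eye(3)[frames], {1: "ب", 2: "ا"}))  # باب
```

The blank label is what lets CTC emit the same character twice in a row (a blank frame between two identical labels prevents them from being collapsed), which matters for scripts with doubled letters.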
