Computer Vision-Based Intelligent Systems: Challenges and Approaches

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 September 2022) | Viewed by 31465

Special Issue Editor

Prof. Dr. António J. R. Neves
Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
Interests: image processing; signal processing; intelligent systems; robotics

Special Issue Information

Dear Colleagues,

Research on intelligent systems continues to grow and has led to the design of many types of such systems, with innumerable applications in our daily lives. Recent advances in artificial intelligence (AI), together with the huge amount of digital data available, have boosted their performance in several ways. One of the richest sources of information is visual data obtained through digital cameras and computer vision algorithms. Computer systems equipped with digital cameras and the most recent AI techniques are able to perceive and understand the visual world. However, several open challenges remain before computer vision can approach the capabilities of human vision, even though some systems already outperform humans in a number of specific tasks.

This Special Issue seeks contributions that present innovative intelligent systems based on computer vision, as well as contributions on recent advances in computer vision that address the most relevant challenges faced by current and future intelligent systems.

Prof. Dr. António J. R. Neves
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website and using the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image processing
  • intelligent systems
  • machine learning
  • artificial intelligence
  • image interpretation
  • smart cities
  • intelligent robotics
  • smart transportation

Published Papers (11 papers)

Research

10 pages, 1711 KiB  
Article
Tracking Sensor Location by Video Analysis in Double-Shell Tank Inspections
by Jacob Price, Ethan Aaberg, Changki Mo and John Miller
Appl. Sci. 2023, 13(15), 8708; https://doi.org/10.3390/app13158708 - 28 Jul 2023
Viewed by 601
Abstract
Double-shell tanks (DSTs) are a critical part of the infrastructure for nuclear waste management at the U.S. Department of Energy’s Hanford site. They are expected to be used for the interim storage of partially liquid nuclear waste until 2050, which is the target date for completing the immobilization process for all Hanford nuclear waste. At that time, DSTs will have been used about 15 years beyond their original projected lifetime. Consequently, for the next approximately 30 years, Hanford DSTs will undergo periodic nondestructive evaluation (NDE) to ensure their integrity. One approach to perform NDE is to use ultrasonic data from a robot moving through air slots, originally designed for cooling, in the confined space between primary and secondary tanks. Interpreting ultrasonic sensor output requires knowing where measurements were taken with a precision of approximately one inch. Analyzing video acquired during inspection is one approach to tracking sensor location. The top edge of an air slot is easily detected due to the difference in color and texture between the primary tank bottom and the air slot walls. A line fit to this edge is used in a model to calculate the apparent width of the air slot in pixels at targets near the top edge that can be recognized in video images. The apparent width of the air slot at the chosen target in a later video frame determines how far the robot has moved between those frames. Algorithms have been developed that automate target selection and matching in later frames. Tests in a laboratory mockup demonstrated that the method tracks the location of the ultrasonic sensor with the required precision. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)
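
The distance-from-apparent-width idea in this abstract can be pictured with a minimal Python/OpenCV sketch. The darkness threshold and the band of rows used below are hypothetical parameters; the paper's actual target selection and matching algorithms are more involved.

```python
import cv2
import numpy as np

# Minimal sketch, assuming the air slot walls appear darker than the primary
# tank bottom; threshold value and row band are hypothetical.
def apparent_slot_width(frame_gray, dark_threshold=60, band=20):
    # Binarize so that the dark slot region becomes foreground.
    _, mask = cv2.threshold(frame_gray, dark_threshold, 255, cv2.THRESH_BINARY_INV)
    widths = (mask > 0).sum(axis=1)          # dark-pixel count per row
    rows = np.nonzero(widths)[0]
    if rows.size == 0:
        return None
    top = rows[0]                            # top edge of the slot
    return float(np.median(widths[top:top + band]))

# Under a pinhole camera model, the apparent width w of a fixed-size target is
# inversely proportional to its distance d, so w0 / w1 = d1 / d0. Comparing the
# apparent slot width at the same target across frames therefore yields the
# robot's displacement between those frames.
```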

16 pages, 3462 KiB  
Article
Korean Sign Language Recognition Using Transformer-Based Deep Neural Network
by Jungpil Shin, Abu Saleh Musa Miah, Md. Al Mehedi Hasan, Koki Hirooka, Kota Suzuki, Hyoun-Sup Lee and Si-Woong Jang
Appl. Sci. 2023, 13(5), 3029; https://doi.org/10.3390/app13053029 - 27 Feb 2023
Cited by 16 | Viewed by 3983
Abstract
Sign language recognition (SLR) is one of the crucial applications of the hand gesture recognition and computer vision research domain. Many researchers have been working to develop hand gesture-based SLR applications for English, Turkish, Arabic, and other sign languages. However, few studies have been conducted on Korean sign language (KSL) classification because few KSL datasets are publicly available. In addition, existing KSL recognition work still struggles to perform efficiently because light illumination and background complexity are the major problems in this field. In the last decade, researchers have successfully applied vision-based transformers to sign language recognition by extracting long-range dependencies within the image. Moreover, there is a significant gap between CNNs and transformers in terms of model performance and efficiency, and no combined CNN- and transformer-based KSL recognition model has yet been reported. To overcome these challenges, we proposed a convolution- and transformer-based multi-branch network that takes advantage of the transformer's long-range dependency computation and the CNN's local feature calculation for sign language recognition. We extracted initial features with an initial feature extraction module and then extracted features from the transformer and the CNN in parallel. After concatenating the local and long-range dependency features, a new classification module was applied. We evaluated the proposed model on a KSL benchmark dataset and our lab dataset, achieving 89.00% accuracy on the 77-label KSL dataset and 98.30% accuracy on the lab dataset. This performance shows that the proposed model generalizes well at a considerably lower computational cost. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)
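
For readers unfamiliar with such multi-branch designs, the following is a minimal PyTorch sketch of a network that concatenates local CNN features with long-range transformer features before a shared classification head. The backbones (ResNet-18, ViT-B/16) and layer sizes are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Minimal sketch, assuming 224x224 RGB inputs and 77 sign classes.
class MultiBranchSLR(nn.Module):
    def __init__(self, num_classes=77):
        super().__init__()
        cnn = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # local features, 512-d
        vit = models.vit_b_16(weights=None)
        vit.heads = nn.Identity()                             # long-range features, 768-d
        self.vit = vit
        self.classifier = nn.Sequential(
            nn.Linear(512 + 768, 256), nn.ReLU(), nn.Linear(256, num_classes)
        )

    def forward(self, x):                                     # x: (B, 3, 224, 224)
        local = self.cnn(x).flatten(1)                        # CNN branch
        long_range = self.vit(x)                              # transformer branch
        return self.classifier(torch.cat([local, long_range], dim=1))
```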

10 pages, 1984 KiB  
Article
Blind Image Quality Assessment with Deep Learning: A Replicability Study and Its Reproducibility in Lifelogging
by Ricardo Ribeiro, Alina Trifan and António J. R. Neves
Appl. Sci. 2023, 13(1), 59; https://doi.org/10.3390/app13010059 - 21 Dec 2022
Cited by 3 | Viewed by 1251
Abstract
The wide availability and small size of different types of sensors have allowed for the acquisition of a huge amount of data about a person’s life in real time. With these data, usually denoted as lifelog data, we can analyze and understand personal experiences and behaviors. Most lifelog research has explored the use of visual data. However, a considerable amount of these images or videos are affected by different types of degradation or noise due to the non-controlled acquisition process. Image quality assessment can play an essential role in lifelog research to deal with these data. We present in this paper a twofold study on the topic of blind image quality assessment. On the one hand, we explore the replication of the training process of a state-of-the-art deep learning model for blind image quality assessment in the wild. On the other hand, we present evidence that blind image quality assessment is an important pre-processing step to be further explored in the context of information retrieval in lifelogging applications. We consider our efforts to replicate the model training process to have been successful, achieving inference results similar to the original version, while acknowledging a fair number of assumptions that we had to make. Moreover, these assumptions motivated an extensive additional analysis that led to significant insights into the influence of both batch size and loss functions when training deep learning models in this context. We include preliminary results of the replicated model on a lifelogging dataset as a potential reproducibility aspect to be considered. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)
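
The pre-processing role argued for in this abstract can be pictured with a short sketch: score each lifelog image with a trained blind IQA model and keep only frames above a quality threshold. The model, input format, and threshold below are placeholders, not the specific network replicated in the paper.

```python
import torch

# Minimal sketch, assuming iqa_model is a trained no-reference IQA network that
# maps a normalized image tensor to a scalar quality score; the threshold is a
# hypothetical value to be tuned per application.
def filter_lifelog(images, iqa_model, threshold=0.5):
    iqa_model.eval()
    kept = []
    with torch.no_grad():
        for name, tensor in images:              # tensor: (3, H, W), normalized
            score = iqa_model(tensor.unsqueeze(0)).item()
            if score >= threshold:               # keep only usable frames
                kept.append((name, score))
    return kept
```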

24 pages, 9196 KiB  
Article
Human Activity Recognition for Assisted Living Based on Scene Understanding
by Stefan-Daniel Achirei, Mihail-Cristian Heghea, Robert-Gabriel Lupu and Vasile-Ion Manta
Appl. Sci. 2022, 12(21), 10743; https://doi.org/10.3390/app122110743 - 24 Oct 2022
Cited by 6 | Viewed by 1827
Abstract
The growing share of the population over the age of 65 is putting pressure on the social health insurance system, especially on institutions that provide long-term care services for the elderly or for people who suffer from chronic diseases or mental disabilities. This pressure can be reduced through the assisted living of patients, based on an intelligent system for monitoring vital signs and home automation. In this regard, since 2008, the European Commission has financed the development of medical products and services through the ambient assisted living (AAL) program, Ageing Well in the Digital World. The SmartCare project, which integrates the proposed computer vision solution, follows the European strategy on AAL. This paper presents an indoor human activity recognition (HAR) system based on scene understanding. The system consists of a ZED 2 stereo camera and an NVIDIA Jetson AGX processing unit. The recognition of human activity is carried out in two stages: first, all humans and objects in the frame are detected using a neural network; then, the results are fed to a second network that detects interactions between humans and objects. The activity score is determined based on the human–object interaction (HOI) detections. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)
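
A minimal sketch of the two-stage idea (detection, then human-object interaction scoring) follows. The torchvision detector is a stand-in for the paper's network, `interaction_head` is a hypothetical second-stage model, and the pair encoding shown is deliberately simplistic.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Stage 1: an off-the-shelf detector stands in for the paper's detection network.
detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_interactions(frame, interaction_head, person_label=1, score_thr=0.7):
    """frame: (3, H, W) float tensor; interaction_head: hypothetical stage-2 net."""
    with torch.no_grad():
        det = detector([frame])[0]
    keep = det["scores"] > score_thr
    boxes, labels = det["boxes"][keep], det["labels"][keep]
    humans = boxes[labels == person_label]       # COCO label 1 = person
    objects = boxes[labels != person_label]
    pair_scores = []
    for h in humans:
        for o in objects:
            feat = torch.cat([h, o])             # simplistic pair encoding
            pair_scores.append(interaction_head(feat))
    return pair_scores                           # evidence for activity scoring
```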

12 pages, 671 KiB  
Article
Online Kanji Characters Based Writer Identification Using Sequential Forward Floating Selection and Support Vector Machine
by Md. Al Mehedi Hasan, Jungpil Shin and Md. Maniruzzaman
Appl. Sci. 2022, 12(20), 10249; https://doi.org/10.3390/app122010249 - 12 Oct 2022
Cited by 3 | Viewed by 1252
Abstract
Writer identification has become a hot research topic in the fields of pattern recognition, forensic document analysis, the criminal justice system, etc. The goal of this research is to propose an efficient approach for writer identification based on online handwritten Kanji characters. We collected 47,520 samples from 33 people who wrote 72 online handwritten-based Kanji characters 20 times. We extracted features from the handwriting data and proposed a support vector machine (SVM)-based classifier for writer identification. We also conducted experiments to see how the accuracy changes with feature selection and parameter tuning. Both text-dependent and text-independent writer identification were studied in this work. In the case of text-dependent writer identification, we obtained the accuracy of each Kanji character separately. We then studied the text-independent case by considering some of the top discriminative characters from the text-dependent case. Finally, another text-dependent experiment was performed by taking two, three, and four Kanji characters instead of using only one character. The experimental results illustrated that SVM provided the highest identification accuracy of 99.0% for the text-independent case and 99.6% for text-dependent writer identification. We hope that this study will be helpful for writer identification using online handwritten Kanji characters. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)
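
As a sketch of the general approach (sequential forward floating selection wrapped around an SVM), the following uses scikit-learn and mlxtend with randomly generated placeholder data in place of real handwriting features; the kernel and hyperparameters are illustrative.

```python
import numpy as np
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for extracted handwriting features:
# 660 samples, 40 features, 33 writers (all values are synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(660, 40))
y = rng.integers(0, 33, size=660)

svm = SVC(kernel="rbf", C=10, gamma="scale")        # hyperparameters are illustrative
sffs = SequentialFeatureSelector(svm, k_features="best",
                                 forward=True, floating=True,  # SFFS
                                 scoring="accuracy", cv=5)
sffs = sffs.fit(X, y)
X_sel = sffs.transform(X)                           # keep only selected features
print("selected features:", sffs.k_feature_idx_)
print("cv accuracy: %.3f" % cross_val_score(svm, X_sel, y, cv=5).mean())
```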

18 pages, 890 KiB  
Article
Traffic State Prediction Using One-Dimensional Convolution Neural Networks and Long Short-Term Memory
by Selim Reza, Marta Campos Ferreira, José J. M. Machado and João Manuel R. S. Tavares
Appl. Sci. 2022, 12(10), 5149; https://doi.org/10.3390/app12105149 - 19 May 2022
Cited by 6 | Viewed by 1716
Abstract
Traffic prediction is a vitally important keystone of an intelligent transportation system (ITS). It aims to improve travel route selection, reduce overall carbon emissions, mitigate congestion, and enhance safety. However, efficiently modelling traffic flow is challenging due to its dynamic and non-linear behaviour. With the availability of a vast number of data samples, deep neural network-based models are best suited to solve these challenges. However, conventional network-based models lack robustness and accuracy because of their inability to capture traffic’s spatial and temporal correlations. Moreover, they usually require data from adjacent roads to achieve accurate predictions. Hence, this article presents a one-dimensional (1D) convolution neural network (CNN) and long short-term memory (LSTM)-based traffic state prediction model, which was evaluated using the Zenodo and PeMS datasets. The model uses three stacked 1D CNN layers and an LSTM with a logarithmic hyperbolic cosine loss function. The 1D CNN layers extract features from the data, and the LSTM’s ability to remember past events is leveraged, together with the learnt features, for traffic state prediction. A comparative performance analysis of the proposed model against support vector regression, standard LSTM, gated recurrent units (GRUs), and CNN- and GRU-based models under the same conditions is also presented. The results demonstrate very encouraging performance of the proposed model, improving the mean absolute error, root mean squared error, mean absolute percentage error, and coefficient of determination scores by a mean of 16.97%, 52.1%, 54.15%, and 7.87%, respectively, relative to the baselines under comparison. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)
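
A minimal PyTorch sketch of the described architecture follows, with hypothetical shapes (12 past time steps of a single traffic feature) and layer widths; the log-cosh loss is spelled out explicitly.

```python
import torch
import torch.nn as nn

# Minimal sketch: three stacked 1D convolutions followed by an LSTM,
# trained with a logarithmic hyperbolic cosine (log-cosh) loss.
class TrafficNet(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv1d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv1d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(channels, 64, batch_first=True)
        self.head = nn.Linear(64, 1)

    def forward(self, x):                 # x: (B, 12, 1) = 12 past time steps
        f = self.conv(x.transpose(1, 2))  # (B, C, 12): features per time step
        out, _ = self.lstm(f.transpose(1, 2))
        return self.head(out[:, -1])      # next traffic state (e.g., flow)

def log_cosh_loss(pred, target):
    return torch.log(torch.cosh(pred - target)).mean()
```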

17 pages, 2300 KiB  
Article
Bootstrapped SSL CycleGAN for Asymmetric Domain Transfer
by Lidija Krstanović, Branislav Popović, Marko Janev and Branko Brkljač
Appl. Sci. 2022, 12(7), 3411; https://doi.org/10.3390/app12073411 - 27 Mar 2022
Cited by 2 | Viewed by 1571
Abstract
Most CycleGAN domain transfer architectures require a large amount of data belonging to the domains on which the domain transfer task is to be applied. Nevertheless, in many real-world applications one of the domains is reduced, i.e., scarce: it has much less training data available in comparison to the other domain, which is fully observable. In order to use the CycleGAN framework in such unfavorable scenarios, we propose a novel Bootstrapped SSL CycleGAN architecture (BTS-SSL), in which the mentioned problem is overcome using two strategies. Firstly, by using a relatively small percentage of the available labelled training data from the reduced or scarce domain and a semi-supervised learning (SSL) approach, we prevent overfitting of the discriminator belonging to the reduced domain, which would otherwise occur during initial training iterations due to the small amount of available training data in the scarce domain. Secondly, after initial learning guided by the described SSL strategy, additional bootstrapping (BTS) of the reduced data domain is performed by inserting artificially generated training examples into the training pool of the discriminator belonging to the scarce domain. Bootstrapped samples are generated by the already trained neural network that performs transfer from the fully observable to the scarce domain. This procedure is repeated periodically several times during training and results in significantly improved performance of the final model in comparison to the original unsupervised CycleGAN approach. The same also holds in comparison to solutions based exclusively on either the described SSL or the bootstrapping strategy, i.e., when these are applied separately. Moreover, in the considered scarce scenarios it also shows competitive results in comparison to a fully supervised solution based on the pix2pix method. In that sense, it is directly applicable to many domain transfer tasks that rely on the CycleGAN architecture. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)
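
In outline, the training procedure alternates SSL-regularized CycleGAN updates with periodic bootstrapping of the scarce domain's data pool. The sketch below assumes a hypothetical `train_cyclegan_epoch` helper standing in for one epoch of adversarial training; only the bootstrapping logic is spelled out, and all names and schedules are illustrative.

```python
import random

# Sketch of the BTS-SSL loop; train_cyclegan_epoch is a hypothetical helper
# performing one epoch of SSL-regularized CycleGAN training.
def train_bts_ssl(G_full2scarce, G_scarce2full, D_scarce, D_full,
                  scarce_data, full_data, labelled_scarce,
                  epochs=200, bootstrap_every=20):
    pool = list(scarce_data)                 # scarce-domain training pool
    for epoch in range(epochs):
        # The small labelled subset regularizes D_scarce early on, preventing
        # it from overfitting the few real scarce-domain samples.
        train_cyclegan_epoch(G_full2scarce, G_scarce2full, D_scarce, D_full,
                             pool, full_data, labelled=labelled_scarce)
        if (epoch + 1) % bootstrap_every == 0:
            # Bootstrap: translate samples from the fully observable domain
            # and add them to the scarce discriminator's pool.
            sources = random.sample(full_data,
                                    k=min(len(full_data), len(scarce_data)))
            pool.extend(G_full2scarce(x) for x in sources)
    return G_full2scarce, G_scarce2full
```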

30 pages, 42202 KiB  
Article
Automatic Understanding and Mapping of Regions in Cities Using Google Street View Images
by José Carlos Rangel, Edmanuel Cruz and Miguel Cazorla
Appl. Sci. 2022, 12(6), 2971; https://doi.org/10.3390/app12062971 - 14 Mar 2022
Cited by 2 | Viewed by 2082
Abstract
The use of semantic representations to achieve place understanding has been widely studied using indoor information. This kind of data can then be used for navigation, localization, and place identification using mobile devices. Nevertheless, applying this approach to outdoor data involves certain non-trivial procedures, such as gathering the information. This problem can be solved by using map APIs that provide images captured along the streets of a city. In this paper, we seek to leverage such APIs to generate a semantic representation of the city, built using a clustering algorithm and semantic descriptors. The main contribution of this work is a new approach to generating a map with semantic information for each area of the city. The proposed method can automatically assign a semantic label to each cluster on the map. This method can be useful in smart cities and autonomous driving due to the categorization of the zones in a city. The results show the robustness of the proposed pipeline and the advantages of using Google Street View images, semantic descriptors, and machine learning algorithms to generate semantic maps of outdoor places. These maps properly encode the zones existing in the selected city and are able to reveal new zones between current ones. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)
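
The pipeline can be pictured with a short sketch: compute a descriptor for each street-level image with a pretrained network and cluster the descriptors into candidate zones. The ResNet-50 backbone and cluster count below are illustrative assumptions; the paper uses semantic descriptors and then assigns a semantic label to each cluster.

```python
import torch
from torchvision import models, transforms
from sklearn.cluster import KMeans

# Generic image descriptors standing in for the paper's semantic descriptors.
backbone = models.resnet50(weights="DEFAULT")
backbone.fc = torch.nn.Identity()            # 2048-d descriptors
backbone.eval()

prep = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def zone_clusters(pil_images, n_zones=8):
    """Cluster street-level images into n_zones candidate city zones."""
    with torch.no_grad():
        feats = torch.stack([backbone(prep(im).unsqueeze(0)).squeeze(0)
                             for im in pil_images])
    return KMeans(n_clusters=n_zones, n_init=10).fit_predict(feats.numpy())
```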

Review

38 pages, 6937 KiB  
Review
A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition
by Vânia Guimarães, Jéssica Nascimento, Paula Viana and Pedro Carvalho
Appl. Sci. 2023, 13(5), 2871; https://doi.org/10.3390/app13052871 - 23 Feb 2023
Viewed by 2911
Abstract
Compared with traditional local shops, where the customer receives a personalised service, in large retail departments the client has to make purchase decisions independently, mostly supported by the information available on the package. Additionally, people are becoming more aware of the importance of food ingredients and more demanding about the types of products they buy and the information provided on the package, despite it often being hard to interpret. Big shops such as supermarkets have also introduced important challenges for the retailer due to the large number of different products in the store, heterogeneous customer affluence, and the daily need for item repositioning. In this scenario, the automatic detection and recognition of products on or off the shelves has gained increased interest, as these technologies may improve the shopping experience through self-assisted shopping apps and autonomous shopping, or even benefit stock management with real-time inventory, automatic shelf monitoring, and product tracking. These solutions can also have an important impact on customers with visual impairments. Despite recent developments in computer vision, automatic grocery product recognition is still very challenging, with most works focusing on the detection or recognition of a small number of products, often under controlled conditions. This paper discusses the challenges related to this problem and presents a review of proposed methods for retail product label processing, with a special focus on assisted analysis for customer support, including for the visually impaired. Moreover, it details the public datasets used in this topic, identifies their limitations, and discusses future research directions in related fields. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)

22 pages, 6882 KiB  
Review
Deep Anomaly Detection for In-Vehicle Monitoring—An Application-Oriented Review
by Francisco Caetano, Pedro Carvalho and Jaime Cardoso
Appl. Sci. 2022, 12(19), 10011; https://doi.org/10.3390/app121910011 - 5 Oct 2022
Cited by 4 | Viewed by 2095
Abstract
Anomaly detection has been an active research area for decades, with high application potential. Recent work has explored deep learning approaches to the detection of abnormal behaviour and abandoned objects in outdoor video surveillance scenarios. The extension of this work to in-vehicle monitoring using solely visual data represents a relevant research opportunity that has been overlooked in the accessible literature. With the increasing importance of public and shared transportation for urban mobility, it becomes imperative to provide autonomous intelligent systems capable of detecting abnormal behaviour that threatens passenger safety. To investigate the applicability of current works to this scenario, a recapitulation of relevant state-of-the-art techniques and resources is presented, including available datasets for training and benchmarking. The lack of public datasets dedicated to in-vehicle monitoring is addressed alongside other issues not considered in previous works, such as moving backgrounds and frequent illumination changes. Despite its relevance, similar surveys and reviews have disregarded this scenario and its specificities. This work initiates an important discussion on application-oriented issues, proposing solutions to be followed in future works, in particular synthetic data augmentation to obtain representative instances from the low number of available sequences. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)

49 pages, 10408 KiB  
Review
A Comprehensive Review of Computer Vision in Sports: Open Issues, Future Trends and Research Directions
by Banoth Thulasya Naik, Mohammad Farukh Hashmi and Neeraj Dhanraj Bokde
Appl. Sci. 2022, 12(9), 4429; https://doi.org/10.3390/app12094429 - 27 Apr 2022
Cited by 35 | Viewed by 10551
Abstract
Recent developments in video analysis of sports and computer vision techniques have achieved significant improvements that enable a variety of critical operations. To provide enhanced information, such as detailed complex analysis in sports like soccer, basketball, cricket, and badminton, studies have focused mainly on the computer vision techniques employed to carry out different tasks. This paper presents a comprehensive review of sports video analysis for various applications: high-level analysis such as the detection and classification of players, tracking players or balls and predicting their trajectories, recognizing a team’s strategies, and classifying various events in sports. The paper further discusses published works on a variety of application-specific tasks related to sports and presents the researchers’ views regarding them. Since there is wide research scope for deploying computer vision techniques in sports, some of the publicly available datasets related to particular sports are also discussed. The paper then provides a detailed discussion of artificial intelligence (AI) applications, GPU-based workstations, and embedded platforms in sports vision. Finally, this review identifies the research directions, probable challenges, and future trends in the area of visual recognition in sports. Full article
(This article belongs to the Special Issue Computer Vision-Based Intelligent Systems: Challenges and Approaches)