Recent Advances of Learning Based Intelligent Vision System

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 June 2023) | Viewed by 5001

Special Issue Editor

School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Interests: image/video quality assessment; perceptual modeling and processing

Special Issue Information

Dear Colleagues,

In recent years, the study of intelligent vision systems (IVS) has attracted increasing interest due to their wide applications in real-world tasks such as automated driving, defect inspection, and auxiliary diagnosis. Although many popular deep-learning approaches have achieved impressive success, building a robust IVS remains a great challenge due to various practical limitations, including inevitable image/video distortions, limited computational and storage resources, and insufficient annotations. Fortunately, the recent development of advanced machine learning approaches, including transformers, meta-learning, graph neural networks, incremental learning, transfer learning, self-learning, causal learning, and others, helps to equip us to face these challenges.

This Special Issue aims to provide a timely and thorough collection of high-quality contributions exploring emerging machine learning approaches for IVS-related tasks. Original submissions presenting new insights, frameworks, and databases are welcome.

Topics of interest include, but are not limited to:

  • Image/video representation learning;
  • Learning for visual data compression;
  • Learning to evaluate image/video quality;
  • Learning for visual data enhancement;
  • Learning for image/video retrieval;
  • Learning for image/video recognition;
  • Learning for image/video reconstruction;
  • Learning for object tracking;
  • New image/video database for advanced machine learning.

Dr. Qingbo Wu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image/video representation
  • image/video compression
  • image/video quality analysis
  • image/video enhancement
  • image/video retrieval
  • image/video recognition
  • image/video database

Published Papers (3 papers)


Research

17 pages, 3602 KiB  
Article
Static Video Summarization Using Video Coding Features with Frame-Level Temporal Subsampling and Deep Learning
by Obada Issa and Tamer Shanableh
Appl. Sci. 2023, 13(10), 6065; https://doi.org/10.3390/app13106065 - 15 May 2023
Cited by 10 | Viewed by 1061
Abstract
There is an abundance of digital video content due to the phenomenal growth of cloud storage and security footage; it is therefore essential to summarize these videos in data centers. This paper offers innovative approaches to the problem of key frame extraction for the purpose of video summarization. Our approach includes the extraction of feature variables from the bit streams of coded videos, followed by optional stepwise regression for dimensionality reduction. Once the features are extracted and their dimensionality is reduced, we apply innovative frame-level temporal subsampling techniques, followed by training and testing using deep learning architectures. The frame-level temporal subsampling techniques are based on cosine similarity and on the PCA projections of the feature vectors. We create three different learning architectures by utilizing LSTM networks, 1D-CNN networks, and random forests. The four most popular video summarization datasets, namely TVSum, SumMe, OVP, and VSUMM, are used to evaluate the accuracy of the proposed solutions in terms of precision, recall, F-score, and computational time. The proposed solutions, when trained and tested on all subjective user summaries, achieved F-scores of 0.79, 0.74, 0.88, and 0.81, respectively, on the aforementioned datasets, showing clear improvements over prior studies.
(This article belongs to the Special Issue Recent Advances of Learning Based Intelligent Vision System)
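The cosine-similarity-based frame-level temporal subsampling mentioned in the abstract can be illustrated with a minimal sketch. The function below is a hypothetical illustration under simple assumptions, not the authors' implementation: it keeps a frame only when the cosine similarity between its feature vector and that of the last kept frame falls below a threshold, discarding near-duplicate frames.

```python
import numpy as np

def subsample_by_cosine_similarity(features: np.ndarray, threshold: float = 0.98) -> list:
    """Keep a frame only if its feature vector differs enough from the
    last kept frame (cosine similarity below `threshold`)."""
    kept = [0]  # always keep the first frame
    last = features[0]
    for i in range(1, len(features)):
        f = features[i]
        sim = float(np.dot(last, f) /
                    (np.linalg.norm(last) * np.linalg.norm(f) + 1e-12))
        if sim < threshold:  # frame is sufficiently novel
            kept.append(i)
            last = f
    return kept

# toy example: frames 0-2 are nearly identical, frames 3-4 are distinct from them
feats = np.array([
    [1.0, 0.0], [0.999, 0.01], [0.998, 0.02],
    [0.0, 1.0], [0.01, 0.999],
])
print(subsample_by_cosine_similarity(feats, threshold=0.98))  # → [0, 3]
```

In practice the feature vectors would come from the coded bit stream (after the optional stepwise regression), and the threshold would be tuned per dataset.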

21 pages, 3074 KiB  
Article
Language Bias-Driven Self-Knowledge Distillation with Generalization Uncertainty for Reducing Language Bias in Visual Question Answering
by Desen Yuan, Lei Wang, Qingbo Wu, Fanman Meng, King Ngi Ngan and Linfeng Xu
Appl. Sci. 2022, 12(15), 7588; https://doi.org/10.3390/app12157588 - 28 Jul 2022
Cited by 2 | Viewed by 1545
Abstract
To answer questions, visual question answering (VQA) systems tend to rely on language bias and ignore the information in the images, which has a negative effect on their generalization. Mainstream debiasing methods focus on removing language priors before inference. However, the image samples are distributed unevenly in the dataset, so the feature sets acquired by the model often cannot cover the features (views) of the tail samples; as a result, language bias occurs. This paper proposes a language bias-driven self-knowledge distillation framework that implicitly learns multi-view feature sets so as to reduce language bias. Moreover, to measure the performance of student models, the authors use a generalization uncertainty index that helps student models learn unbiased visual knowledge and forces them to focus on the questions that cannot be answered from language bias alone. In addition, the authors analyze the theory behind the proposed method and verify the positive correlation between generalization uncertainty and the expected test error. The method's effectiveness is validated on the VQA-CP v2, VQA-CP v1, and VQA v2 datasets through extensive ablation experiments.
(This article belongs to the Special Issue Recent Advances of Learning Based Intelligent Vision System)
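As a rough, hypothetical sketch of the general idea (the paper's exact loss and uncertainty estimator are not reproduced here), an uncertainty-weighted self-distillation objective combines a standard cross-entropy term with a KL-divergence term against the teacher's softened predictions, scaled per sample so that high-uncertainty questions contribute more to the distillation signal:

```python
import numpy as np

def softmax(z: np.ndarray, t: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uncertainty_weighted_distillation(student_logits, teacher_logits,
                                      labels, uncertainty, temperature=2.0):
    """Per-sample loss = CE(student, label) + u_i * KL(teacher || student).
    `uncertainty` (u_i >= 0) is an assumed per-sample weight; samples the
    teacher answers unreliably get a larger distillation term."""
    p_s = softmax(student_logits, temperature)
    p_t = softmax(teacher_logits, temperature)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    return ce + uncertainty * kl

# tiny demo: 2 samples, 2 answer classes
student = np.array([[2.0, 0.0], [0.0, 2.0]])
teacher = np.array([[0.0, 2.0], [0.0, 2.0]])
labels = np.array([0, 1])
loss = uncertainty_weighted_distillation(student, teacher, labels,
                                         uncertainty=np.array([1.0, 1.0]))
print(loss.shape)  # → (2,)
```

The function names, signature, and the multiplicative weighting are illustrative assumptions; the paper defines generalization uncertainty formally and ties it to the expected test error.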

11 pages, 951 KiB  
Article
Medium Transmission Map Matters for Learning to Restore Real-World Underwater Images
by Kai Yan, Lanyue Liang, Ziqiang Zheng, Guoqing Wang and Yang Yang
Appl. Sci. 2022, 12(11), 5420; https://doi.org/10.3390/app12115420 - 27 May 2022
Cited by 8 | Viewed by 1568
Abstract
Low illumination, light reflection, scattering, absorption, and suspended particles inevitably lead to critically degraded underwater image quality, which poses great challenges for recognizing objects in underwater images. Existing underwater enhancement methods that aim to improve underwater visibility suffer heavily from poor image restoration performance and generalization ability. To reduce the difficulty of underwater image enhancement, we introduce the medium transmission map as guidance for image enhancement. Different from existing frameworks, which also introduce the medium transmission map for better distribution modeling, we explicitly formulate the interaction between the underwater images and the transmission map to obtain better enhancement results. At the same time, our network only requires supervision of the medium transmission map during training; the corresponding prediction map is generated at test time, which reduces the operational difficulty of subsequent tasks. Thanks to our formulation, the proposed method with a very lightweight network configuration produces very promising results of 22.6 dB on the challenging Test-R90 at an impressive 30.3 FPS, which is faster than most current algorithms. Comprehensive experimental results demonstrate its superiority in underwater perception.
(This article belongs to the Special Issue Recent Advances of Learning Based Intelligent Vision System)
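The role of the transmission map can be grounded in the simplified image-formation model used in dehazing and underwater restoration, I = J·t + A·(1 − t), where I is the observed image, J the scene radiance, t the medium transmission, and A the ambient (background) light. The inversion below is a generic sketch of that physical model, not the paper's learned network, which instead uses the predicted transmission map as guidance inside a neural enhancement pipeline:

```python
import numpy as np

def restore_with_transmission(image: np.ndarray, transmission: np.ndarray,
                              ambient: np.ndarray, t_min: float = 0.1) -> np.ndarray:
    """Invert the simplified formation model
        I = J * t + A * (1 - t)   =>   J = (I - A * (1 - t)) / max(t, t_min).
    `image` is HxWx3 in [0, 1], `transmission` is HxW, `ambient` has 3 channels.
    `t_min` guards against division blow-up where the medium is nearly opaque."""
    t = np.clip(transmission, t_min, 1.0)[..., None]  # broadcast over channels
    restored = (image - ambient * (1.0 - t)) / t
    return np.clip(restored, 0.0, 1.0)
```

A round trip on synthetic data (compose I from a known J, t, and A, then invert) recovers the original radiance wherever t ≥ t_min, which is the property the transmission map supervision exploits.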
