Mathematical Methods in Image Processing and Computer Vision

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (31 October 2023)

Special Issue Editors


Prof. Dr. Yazhou Yao
Guest Editor
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Interests: image processing; multimedia; computer vision; machine learning; representation learning

Dr. Xiaoshui Huang
Guest Editor
Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
Interests: 3D registration; segmentation; detection; shape completion; pose estimation and reconstruction

Special Issue Information

Dear Colleagues,

In the past few years, image processing and computer vision have achieved tremendous progress and have been applied in a wide range of practical scenarios, such as retail product recognition, autonomous vehicles, intelligent surveillance, and MRI-based cancer diagnosis. The core tasks include image classification, detection, denoising, retrieval, and segmentation. These successes are largely attributable to the application of various mathematical methods (e.g., optimization, analysis, statistics, geometry, and algebra), which provide solid theoretical support for performance improvement in specific domains and which, in turn, have inspired further advances in the mathematical methods themselves. Mathematical models have played a critical role in modeling problems, designing algorithms, and analyzing performance for image processing and computer vision tasks.

How to further apply mathematical methods to image processing and computer vision is an essential problem worth studying. This Special Issue aims to bring together researchers investigating mathematical methods to solve diverse problems in image processing and computer vision. Both state-of-the-art works and literature reviews are welcome for submission. Papers addressing interesting real-world computer vision applications are especially encouraged.

Prof. Dr. Yazhou Yao
Dr. Xiaoshui Huang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • mathematical algorithm
  • image processing
  • computer vision
  • machine learning

Published Papers (8 papers)


Research

16 pages, 398 KiB  
Article
Exploring Spatial-Based Position Encoding for Image Captioning
by Xiaobao Yang, Shuai He, Junsheng Wu, Yang Yang, Zhiqiang Hou and Sugang Ma
Mathematics 2023, 11(21), 4550; https://doi.org/10.3390/math11214550 - 04 Nov 2023
Abstract
Image captioning has become a hot topic in artificial intelligence research and sits at the intersection of computer vision and natural language processing. Most recent image captioning models have adopted an “encoder + decoder” architecture, in which the encoder is generally employed to extract the visual features, while the decoder generates the descriptive sentence word by word. However, the visual features need to be flattened into sequence form before being forwarded to the decoder, which results in the loss of the 2D spatial position information of the image. This limitation is particularly pronounced in the Transformer architecture, since it is inherently not position-aware. Therefore, in this paper, we propose a simple coordinate-based spatial position encoding method (CSPE) to remedy this deficiency. CSPE first creates the 2D position coordinates for each feature pixel and then encodes them by row and by column separately via trainable or hard encoding, effectively strengthening the position representation of visual features and enriching the generated description sentences. In addition, in order to reduce the time cost, we also explore a diagonal-based spatial position encoding (DSPE) approach. Compared with CSPE, DSPE is slightly inferior in performance but has a faster calculation speed. Extensive experiments on the MS COCO 2014 dataset demonstrate that CSPE and DSPE can significantly enhance the spatial position representation of visual features. CSPE, in particular, improves BLEU-4 and CIDEr by 1.6% and 5.7%, respectively, compared with a baseline model without sequence-based position encoding, and also outperforms current sequence-based position encoding approaches by a significant margin. In addition, the robustness and plug-and-play ability of the proposed method are validated on a medical caption generation model.
(This article belongs to the Special Issue Mathematical Methods in Image Processing and Computer Vision)
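
To make the row/column encoding concrete, here is a minimal sketch of a CSPE-style module in PyTorch, assuming trainable embeddings for the row and column coordinates that are summed into a 2D position map and added to the flattened visual features; the shapes, module name, and additive combination are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CSPE(nn.Module):
    def __init__(self, height: int, width: int, dim: int):
        super().__init__()
        # Trainable encodings, one table per coordinate axis.
        self.row_embed = nn.Embedding(height, dim)
        self.col_embed = nn.Embedding(width, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, H*W, D), the flattened CNN feature map fed to the decoder.
        b, hw, d = feats.shape
        h = self.row_embed.num_embeddings
        w = self.col_embed.num_embeddings
        rows = torch.arange(h, device=feats.device)
        cols = torch.arange(w, device=feats.device)
        # Encode by row and by column separately, then combine into a 2D map.
        pos = self.row_embed(rows)[:, None, :] + self.col_embed(cols)[None, :, :]
        return feats + pos.reshape(h * w, d).unsqueeze(0)  # broadcast over batch

# Usage sketch: feats = torch.randn(2, 14 * 14, 512); CSPE(14, 14, 512)(feats)
```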

16 pages, 2905 KiB  
Article
Classifying Cardiac Arrhythmia from ECG Signal Using 1D CNN Deep Learning Model
by Adel A. Ahmed, Waleed Ali, Talal A. A. Abdullah and Sharaf J. Malebary
Mathematics 2023, 11(3), 562; https://doi.org/10.3390/math11030562 - 20 Jan 2023
Abstract
Blood circulation depends critically on electrical activation, where any disturbance in the orderly pattern of the heart’s propagating wave of excitation can lead to arrhythmias. Diagnosis of arrhythmias using electrocardiograms (ECG) is widespread because they are a fast, inexpensive, and non-invasive tool. However, the randomness of arrhythmic events and the susceptibility of ECGs to noise lead to misdiagnosis of arrhythmias. In addition, manually diagnosing cardiac arrhythmias from ECG data is time-intensive and error-prone. With better training, deep learning (DL) could be a better alternative for fast and automatic classification. The present study introduces a novel deep learning architecture, specifically a one-dimensional convolutional neural network (1D-CNN), for the classification of cardiac arrhythmias. The model was trained and validated with real and noise-attenuated ECG signals from the MIT-BIH dataset. The main aim is to address the limitations of traditional ECG-based diagnosis of arrhythmias, which can be affected by noise and the randomness of events, leading to misdiagnosis and errors. To evaluate model performance, the confusion matrix is used to calculate accuracy, precision, recall, F1 score, average, and AUC-ROC. The experimental results demonstrate that the proposed model achieved outstanding performance, with accuracies of 1.00 and 0.99 on the training and testing datasets, respectively, and can be a fast and automatic alternative for the diagnosis of arrhythmias.
(This article belongs to the Special Issue Mathematical Methods in Image Processing and Computer Vision)
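
As a concrete illustration of the 1D-CNN idea, here is a hedged PyTorch sketch assuming fixed-length, single-lead ECG segments (e.g., 360 samples around a beat) and five beat classes; the layer count and channel sizes are placeholders, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ECG1DCNN(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # length-independent global pooling
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1, T), one ECG segment per row.
        return self.classifier(self.features(x).squeeze(-1))

# Usage sketch: logits = ECG1DCNN()(torch.randn(8, 1, 360))
```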

14 pages, 4484 KiB  
Article
Multiple Degradation Skilled Network for Infrared and Visible Image Fusion Based on Multi-Resolution SVD Updation
by Gunnam Suryanarayana, Vijayakumar Varadarajan, Siva Ramakrishna Pillutla, Grande Nagajyothi and Ghamya Kotapati
Mathematics 2022, 10(18), 3389; https://doi.org/10.3390/math10183389 - 19 Sep 2022
Abstract
Existing infrared (IR)-visible (VIS) image fusion algorithms demand source images with the same resolution levels. However, IR images are often only available at poor resolution due to hardware limitations and environmental conditions. In this correspondence, we develop a novel image fusion model that brings resolution consistency between the IR-VIS source images and generates an accurate high-resolution fused image. We train a single deep convolutional neural network model that accounts for true degradations in real time and reconstructs IR images. The trained multiple degradation skilled network (MDSNet) increases the prominence of objects in fused images from the IR source image. In addition, we adopt multi-resolution singular value decomposition (MRSVD) to capture maximum information from the source images and update the IR image coefficients with those of the VIS image at the finest level. This ensures uniform contrast along with clear textural information in our results. Experiments demonstrate the efficiency of the proposed method over nine state-of-the-art methods using five image quality assessment metrics.
(This article belongs to the Special Issue Mathematical Methods in Image Processing and Computer Vision)
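
For readers unfamiliar with MRSVD, the NumPy sketch below shows one analysis level and a finest-level coefficient update; it assumes grayscale images with even dimensions and a Kakarala-Ogunbona-style 2x2 block decomposition, and keeping the IR approximation row while taking the VIS detail rows is an illustrative reading of the update rule, not the authors' exact scheme.

```python
import numpy as np

def block_stack(img: np.ndarray) -> np.ndarray:
    # Stack each non-overlapping 2x2 block as a 4-element column.
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).transpose(1, 3, 0, 2).reshape(4, -1)

def block_unstack(a: np.ndarray, h: int, w: int) -> np.ndarray:
    # Inverse of block_stack.
    return a.reshape(2, 2, h // 2, w // 2).transpose(2, 0, 3, 1).reshape(h, w)

def fuse_finest(ir: np.ndarray, vis: np.ndarray) -> np.ndarray:
    # ir, vis: grayscale float arrays of equal, even shape.
    h, w = ir.shape
    a_ir, a_vis = block_stack(ir), block_stack(vis)
    # Eigenvector basis of the IR block matrix (one MRSVD analysis level).
    u, _, _ = np.linalg.svd(a_ir @ a_ir.T)
    t_ir, t_vis = u.T @ a_ir, u.T @ a_vis
    # Keep the IR approximation row, take the VIS detail rows, then invert.
    t_fused = np.vstack([t_ir[:1], t_vis[1:]])
    return block_unstack(u @ t_fused, h, w)
```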

23 pages, 18795 KiB  
Article
A New Robust and Secure 3-Level Digital Image Watermarking Method Based on G-BAT Hybrid Optimization
by Kilari Jyothsna Devi, Priyanka Singh, Jatindra Kumar Dash, Hiren Kumar Thakkar, José Santamaría, Musalreddy Venkata Jayanth Krishna and Antonio Romero-Manchado
Mathematics 2022, 10(16), 3015; https://doi.org/10.3390/math10163015 - 21 Aug 2022
Abstract
This contribution applies tools from information theory and soft computing (SC) to the embedding and extraction of watermarks in aerial remote sensing (RS) images to protect copyright. By the time 5G came along, Internet usage had already grown exponentially. Regarding copyright protection, the most important responsibility of digital image watermarking (DIW) is to provide authentication and security for digital content. In this paper, our main goal is to provide authentication and security for aerial RS images transmitted over the Internet by proposing a hybrid DIW approach using both the redundant discrete wavelet transform (RDWT) and singular value decomposition (SVD). Specifically, SC is adopted in this work for the numerical optimization of critical parameters. A 1-level RDWT and SVD are applied to the digital cover image, and the singular matrices of the LH and HL sub-bands are selected for watermark embedding. The selected singular matrices S_LH and S_HL are then split into 3×3 non-overlapping blocks, and their diagonal positions are used for watermark embedding. Three-level symmetric encryption with low computational cost is used to ensure higher watermark security. A hybrid grasshopper–BAT (G-BAT) SC-based optimization algorithm is also proposed in order to achieve high-quality DIW outcomes, and a broad comparison against other state-of-the-art methods is provided. The experimental results demonstrate that our proposal provides higher levels of imperceptibility, robustness, embedding capacity, and security when dealing with DIW of aerial RS images than state-of-the-art methods.
(This article belongs to the Special Issue Mathematical Methods in Image Processing and Computer Vision)
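
The embedding step can be sketched with PyWavelets as follows, assuming a 1-level stationary wavelet transform standing in for the RDWT, additive embedding of a binary watermark at the diagonal positions of 3x3 blocks along the singular-value matrix of the LH band, and an illustrative strength parameter alpha; the encryption and G-BAT optimization stages are omitted.

```python
import numpy as np
import pywt

def embed(cover: np.ndarray, wm_bits: np.ndarray, alpha: float = 0.05):
    # cover: grayscale float array with even dimensions; wm_bits: flat 0/1 array.
    # 1-level stationary wavelet transform (a redundant, shift-invariant DWT).
    (ca, (ch, cv, cd)), = pywt.swt2(cover, 'haar', level=1)
    u, s, vt = np.linalg.svd(ch, full_matrices=False)
    S = np.diag(s)                          # singular matrix of the LH band
    k = 0
    for i in range(0, S.shape[0] - 2, 3):   # 3x3 non-overlapping blocks
        for j in range(3):                  # diagonal positions within a block
            if k < wm_bits.size:
                S[i + j, i + j] += alpha * wm_bits[k]
                k += 1
    ch_marked = u @ S @ vt
    return pywt.iswt2([(ca, (ch_marked, cv, cd))], 'haar')
```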

17 pages, 5464 KiB  
Article
Dynamic Re-Weighting and Cross-Camera Learning for Unsupervised Person Re-Identification
by Qingze Yin, Guan’an Wang, Jinlin Wu, Haonan Luo and Zhenmin Tang
Mathematics 2022, 10(10), 1654; https://doi.org/10.3390/math10101654 - 12 May 2022
Abstract
Person Re-Identification (ReID) has witnessed tremendous improvements with the help of deep convolutional neural networks (CNNs). Nevertheless, because different domains have their own characteristics, most existing methods generalize poorly to unseen identities. To address this problem, based on the relationship between temporal information and camera position, we propose a robust and effective training strategy named temporal smoothing dynamic re-weighting and cross-camera learning (TSDRC). It uses robust and effective algorithms to transfer valuable knowledge from existing labeled source domains to unlabeled target domains. In particular, to improve the discernibility of the CNN model in the source domain, generally shared person attributes and a margin-based softmax loss are adopted to train the source model. In the target domain training stage, TSDRC iteratively clusters the samples into several centers and dynamically re-weights the unlabeled samples from each center with a temporal smoothing score. Then, a cross-camera triplet loss is proposed to fine-tune the source domain model. Comprehensive experiments on the Market-1501 and DukeMTMC-reID datasets demonstrate that the proposed method vastly improves the performance of unsupervised domain adaptation.
(This article belongs to the Special Issue Mathematical Methods in Image Processing and Computer Vision)
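
A minimal PyTorch sketch of a cross-camera triplet loss is given below, assuming embeddings with pseudo-labels from clustering and known camera IDs; restricting positives to a different camera than the anchor is the cross-camera constraint, while the hardest-pair mining is an illustrative choice, not necessarily the authors' strategy.

```python
import torch
import torch.nn.functional as F

def cross_camera_triplet(emb, labels, cams, margin: float = 0.3):
    # emb: (N, D) embeddings; labels: (N,) pseudo-IDs; cams: (N,) camera IDs.
    dist = torch.cdist(emb, emb)                   # pairwise distances
    losses = []
    for a in range(emb.size(0)):
        same_id = labels == labels[a]
        pos_mask = same_id & (cams != cams[a])     # positive: same ID, other camera
        neg_mask = ~same_id
        if pos_mask.any() and neg_mask.any():
            hardest_pos = dist[a][pos_mask].max()  # hardest positive
            hardest_neg = dist[a][neg_mask].min()  # hardest negative
            losses.append(F.relu(hardest_pos - hardest_neg + margin))
    return torch.stack(losses).mean() if losses else emb.new_zeros(())
```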

14 pages, 6895 KiB  
Article
A Traffic Event Detection Method Based on Random Forest and Permutation Importance
by Ziyi Su, Qingchao Liu, Chunxia Zhao and Fengming Sun
Mathematics 2022, 10(6), 873; https://doi.org/10.3390/math10060873 - 09 Mar 2022
Abstract
Although video surveillance systems play an important role in intelligent transportation, limited camera views make it difficult to observe many traffic events. In this paper, we collect and combine traffic flow variables from multi-source sensors and propose the PITED method, based on Random Forest (RF) and permutation importance (PI), for traffic event detection. This model selects suitable traffic flow variables by means of permutation importance and establishes the whole pipeline of acquisition, preprocessing, quantization, modeling, and evaluation. Moreover, real traffic data are collected and tested to evaluate experimental performance, including the miss and false alarm rates of traffic events and the average detection time. The experimental results show that the detection rate is more than 85% and the false alarm rate is less than 3%, meaning the model is effective and efficient in practical applications on both workdays and holidays.
(This article belongs to the Special Issue Mathematical Methods in Image Processing and Computer Vision)
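
The variable-selection step maps naturally onto scikit-learn; the sketch below trains a random forest on placeholder traffic-flow features and ranks them with permutation importance, which measures the accuracy drop when each feature is shuffled. The data, feature count, and hyperparameters are illustrative assumptions, not those of the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder data: 8 traffic-flow variables (speed, occupancy, volume, ...),
# binary event / no-event labels.
X, y = np.random.rand(1000, 8), np.random.randint(0, 2, 1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pi = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
ranked = np.argsort(pi.importances_mean)[::-1]  # most important first
print("feature ranking:", ranked)               # low-ranked variables can be dropped
```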

20 pages, 1888 KiB  
Article
Single-Sample Face Recognition Based on Shared Generative Adversarial Network
by Yuhua Ding, Zhenmin Tang and Fei Wang
Mathematics 2022, 10(5), 752; https://doi.org/10.3390/math10050752 - 26 Feb 2022
Abstract
Single-sample face recognition is a very challenging problem, where each person has only one labeled training sample, making it difficult to describe unknown facial variations. In this paper, we propose a shared generative adversarial network (SharedGAN) to expand the gallery dataset. Benefiting from its shared decoding network, SharedGAN requires only a small number of training samples. After obtaining the generated samples, we join them with a large public dataset, on which a deep convolutional neural network is then trained. We use the well-trained model for feature extraction and, with the deep convolutional features, train a simple softmax classifier. Our method has been evaluated on the AR, CMU-PIE, and FERET datasets. Experimental results demonstrate the effectiveness of SharedGAN and show its robustness for single-sample face recognition.
(This article belongs to the Special Issue Mathematical Methods in Image Processing and Computer Vision)
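
Structurally, the shared-decoding idea can be sketched as several encoders reusing one decoder, which keeps the number of trainable parameters, and hence the sample requirement, small; the PyTorch skeleton below uses placeholder layer sizes and is not the authors' architecture.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, latent: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Flatten(), nn.Linear(64 * 16 * 16, latent))

    def forward(self, x):
        return self.net(x)

class SharedDecoder(nn.Module):
    def __init__(self, latent: int = 128):
        super().__init__()
        self.fc = nn.Linear(latent, 64 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 64, 16, 16))

# One encoder per facial variation (e.g., illumination, expression),
# all reusing the same decoder weights:
decoder = SharedDecoder()
encoders = nn.ModuleList([Encoder() for _ in range(3)])
fake = decoder(encoders[0](torch.randn(4, 3, 64, 64)))
```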

18 pages, 2006 KiB  
Article
Age-Invariant Adversarial Feature Learning for Kinship Verification
by Fan Liu, Zewen Li, Wenjie Yang and Feng Xu
Mathematics 2022, 10(3), 480; https://doi.org/10.3390/math10030480 - 02 Feb 2022
Abstract
Kinship verification aims to determine whether two given persons are blood relatives. This technique can be leveraged in many real-world scenarios, such as finding missing people, identifying kinship in forensic medicine, and certain types of interdisciplinary research. Most existing methods extract facial features directly from the given images and examine the full set of features to verify kinship. However, most approaches are easily affected by the age gap between faces, and few methods take age into account. This paper accordingly proposes an Age-Invariant Adversarial Feature learning module (AIAF), which factors full facial features into two uncorrelated components, i.e., identity-related features and age-related features. More specifically, we harness an adversarial mechanism to make the correlation between these two components as small as possible. Moreover, to pay different attention to individual identity-related features, we present an Identity Feature Weighted module (IFW). Only purified identity features are fed into the IFW module, which assigns different weights to the features according to their importance in the kinship verification task. Experimental results on three popular public datasets demonstrate that our approach is able to capture useful age-invariant features, i.e., identity features, and achieves significant improvements over other state-of-the-art methods on both small-scale and large-scale datasets.
(This article belongs to the Special Issue Mathematical Methods in Image Processing and Computer Vision)
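
To illustrate the decomposition, the sketch below splits a face feature into identity and age components and penalizes their correlation directly, with an IFW-style learned weighting over the identity features; a plain correlation penalty stands in here for the paper's adversarial mechanism, and all dimensions and module names are assumptions.

```python
import torch
import torch.nn as nn

class AIAFHead(nn.Module):
    def __init__(self, feat_dim: int = 512, split: int = 256):
        super().__init__()
        self.id_proj = nn.Linear(feat_dim, split)    # identity-related part
        self.age_proj = nn.Linear(feat_dim, split)   # age-related part
        self.ifw = nn.Sequential(nn.Linear(split, split), nn.Sigmoid())

    def forward(self, feat: torch.Tensor):
        fid, fage = self.id_proj(feat), self.age_proj(feat)
        # Correlation penalty: push the two components toward decorrelation
        # (a stand-in for the adversarial mechanism in the paper).
        corr = ((fid - fid.mean(0)) * (fage - fage.mean(0))).mean().abs()
        weighted_id = fid * self.ifw(fid)            # IFW-style reweighting
        return weighted_id, corr

# Usage sketch: weighted, corr_loss = AIAFHead()(torch.randn(16, 512))
```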
