Advances and Application of Intelligent Video Surveillance System

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 July 2023) | Viewed by 27921

Special Issue Editors


Guest Editor
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Interests: video compression; video processing; image quality assessment

Guest Editor
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Interests: image/video processing; computer vision; artificial intelligence

Guest Editor
School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
Interests: video processing; computer vision; machine learning

Special Issue Information

Dear Colleagues,

Video surveillance is used to observe scenes and detect specific behaviors in daily life, and it has been widely applied in public security, transportation monitoring, and other fields. A video surveillance system consists of three parts: a camera system, a transmission system, and an observation system.
The camera system captures the surveillance video; however, as video resolutions increase, the volume of video data grows significantly, posing challenges for storage, transmission, and processing.

In this Special Issue, we would like to highlight new and innovative work on intelligent video surveillance systems. We invite you to present high-quality research in one or more areas concerning the current state of the art and the future of intelligent video surveillance systems, in both theory and practice. Quality manuscripts on surveillance video compression, surveillance video processing, and surveillance video security are also very welcome.

Prof. Dr. Zhaoqing Pan
Dr. Bo Peng
Prof. Dr. Jinwei Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, use the online submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • surveillance video compression
  • surveillance video processing
  • surveillance video transmission
  • surveillance video security

Published Papers (13 papers)


Research


18 pages, 13848 KiB  
Article
Image Structure-Induced Semantic Pyramid Network for Inpainting
by Rong Huang and Yuhui Zheng
Appl. Sci. 2023, 13(13), 7812; https://doi.org/10.3390/app13137812 - 03 Jul 2023
Viewed by 695
Abstract
The existing deep-learning-based image inpainting algorithms often suffer from local structure disconnections and blurring when dealing with large irregular defective images. To solve these problems, an image structure-induced semantic pyramid network for inpainting is proposed. The model consists of two parts: the edge inpainting network and the content-filling network. U-Net-based edge inpainting network restores the edge of the image defect with residual blocks. The edge inpainting map is input into the pyramid content-filling network together with the image in the prior condition. In the content-filling network, the attention transfer module (ATM) is designed to reconfigure the encoding features of each scale step by step, and the recovered feature map is linked to the decoding layer and the corresponding potential feature fusion decoding to improve the global consistency of the image and finally obtain the restored image. The quantitative analysis shows that the average L1 loss is reduced by about 1.14%, the peak signal-to-noise ratio (PSNR) is improved by about 3.51, and the structural similarity (SSIM) is improved by about 0.163 on the CelebA-HQ and Places2 datasets compared with the current mainstream algorithms. The qualitative analysis shows that this model not only generates semantically sound content as a whole but also better matches the human visual perception in terms of local structural connectivity and texture synthesis. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)
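The quantitative gains above are reported in standard full-reference metrics. As a reminder of how the PSNR figure is computed, here is a minimal pure-Python sketch over toy 8-bit grayscale patches (illustrative only, not the authors' evaluation code):

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    given as flat lists of pixel intensities."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images: infinite PSNR
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy 2x2 patches: a reconstruction off by 1 intensity level everywhere.
ref = [52, 55, 61, 59]
out = [53, 56, 62, 60]
print(round(psnr(ref, out), 2))  # MSE = 1, so 10*log10(255^2) ≈ 48.13 dB
```

An improvement "by about 3.51" in the abstract therefore means roughly 3.51 dB on this logarithmic scale; SSIM, by contrast, is bounded in [0, 1].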

19 pages, 3912 KiB  
Article
Learnable Nonlocal Contrastive Network for Single Image Super-Resolution
by Binbin Xu and Yuhui Zheng
Appl. Sci. 2023, 13(12), 7160; https://doi.org/10.3390/app13127160 - 15 Jun 2023
Viewed by 621
Abstract
Single image super-resolution (SISR) aims to recover a high-resolution image from a single low-resolution image. In recent years, SISR methods based on deep convolutional neural networks have achieved remarkable success, and some methods further improve the performance of the SISR model by introducing nonlocal attention into the model. However, most SISR methods that introduce nonlocal attention focus on more complex attention mechanisms and only use fixed functions for measurement when exploring image similarity. In addition, the model penalizes the algorithm in terms of loss when the output predicted by the model does not match the target data, even if this output is a potentially valid solution. To this end, we propose learnable nonlocal contrastive attention (LNLCA), which flexibly aggregates image features while maintaining linear computational complexity. Then, we introduce the adaptive target generator (ATG) model to address the problem of the single model training mode. Based on LNLCA, we construct a learnable nonlocal contrastive network (LNLCN). The experimental results demonstrate the effectiveness of the algorithm, which produces reconstructed images with more natural texture details. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)

19 pages, 2161 KiB  
Article
An Efficient Boosting-Based Windows Malware Family Classification System Using Multi-Features Fusion
by Zhiguo Chen and Xuanyu Ren
Appl. Sci. 2023, 13(6), 4060; https://doi.org/10.3390/app13064060 - 22 Mar 2023
Cited by 2 | Viewed by 2176
Abstract
In previous years, cybercriminals have utilized various strategies to evade identification, including obfuscation, confusion, and polymorphism technology, resulting in an exponential increase in the amount of malware that poses a serious threat to computer security. The use of techniques such as code reuse, automation, etc., also makes it more arduous to identify variant software in malware families. To effectively detect the families to which malware belongs, this paper proposed and discussed a new malware fusion feature set and classification system based on the BIG2015 dataset. We used a forward feature stepwise selection technique to combine plausible binary and assembly malware features to produce new and efficient fused features. A number of machine-learning techniques, including extreme gradient boosting (XGBoost), random forest, support vector machine (SVM), K-nearest neighbors (KNN), and adaptive boosting (AdaBoost), are used to confirm the effectiveness of the fusion feature set and malware classification system. The experimental findings demonstrate that the XGBoost algorithm’s classification accuracy on the fusion feature set suggested in this paper can reach 99.87%. In addition, we applied tree-boosting-based LightGBM and CatBoost algorithms to the domain of malware classification for the first time. On our fusion feature set, the corresponding classification accuracy can reach 99.84% and 99.76%, respectively, and the F1-scores can achieve 99.66% and 99.28%, respectively. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)
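The forward feature stepwise selection mentioned above can be sketched generically: greedily add whichever candidate feature most improves a score, and stop when nothing helps. In this minimal Python sketch, the feature names and the scoring function are hypothetical stand-ins (the paper would score candidate sets by classification accuracy on BIG2015):

```python
def forward_stepwise_select(features, score_fn, max_features=None):
    """Greedy forward selection: repeatedly add the candidate feature
    that most improves score_fn(selected), stopping when no candidate
    helps (or when the size cap is reached)."""
    selected, best_score = [], score_fn([])
    limit = max_features or len(features)
    while len(selected) < limit:
        gains = [(score_fn(selected + [f]), f)
                 for f in features if f not in selected]
        top_score, top_feat = max(gains)
        if top_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(top_feat)
        best_score = top_score
    return selected, best_score

# Hypothetical per-feature utilities with a mild penalty for large sets,
# standing in for cross-validated accuracy of the malware classifier.
utility = {"opcode_ngrams": 0.6, "section_entropy": 0.3,
           "byte_histogram": 0.25, "import_table": 0.0}
score = lambda feats: sum(utility[f] for f in feats) - 0.05 * max(0, len(feats) - 2)
picked, final_score = forward_stepwise_select(list(utility), score)
```

Here the uninformative `import_table` feature is never added, because including it can no longer beat the best score already reached.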

20 pages, 5880 KiB  
Article
An Efficient Attention-Based Convolutional Neural Network That Reduces the Effects of Spectral Variability for Hyperspectral Unmixing
by Baohua Jin, Yunfei Zhu, Wei Huang, Qiqiang Chen and Sijia Li
Appl. Sci. 2022, 12(23), 12158; https://doi.org/10.3390/app122312158 - 28 Nov 2022
Viewed by 1454
Abstract
The purpose of hyperspectral unmixing (HU) is to obtain the spectral features of materials (endmembers) and their proportion (abundance) in a hyperspectral image (HSI). Due to the existence of spectral variabilities (SVs), it is difficult to obtain accurate spectral features. At the same time, the performance of unmixing is not only affected by SVs but also depends on the effective spectral and spatial information. To solve these problems, this study proposed an efficient attention-based convolutional neural network (EACNN) and an efficient convolution block attention module (ECBAM). The EACNN is a two-stream network, which is learned from nearly pure endmembers through an additional network, and the aggregated spectral and spatial information can be obtained effectively with the help of the ECBAM, which can reduce the influence of SVs and improve the performance. The unmixing network helps the whole network to pay attention to meaningful feature information by using efficient channel attention (ECA) and guides the unmixing process by sharing parameters. Experimental results on three HSI datasets showed that the method proposed in this study outperformed other unmixing methods. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)

11 pages, 2173 KiB  
Article
Robust Classification Model for Diabetic Retinopathy Based on the Contrastive Learning Method with a Convolutional Neural Network
by Xinxing Feng, Shuai Zhang, Long Xu, Xin Huang and Yanyan Chen
Appl. Sci. 2022, 12(23), 12071; https://doi.org/10.3390/app122312071 - 25 Nov 2022
Cited by 1 | Viewed by 1164
Abstract
Diabetic retinopathy is one of the most common microvascular complications of diabetes. Early detection and treatment can effectively reduce the risk. Hence, a robust computer-aided diagnosis model is important. Based on the labeled fundus images, we build a binary classification model based on ResNet-18 and transfer learning and, more importantly, improve the robustness of the model through supervised contrastive learning. The model is tested with different learning rates and data augmentation methods. The standard deviations of the multiple test results decrease from 4.11 to 0.15 for different learning rates and from 1.53 to 0.18 for different data augmentation methods. In addition, the supervised contrastive learning method also improves the average accuracy of the model, which increases from 80.7% to 86.5%. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)

14 pages, 379 KiB  
Article
Perception of Risks and Usefulness of Smart Video Surveillance Systems
by Thomas Golda, Deborah Guaia and Verena Wagner-Hartl
Appl. Sci. 2022, 12(20), 10435; https://doi.org/10.3390/app122010435 - 16 Oct 2022
Cited by 5 | Viewed by 1948
Abstract
The number of video cameras in public places increases due to different reasons such as detecting dangers (e.g., thefts, robberies, terrorist attacks) and security breaches in crowds. The application of video surveillance systems is sometimes evaluated ambivalently; therefore, the presented study focuses on factors influencing the acceptance of a privacy-friendly, smart video surveillance system. Overall, 216 persons aged between 18 and 81 years participated in an online survey. In terms of the perceived usefulness, there are significant interactions of public spaces × gender and public spaces × time of day. In addition, the assessment of different privacy levels of a video surveillance system differ significantly in terms of perceived risk. Interestingly, men rate the risk concerning their own privacy significantly higher than women do. Participants rate the presented system as fairly useful and slightly risky for their own privacy. The findings of the presented exploratory study provide insight into how people perceive smart video surveillance. These findings have the potential to support the conditions of the use of smart video surveillance systems and to address the possibly affected individuals. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)

12 pages, 3102 KiB  
Article
A System Architecture of a Fusion System for Multiple LiDARs Image Processing
by Minwoo Jung, Dae-Young Kim and Seokhoon Kim
Appl. Sci. 2022, 12(19), 9421; https://doi.org/10.3390/app12199421 - 20 Sep 2022
Cited by 2 | Viewed by 1448
Abstract
LiDAR sensors are extensively used in autonomous vehicles and their optimal use is a critical concern. In this paper, we propose an embedded software architecture for multiple LiDAR sensors, i.e., a fusion system that acts as an embedded system for processing data from multiple LiDAR sensors. The fusion system software comprises multiple clients and a single server. The client and server are connected through inter-process communication. Multiple clients create processes to process the data from each LiDAR sensor via a multiprocessing method. Our approach involves a scheduling method for efficient multiprocessing. The server uses multithreading to optimize the internal functions. For internal communication within the fusion system, multiple clients and a single server are connected using the socket method. In sequential processing, the response time increases in proportion to the number of connected LiDAR sensors. By contrast, in the proposed software architecture, the response time decreases in inverse proportion to the number of LiDAR sensors. As LiDAR sensors become increasingly popular in the field of autonomous driving, the results of this study can be expected to make a substantial contribution to technology development in this domain. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)
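The multi-client, single-server split described above can be modeled compactly. The sketch below is an assumption-laden stand-in: it uses threads and a shared queue in place of the paper's client processes and socket IPC, purely to keep the example self-contained and runnable, and shows the single server draining data from several concurrent sensor clients:

```python
import queue
import threading

def lidar_client(sensor_id, n_frames, out_q):
    # Stands in for one LiDAR client; in the described system each
    # client is a separate process talking to the server over a socket.
    for i in range(n_frames):
        out_q.put((sensor_id, i))       # one record per captured frame
    out_q.put((sensor_id, None))        # end-of-stream marker

def fusion_server(n_sensors, n_frames):
    # Single server: collects records from all sensor clients
    # over one shared channel and groups them per sensor.
    out_q = queue.Queue()
    clients = [threading.Thread(target=lidar_client, args=(s, n_frames, out_q))
               for s in range(n_sensors)]
    for c in clients:
        c.start()
    frames = {s: [] for s in range(n_sensors)}
    finished = 0
    while finished < n_sensors:
        sensor_id, frame = out_q.get()
        if frame is None:
            finished += 1
        else:
            frames[sensor_id].append(frame)
    for c in clients:
        c.join()
    return frames

print({s: len(v) for s, v in fusion_server(n_sensors=3, n_frames=4).items()})
# prints {0: 4, 1: 4, 2: 4}
```

Because the clients run concurrently, the server's collection time is governed by the slowest sensor rather than by the sum over all sensors, which is the intuition behind the inverse scaling of response time reported in the abstract.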

22 pages, 8120 KiB  
Article
Complementary Segmentation of Primary Video Objects with Reversible Flows
by Junjie Wu, Jia Li and Long Xu
Appl. Sci. 2022, 12(15), 7781; https://doi.org/10.3390/app12157781 - 02 Aug 2022
Viewed by 1018
Abstract
Segmenting primary objects in a video is an important yet challenging problem in intelligent video surveillance, as it exhibits various levels of foreground/background ambiguities. To reduce such ambiguities, we propose a novel formulation via exploiting foreground and background context as well as their complementary constraint. Under this formulation, a unified objective function is further defined to encode each cue. For implementation, we design a complementary segmentation network (CSNet) with two separate branches, which can simultaneously encode the foreground and background information along with joint spatial constraints. The CSNet is trained on massive images with manually annotated salient objects in an end-to-end manner. By applying CSNet on each video frame, the spatial foreground and background maps can be initialized. To enforce temporal consistency effectively and efficiently, we divide each frame into superpixels and construct a neighborhood reversible flow that reflects the most reliable temporal correspondences between superpixels in far-away frames. With such a flow, the initialized foregroundness and backgroundness can be propagated along the temporal dimension so that primary video objects gradually pop out and distractors are well suppressed. Extensive experimental results on three video datasets show that the proposed approach achieves impressive performance in comparisons with 22 state-of-the-art models. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)

13 pages, 780 KiB  
Article
Multi-Modal 3D Shape Clustering with Dual Contrastive Learning
by Guoting Lin, Zexun Zheng, Lin Chen, Tianyi Qin and Jiahui Song
Appl. Sci. 2022, 12(15), 7384; https://doi.org/10.3390/app12157384 - 22 Jul 2022
Cited by 3 | Viewed by 1732
Abstract
3D shape clustering is developing into an important research subject with the wide applications of 3D shapes in computer vision and multimedia fields. Since 3D shapes generally take on various modalities, how to comprehensively exploit the multi-modal properties to boost clustering performance has become a key issue for the 3D shape clustering task. Taking into account the advantages of multiple views and point clouds, this paper proposes the first multi-modal 3D shape clustering method, named the dual contrastive learning network (DCL-Net), to discover the clustering partitions of unlabeled 3D shapes. First, by simultaneously performing cross-view contrastive learning within multi-view modality and cross-modal contrastive learning between the point cloud and multi-view modalities in the representation space, a representation-level dual contrastive learning module is developed, which aims to capture discriminative 3D shape features for clustering. Meanwhile, an assignment-level dual contrastive learning module is designed by further ensuring the consistency of clustering assignments within the multi-view modality, as well as between the point cloud and multi-view modalities, thus obtaining more compact clustering partitions. Experiments on two commonly used 3D shape benchmarks demonstrate the effectiveness of the proposed DCL-Net. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)

18 pages, 2375 KiB  
Article
A Novel Generative Model for Face Privacy Protection in Video Surveillance with Utility Maintenance
by Yuying Qiu, Zhiyi Niu, Biao Song, Tinghuai Ma, Abdullah Al-Dhelaan and Mohammed Al-Dhelaan
Appl. Sci. 2022, 12(14), 6962; https://doi.org/10.3390/app12146962 - 09 Jul 2022
Cited by 10 | Viewed by 2293
Abstract
In recent years, the security and privacy issues of face data in video surveillance have become one of the hotspots. How to protect privacy while maintaining the utility of monitored faces is a challenging problem. At present, most of the mainstream methods are suitable for maintaining data utility with respect to pre-defined criteria such as the structure similarity or shape of the face, which bears the criticism of poor versatility and adaptability. This paper proposes a novel generative framework called Quality Maintenance-Variational AutoEncoder (QM-VAE), which takes full advantage of existing privacy protection technologies. We innovatively add the loss of service quality to the loss function to ensure the generation of de-identified face images with guided quality preservation. The proposed model automatically adjusts the generated image according to the different service quality evaluators, so it is generic and efficient in different service scenarios, even some that have nothing to do with simple visual effects. We take facial expression recognition as an example to present experiments on the dataset CelebA to demonstrate the utility-preservation capabilities of QM-VAE. The experimental data show that QM-VAE has the highest quality retention rate of 86%. Compared with the existing method, QM-VAE generates de-identified face images with significantly improved utility and increases the effect by 6.7%. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)

20 pages, 4212 KiB  
Article
Mixed-Flow Load-Balanced Scheduling for Software-Defined Networks in Intelligent Video Surveillance Cloud Data Center
by Biao Song, Yue Chang, Xinchang Zhang, Abdullah Al-Dhelaan and Mohammed Al-Dhelaan
Appl. Sci. 2022, 12(13), 6475; https://doi.org/10.3390/app12136475 - 26 Jun 2022
Cited by 2 | Viewed by 1210
Abstract
As the large amount of video surveillance data floods into cloud data center, achieving load balancing in a cloud network has become a challenging problem. Meanwhile, we hope the cloud data center maintains low latency, low consumption, and high throughput performance when transmitting massive amounts of data. OpenFlow enables a software-defined solution through programing to control the scheduling of data flow in the cloud data center. However, the existing scheduling algorithm of the data center cannot cope with the congestion of the network center effectively. Even for some dynamic scheduling algorithms, adjustments can only be made after congestion occurs. Hence, we propose a proactive and dynamically adjusted mixed-flow load-balanced scheduling (MFLBS) algorithm, which not only takes into account the different sizes of flows in the network but also maintains maximum throughput while balancing the load. In this paper, the MFLBS problem was formulated, along with a set of heuristic algorithms for real-time feedback and adjustment. Experiments with mesh and tree network models show that our MFLBS is significantly better than other dynamic scheduling algorithms, including one-hop DLBS and static scheduling algorithm FCFS. The MFLBS algorithm can effectively reduce the delay of small flows and average delay while maintaining high throughput. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)

15 pages, 2881 KiB  
Article
Faster MDNet for Visual Object Tracking
by Qianqian Yu, Keqi Fan, Yiyang Wang and Yuhui Zheng
Appl. Sci. 2022, 12(5), 2336; https://doi.org/10.3390/app12052336 - 23 Feb 2022
Cited by 4 | Viewed by 1651
Abstract
With the rapid development of deep learning techniques, new breakthroughs have been made in deep learning-based object tracking methods. Although many approaches have achieved state-of-the-art results, existing methods still cannot fully satisfy practical needs. A robust tracker should perform well in three aspects: tracking accuracy, speed, and resource consumption. Considering this notion, we propose a novel model, Faster MDNet, to strike a better balance among these factors. To improve the tracking accuracy, a channel attention module is introduced to our method. We also design domain adaptation components to obtain more generic features. Simultaneously, we implement an adaptive, spatial pyramid pooling layer for reducing model complexity and accelerating the tracking speed. The experiments illustrate the promising performance of our tracker on OTB100, VOT2018, TrackingNet, UAV123, and NfS. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)

Review


14 pages, 885 KiB  
Review
Sentiment Analysis of Twitter Data
by Yili Wang, Jiaxuan Guo, Chengsheng Yuan and Baozhu Li
Appl. Sci. 2022, 12(22), 11775; https://doi.org/10.3390/app122211775 - 19 Nov 2022
Cited by 24 | Viewed by 8406
Abstract
Twitter has become a major social media platform and has attracted considerable interest among researchers in sentiment analysis. Research into Twitter Sentiment Analysis (TSA) is an active subfield of text mining. TSA refers to the use of computers to process the subjective nature of Twitter data, including its opinions and sentiments. In this research, a thorough review of the most recent developments in this area, and a wide range of newly proposed algorithms and applications are explored. Each publication is arranged into a category based on its significance to a particular type of TSA method. The purpose of this survey is to provide a concise, nearly comprehensive overview of TSA techniques and related fields. The primary contributions of the survey are the detailed classifications of numerous recent articles and the depiction of the current direction of research in the field of TSA. Full article
(This article belongs to the Special Issue Advances and Application of Intelligent Video Surveillance System)
