
Sensors for Object Detection, Classification and Tracking

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (28 February 2022) | Viewed by 54845

Special Issue Editor

Special Issue Information

Dear Colleagues,

In recent years, there has been a rapid and successful expansion of computer vision research in several application fields. One area that has attained great progress is object detection, which aims to determine the location and class of all (or specific) object instances in an image and to track their positions over time.

Algorithms for object detection depend strongly on the acquisition devices used (RGB, thermal, infrared, or multi/hyperspectral cameras). On the other hand, deep neural networks (DNNs) have recently emerged as a powerful machine-learning model able to learn rich object representations without the need to manually design features.

The goal of this Special Issue of Sensors is to give a perspective on object detection research. It will be dedicated to highlighting both theoretical and practical aspects of object detection; deep learning-based approaches are welcomed, as well as approaches based on unconventional input sensors, such as hyperspectral or thermal images.

This Special Issue fits within the scope of Sensors because it explores object detection from two different points of view. On one hand, new methodologies will be investigated, for example, deep learning-based approaches. On the other hand, emphasis will also be placed on acquisition devices, and approaches able to extract information from multi- and hyperspectral images will be welcomed.

Dr. Paolo Spagnolo
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Object detection
  • Object tracking
  • Supervised and unsupervised object classification
  • Deep learning algorithms
  • Thermal image analysis
  • Algorithms for hyperspectral image analysis
  • Multispectral object analysis

Published Papers (12 papers)


Research

19 pages, 8056 KiB  
Article
First Gradually, Then Suddenly: Understanding the Impact of Image Compression on Object Detection Using Deep Learning
by Tomasz Gandor and Jakub Nalepa
Sensors 2022, 22(3), 1104; https://0-doi-org.brum.beds.ac.uk/10.3390/s22031104 - 01 Feb 2022
Cited by 9 | Viewed by 3278
Abstract
Video surveillance systems process high volumes of image data. To enable long-term retention of recorded images, and because of data transfer limitations in geographically distributed systems, lossy compression is commonly applied to images prior to processing; this, however, causes a deterioration in image quality due to the removal of potentially important image details. In this paper, we investigate the impact of image compression on the performance of object detection methods based on convolutional neural networks. We focus on Joint Photographic Experts Group (JPEG) compression and thoroughly analyze a range of performance metrics. Our experimental study, performed over a widely used object detection benchmark, assessed the robustness of nine popular object-detection deep models against varying compression characteristics. We show that our methodology can allow practitioners to establish an acceptable compression level for specific use cases; hence, it can play a key role in applications that process and store very large image data.
(This article belongs to the Special Issue Sensors for Object Detection, Classification and Tracking)
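The degradation the paper studies can be probed with a few lines of code: the sketch below (an illustration, not the authors' pipeline) re-encodes an image at several JPEG quality factors and reports file size and PSNR, the kind of quality-vs-size curve against which a detector's robustness would then be measured.

```python
import io

import numpy as np
from PIL import Image


def jpeg_degradation(image, qualities=(90, 50, 10)):
    """Re-encode an image at several JPEG quality factors and report
    the encoded size and the PSNR against the original pixels."""
    ref = np.asarray(image.convert("RGB"), dtype=np.float64)
    results = []
    for q in qualities:
        buf = io.BytesIO()
        image.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        deg = np.asarray(Image.open(buf).convert("RGB"), dtype=np.float64)
        mse = np.mean((ref - deg) ** 2)
        psnr = float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
        results.append({"quality": q, "bytes": buf.getbuffer().nbytes, "psnr": psnr})
    return results
```

Running a detector on the re-encoded frames and plotting its metric against `quality` would reproduce the kind of analysis described in the abstract.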

21 pages, 5726 KiB  
Article
Efficient Online Object Tracking Scheme for Challenging Scenarios
by Khizer Mehmood, Ahmad Ali, Abdul Jalil, Baber Khan, Khalid Mehmood Cheema, Maria Murad and Ahmad H. Milyani
Sensors 2021, 21(24), 8481; https://0-doi-org.brum.beds.ac.uk/10.3390/s21248481 - 20 Dec 2021
Cited by 9 | Viewed by 2916
Abstract
Visual object tracking (VOT) is a vital part of various computer vision applications such as surveillance, unmanned aerial vehicles (UAVs), and medical diagnostics. In recent years, substantial improvements have been made in addressing the challenges of VOT, such as scale change, occlusion, motion blur, and illumination variation. This paper proposes a tracking algorithm in a spatiotemporal context (STC) framework. To overcome the limitations of STC under scale variation, a max-pooling-based scale scheme is incorporated by maximizing over the posterior probability. To prevent the target model from drifting, an efficient occlusion-handling mechanism is proposed: occlusion is detected with an average peak-to-correlation energy (APCE)-based measure of the response map between consecutive frames, and on successful occlusion detection, a fractional-gain Kalman filter is incorporated to handle it. An additional extension uses the APCE criterion to adapt the target model under motion blur and other challenging factors. Extensive evaluation indicates that the proposed algorithm achieves significant results against various tracking methods.
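The APCE measure used here for occlusion detection has a standard closed form, APCE = |F_max − F_min|² / mean((F − F_min)²): a sharp, confident response peak gives a high value, and a sudden drop suggests occlusion. A minimal sketch (the 0.5 threshold ratio below is an assumption for illustration, not the paper's value):

```python
import numpy as np


def apce(response):
    """Average peak-to-correlation energy of a tracker response map."""
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)


def occluded(response, apce_history, ratio=0.5):
    """Flag occlusion when the current APCE falls below `ratio` times the
    running mean of past APCE values; return the flag and the current APCE."""
    current = apce(response)
    return current < ratio * np.mean(apce_history), current
```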

20 pages, 17899 KiB  
Article
Real-Time 3D Object Detection and SLAM Fusion in a Low-Cost LiDAR Test Vehicle Setup
by Duarte Fernandes, Tiago Afonso, Pedro Girão, Dibet Gonzalez, António Silva, Rafael Névoa, Paulo Novais, João Monteiro and Pedro Melo-Pinto
Sensors 2021, 21(24), 8381; https://0-doi-org.brum.beds.ac.uk/10.3390/s21248381 - 15 Dec 2021
Cited by 9 | Viewed by 3907
Abstract
Recently released research on deep learning applications for autonomous driving perception focuses heavily on LiDAR point cloud data as input to neural networks, highlighting the importance of LiDAR technology in the field of Autonomous Driving (AD). Accordingly, a great percentage of the vehicle platforms used to create the datasets released for the development of these networks, as well as some AD commercial solutions available on the market, rely on extensive sensor arrays comprising a large number of sensors across several modalities. However, these costs create a barrier to entry for low-cost solutions performing critical perception tasks such as Object Detection and SLAM. This paper explores current vehicle platforms and proposes a low-cost, LiDAR-based test vehicle platform capable of running critical perception tasks (Object Detection and SLAM) in real time. Additionally, we propose a deep learning-based inference model for Object Detection deployed on a resource-constrained device, as well as a graph-based SLAM implementation; we discuss the design considerations imposed by the real-time processing requirement and present results demonstrating the usability of the developed work on the proposed low-cost platform.

20 pages, 6612 KiB  
Article
Visual Attention and Color Cues for 6D Pose Estimation on Occluded Scenarios Using RGB-D Data
by Joel Vidal, Chyi-Yeu Lin and Robert Martí
Sensors 2021, 21(23), 8090; https://0-doi-org.brum.beds.ac.uk/10.3390/s21238090 - 03 Dec 2021
Cited by 1 | Viewed by 1797
Abstract
Recently, 6D pose estimation methods have shown robust performance on highly cluttered scenes and under different illumination conditions. However, occlusions remain challenging, with recognition rates decreasing to less than 10% for half-visible objects in some datasets. In this paper, we propose to use top-down visual attention and color cues to boost the performance of a state-of-the-art method in occluded scenarios. More specifically, color information is employed to detect potential points in the scene, improve feature matching, and compute more precise fitting scores. The proposed method is evaluated on the Linemod occluded (LM-O), TUD light (TUD-L), Tejani (IC-MI) and Doumanoglou (IC-BIN) datasets, as part of the SiSo BOP benchmark, which includes challenging highly occluded cases, illumination-changing scenarios, and multiple instances. The method is analyzed and discussed for different parameters, color spaces, and metrics. The presented results show the validity of the proposed approach and its robustness against illumination changes and multiple-instance scenarios, especially boosting performance on highly occluded cases. The proposed solution provides an absolute improvement of up to 30% for occlusion levels between 40% and 50%, outperforming other approaches with a best overall recall of 71% for LM-O, 92% for TUD-L, 99.3% for IC-MI and 97.5% for IC-BIN.

18 pages, 23639 KiB  
Article
Efficient Single-Shot Multi-Object Tracking for Vehicles in Traffic Scenarios
by Youngkeun Lee, Sang-ha Lee, Jisang Yoo and Soonchul Kwon
Sensors 2021, 21(19), 6358; https://0-doi-org.brum.beds.ac.uk/10.3390/s21196358 - 23 Sep 2021
Cited by 6 | Viewed by 2836
Abstract
Multi-object tracking is a significant field in computer vision since it provides essential information for video surveillance and analysis. Several deep learning-based approaches have been developed to improve multi-object tracking performance by applying the most accurate and efficient combinations of object detection models and appearance embedding extraction models. However, two-stage methods show a low inference speed since the embedding extraction can only be performed after object detection. To alleviate this problem, single-shot methods, which perform object detection and embedding extraction simultaneously, have been developed and have drastically improved the inference speed; however, there is a trade-off between accuracy and efficiency. Therefore, this study proposes an enhanced single-shot multi-object tracking system that delivers improved accuracy while maintaining a high inference speed. With strong feature extraction and fusion, the object detection of our model achieves an AP score of 69.93% on the UA-DETRAC dataset and outperforms previous state-of-the-art methods, such as FairMOT and JDE. Based on the improved object detection performance, our multi-object tracking system achieves a MOTA score of 68.5% and a PR-MOTA score of 24.5% on the same dataset, also surpassing the previous state-of-the-art trackers.

12 pages, 553 KiB  
Article
Delay-Tolerant Distributed Inference in Tracking Networks
by Mohammadreza Alimadadi, Milica Stojanovic and Pau Closas
Sensors 2021, 21(17), 5747; https://0-doi-org.brum.beds.ac.uk/10.3390/s21175747 - 26 Aug 2021
Viewed by 1387
Abstract
This paper discusses asynchronous distributed inference in object tracking. Unlike many studies, which assume that the delay in communication between partial estimators and the central station is negligible, our study focuses on the problem of asynchronous distributed inference in the presence of delays. We introduce an efficient data fusion method for combining the distributed estimates when delay in communications is not negligible. To overcome the delay, predictions are made for the state of the system based on the most current available information from partial estimators. Simulation results show the efficacy of the proposed methods.
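The delay-compensation idea — predicting each delayed partial estimate forward to the current time before fusing — can be sketched as follows. The constant-velocity motion model and the inverse-variance fusion rule are illustrative assumptions, not the paper's exact method.

```python
import numpy as np


def predict_forward(state, velocity, delay):
    """Advance a delayed partial estimate to the current time under a
    constant-velocity motion model (model choice is an assumption)."""
    return state + velocity * delay


def fuse(estimates, variances):
    """Inverse-variance weighted fusion of time-aligned estimates:
    more confident estimators contribute more to the fused state."""
    w = 1.0 / np.asarray(variances, dtype=float)
    return np.average(np.asarray(estimates, dtype=float), axis=0, weights=w)
```

In use, each partial estimator's report would first pass through `predict_forward` with its own communication delay, and only then enter `fuse`.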

31 pages, 7809 KiB  
Article
Designing a Simple Fiducial Marker for Localization in Spatial Scenes Using Neural Networks
by Milan Košťák and Antonín Slabý
Sensors 2021, 21(16), 5407; https://0-doi-org.brum.beds.ac.uk/10.3390/s21165407 - 10 Aug 2021
Cited by 4 | Viewed by 10148
Abstract
The paper describes the process of designing a simple fiducial marker for use in augmented reality applications. Unlike other systems, it does not encode any information, but it can be used to obtain position, rotation, relative size, and projective transformation. The system also copes well with motion blur and is resistant to imperfections in the marker, which could in theory even be drawn by hand. Previous systems put constraints on the colors that must be used to form the marker; the proposed system works with any saturated color, leading to better blending with the surrounding environment. The marker's final shape is a rectangular area of a solid color with three lines of a different color going from the center to three corners of the rectangle. Precise detection can be achieved using neural networks, given that the training set is varied and well designed. A detailed literature review found no such system; the proposed design is therefore novel for localization in spatial scenes. Testing proved that the system works well both indoors and outdoors and that the detections are precise.

21 pages, 572 KiB  
Article
Confidence Score: The Forgotten Dimension of Object Detection Performance Evaluation
by Simon Wenkel, Khaled Alhazmi, Tanel Liiv, Saud Alrshoud and Martin Simon
Sensors 2021, 21(13), 4350; https://0-doi-org.brum.beds.ac.uk/10.3390/s21134350 - 25 Jun 2021
Cited by 33 | Viewed by 8073
Abstract
When deploying a model for object detection, a confidence score threshold is chosen to filter out false positives and ensure that a predicted bounding box has a certain minimum score. To achieve state-of-the-art performance on benchmark datasets, most neural networks use a rather low threshold, as a high number of false positives is not penalized by standard evaluation metrics. However, in Artificial Intelligence (AI) applications that require high confidence scores (e.g., because of legal requirements or because the consequences of incorrect detections are severe) or a certain level of model robustness, it is unclear which base model to use, since these models were mainly optimized for benchmark scores. In this paper, we propose a method to find the optimum performance point of a model as a basis for fairer comparisons and deeper insights into the trade-offs caused by selecting a confidence score threshold.
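The trade-off described here can be made concrete with a small sweep over confidence thresholds: raising the threshold trades recall for precision, and the curve exposes the operating point. This is a generic illustration, not the paper's proposed optimum-point method.

```python
def precision_recall_sweep(detections, num_gt, thresholds):
    """Sweep confidence thresholds over scored detections.

    `detections` is a list of (score, is_true_positive) pairs and
    `num_gt` is the number of ground-truth objects; returns a list of
    (threshold, precision, recall) tuples."""
    curve = []
    for t in thresholds:
        kept = [tp for score, tp in detections if score >= t]
        tp = sum(kept)                      # true positives kept
        fp = len(kept) - tp                 # false positives kept
        precision = tp / (tp + fp) if kept else 1.0
        recall = tp / num_gt if num_gt else 0.0
        curve.append((t, precision, recall))
    return curve
```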

16 pages, 4700 KiB  
Article
A System Using Artificial Intelligence to Detect and Scare Bird Flocks in the Protection of Ripening Fruit
by Petr Marcoň, Jiří Janoušek, Josef Pokorný, Josef Novotný, Eliška Vlachová Hutová, Anna Širůčková, Martin Čáp, Jana Lázničková, Radim Kadlec, Petr Raichl, Přemysl Dohnal, Miloslav Steinbauer and Eva Gescheidtová
Sensors 2021, 21(12), 4244; https://0-doi-org.brum.beds.ac.uk/10.3390/s21124244 - 21 Jun 2021
Cited by 9 | Viewed by 7040
Abstract
Flocks of birds may cause major damage to fruit crops in the ripening phase. This problem is addressed by various bird-scaring methods; in many cases, however, the birds become accustomed to the distraction, and the applied scaring procedure loses its purpose. To help eliminate this difficulty, we present a system that detects flocks and triggers an actuator to scare them only when a flock passes through the monitored space. The detection itself is performed with artificial intelligence utilizing a convolutional neural network. Before training the network, we employed video cameras and a differential algorithm to detect all items moving in the vineyard. The objects revealed in the images were labeled and then used in training, testing, and validating the network. The detection algorithm was assessed using the precision, recall, and F1 score metrics. In terms of function, the algorithm is implemented in a module consisting of a microcomputer and a connected video camera. When a flock is detected, the microcontroller generates a signal that is wirelessly transmitted to the module whose task is to trigger the scaring actuator.

20 pages, 5330 KiB  
Article
Semi-Supervised Anomaly Detection in Video-Surveillance Scenes in the Wild
by Mohammad Ibrahim Sarker, Cristina Losada-Gutiérrez, Marta Marrón-Romera, David Fuentes-Jiménez and Sara Luengo-Sánchez
Sensors 2021, 21(12), 3993; https://0-doi-org.brum.beds.ac.uk/10.3390/s21123993 - 09 Jun 2021
Cited by 14 | Viewed by 3673
Abstract
Surveillance cameras are being installed in many primary daily living places to maintain public safety. In this video-surveillance context, anomalies occur only very occasionally and for a very short time. Manual monitoring of such anomalies can therefore be exhausting and monotonous, resulting in decreased reliability and speed in emergency situations due to monitor tiredness. Within this framework, the importance of automatically detecting anomalies is clear, and a significant amount of research has recently been devoted to this topic. According to these earlier studies, supervised approaches perform better than unsupervised ones; however, supervised approaches demand manual annotation, making the system's reliability dependent on the situations covered during training (something difficult to ensure in an anomaly context). This work proposes an approach for anomaly detection in video-surveillance scenes based on a weakly supervised learning algorithm. Spatio-temporal features are extracted from each surveillance video using a temporal convolutional 3D neural network (T-C3D). Then, a novel ranking loss function increases the distance between the classification scores of anomalous and normal videos, reducing the number of false negatives. The proposal has been evaluated and compared against state-of-the-art approaches, obtaining competitive performance without fine-tuning, which also validates its generalization capability. The design and reliability of the proposal are presented and analyzed, together with quantitative and qualitative evaluations in in-the-wild scenarios, demonstrating its high sensitivity in anomaly detection in all of them.
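The ranking-loss idea — pushing anomalous scores away from normal ones — is often written as a hinge loss over the top-scoring segments of each video. The sketch below is a common weakly supervised (MIL-style) formulation; the paper's exact loss may differ.

```python
def ranking_loss(anomalous_scores, normal_scores, margin=1.0):
    """Hinge ranking loss pushing the top anomalous segment score above
    the top normal segment score by at least `margin`."""
    return max(0.0, margin - max(anomalous_scores) + max(normal_scores))
```

Only the highest-scoring segment of each video enters the loss, which is what makes video-level (weak) labels sufficient for training.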

19 pages, 21373 KiB  
Article
Detection of Specific Building in Remote Sensing Images Using a Novel YOLO-S-CIOU Model. Case: Gas Station Identification
by Jinfeng Gao, Yu Chen, Yongming Wei and Jiannan Li
Sensors 2021, 21(4), 1375; https://0-doi-org.brum.beds.ac.uk/10.3390/s21041375 - 16 Feb 2021
Cited by 30 | Viewed by 4152
Abstract
Detecting specific buildings is of great significance in smart city planning, management practice, and even military use. However, traditional classification or target identification methods have difficulty distinguishing different types of buildings in remote sensing images, because the characteristics of the environmental landscape around buildings (such as the pixels of roads and parking areas) are complex and difficult to define with simple rules. Convolutional neural networks (CNNs) have a strong capacity to mine information from the spatial context and have been used in many image processing tasks. Here, we developed a novel CNN model named YOLO-S-CIOU, which improves on YOLOv3 for specific building detection in two respects: (1) the Darknet53 module in YOLOv3 was replaced with SRXnet (constructed by superimposing multiple SE-ResNeXt blocks) to significantly improve the feature learning ability of YOLO-S-CIOU while maintaining a complexity similar to YOLOv3; (2) Complete-IoU Loss (CIoU Loss) was used to obtain a better regression of the bounding box. We took the gas station as an example. Experimental results on a self-made gas station dataset (GS dataset) showed that YOLO-S-CIOU achieved an average precision (AP) of 97.62% and an F1 score of 97.50% with 59,065,366 parameters. Compared with YOLOv3, YOLO-S-CIOU reduced the number of parameters by 2,510,977 (about 4%) and improved the AP by 2.23% and the F1 score by 0.5%. Moreover, in gas station detection in Tumshuk City and Yanti City, the recall (R) and precision (P) of YOLO-S-CIOU were 50% and 40% higher, respectively, than those of YOLOv3. This shows that the proposed network has stronger robustness and higher detection ability for remote sensing images of different regions.
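The CIoU loss mentioned here augments plain IoU with a center-distance term and an aspect-ratio consistency term: L = 1 − IoU + ρ²/c² + αv, where ρ is the distance between box centers, c the diagonal of the smallest enclosing box, and v a measure of aspect-ratio mismatch. A plain-Python sketch for axis-aligned (x1, y1, x2, y2) boxes:

```python
import math


def ciou_loss(box_a, box_b):
    """Complete-IoU loss between two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection over union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # squared center distance over squared enclosing-box diagonal
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((bx2 - bx1) / (by2 - by1))
                              - math.atan((ax2 - ax1) / (ay2 - ay1))) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike plain IoU loss, this still yields a useful gradient for non-overlapping boxes, which is why it gives a better bounding-box regression.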

18 pages, 2084 KiB  
Article
Dynamic Indoor Localization Using Maximum Likelihood Particle Filtering
by Wenxu Wang, Damián Marelli and Minyue Fu
Sensors 2021, 21(4), 1090; https://0-doi-org.brum.beds.ac.uk/10.3390/s21041090 - 05 Feb 2021
Cited by 15 | Viewed by 2086
Abstract
A popular approach for solving the indoor dynamic localization problem based on WiFi measurements consists of using particle filtering. However, a drawback of this approach is that a very large number of particles is needed to achieve accurate results in real environments. The reason is that, in this particular application, classical particle filtering wastes many unnecessary particles. To remedy this, we propose a novel particle filtering method which we call the maximum likelihood particle filter (MLPF). The essential idea consists of combining the particle prediction and update steps into a single one in which all particles are efficiently used. This drastically reduces the number of particles, leading to numerically feasible algorithms with high accuracy. We provide experimental results, using real data, confirming our claim.
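For context, a single predict-update-resample cycle of the classical bootstrap particle filter that MLPF improves on can be sketched as follows (a generic 1D illustration with a Gaussian measurement likelihood, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)


def particle_filter_step(particles, weights, motion_std,
                         measurement, meas_fn, meas_std):
    """One predict-update-resample cycle of a bootstrap particle filter."""
    # predict: diffuse particles with a random-walk motion model
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # update: reweight by the Gaussian measurement likelihood
    residual = measurement - meas_fn(particles)
    weights = weights * np.exp(-0.5 * (residual / meas_std) ** 2)
    weights /= weights.sum()
    # resample: multinomial resampling to avoid weight degeneracy
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

The separate predict and update stages above are exactly what the MLPF merges into one step so that far fewer particles are wasted in low-likelihood regions.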
