
Artificial Intelligence-Driven Methods for Remote Sensing Target and Object Detection

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: closed (30 June 2023) | Viewed by 50465

Special Issue Editors

Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, China
Interests: hyperspectral target detection; dimensionality reduction; scene classification; metric learning; transfer learning; multi-source remote sensing data geological interpretation
School of Mathematics and Statistics, University of Glasgow, Glasgow, UK
Interests: distance metric learning; few-shot learning; hyperspectral image analysis; statistical classification

Special Issue Information

Dear Colleagues,

Remote sensing images provide rich descriptions of the Earth's surface in various modalities (hyperspectral data, high-resolution data, multispectral data, synthetic aperture radar (SAR) data, etc.). Remote sensing target detection or object detection aims to determine whether targets or objects of interest are present in an image, and it plays a decisive role in resource exploration, environmental monitoring, urban planning, national security, agriculture, forestry, climate, hydrology, etc. In recent years, artificial intelligence (AI) has developed considerably and has been successfully applied to various tasks, such as regression, clustering, and classification. Although AI-driven approaches can handle the massive quantities of data acquired by remote sensors, they require many high-quality labeled samples to deal with remote sensing big data, which can lead to fragile results. That is, even AI-driven approaches with strong feature extraction ability achieve limited performance and remain far from practical demands. Thus, target detection or object detection against complicated backgrounds with limited labeled samples remains a challenging mission, and there is still much room for research on remote sensing target detection and object detection. The main goal of this Special Issue is to address advanced topics related to remote sensing target detection and object detection. Topics of interest include, but are not limited to, the following:

  • New AI-driven methods for remote sensing data, such as GNN, transformer, etc.;
  • New remote sensing datasets, including hyperspectral, high resolution, SAR datasets, etc.;
  • Machine learning techniques for remote sensing applications, such as domain adaptation, few-shot learning, manifold learning, metric learning;
  • Machine learning-based drone detection and fine-grained detection;
  • Target detection, object detection, and anomaly detection;
  • Data-driven applications in remote sensing;
  • Technique reviews on related topics.

Dr. Yanni Dong
Dr. Xiaochen Yang
Prof. Dr. Qian Du
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • remote sensing
  • target detection
  • artificial intelligence
  • machine learning
  • deep learning
  • object detection
  • new datasets

Published Papers (29 papers)


Research


21 pages, 5577 KiB  
Article
AFRE-Net: Adaptive Feature Representation Enhancement for Arbitrary Oriented Object Detection
by Tianwei Zhang, Xu Sun, Lina Zhuang, Xiaoyu Dong, Jianjun Sha, Bing Zhang and Ke Zheng
Remote Sens. 2023, 15(20), 4965; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15204965 - 14 Oct 2023
Cited by 1 | Viewed by 912
Abstract
Arbitrary-oriented object detection (AOOD) is a crucial task in aerial image analysis but also faces significant challenges. In current AOOD detectors, commonly used multi-scale feature fusion modules fall short in complementing spatial and semantic information between scales. Additionally, fixed feature extraction structures are usually used after a fusion module, so detectors cannot self-adjust. At the same time, feature fusion and extraction modules are designed in isolation, and the internal synergy between them is ignored. These problems cause deficient feature representation and thus limit overall detection precision. To solve them, we first create a fine-grained feature pyramid network (FG-FPN) that not only provides richer spatial and semantic features but also complements neighboring-scale features in a self-learning mode. Subsequently, we propose a novel feature enhancement module (FEM) to fit FG-FPN. FEM allows the detection unit to automatically adjust the sensing area and adaptively suppress background interference, thereby generating stronger feature representations. The proposed solution was tested through extensive experiments on challenging datasets, including DOTA (77.44% mAP), HRSC2016 (97.82% mAP), UCAS-AOD (91.34% mAP), and ICDAR2015 (86.27% F-score), and its effectiveness and high applicability were verified on all of them.

18 pages, 14678 KiB  
Article
Mask R-CNN–Based Landslide Hazard Identification for 22.6 Extreme Rainfall Induced Landslides in the Beijiang River Basin, China
by Zhibo Wu, Hao Li, Shaoxiong Yuan, Qinghua Gong, Jun Wang and Bing Zhang
Remote Sens. 2023, 15(20), 4898; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15204898 - 10 Oct 2023
Cited by 1 | Viewed by 868
Abstract
Landslides triggered by extreme precipitation events pose a significant threat to human life and property in mountainous regions. Therefore, accurate identification of landslide locations is crucial for effective prevention and mitigation strategies. During the prolonged heavy rainfall events in Guangdong Province between 21 May and 21 June 2022, shallow and clustered landslides occurred in the mountainous regions of the Beijiang River Basin. This research used high-resolution satellite imagery and integrated the Mask R-CNN model with spectral, textural, morphological and physical characteristics of landslides in remote sensing imagery, in addition to landslide-influencing factors and other constraints, to interpret the landslides induced by the event through remote sensing techniques. The detection results show that the proposed methodology achieved a high level of accuracy in landslide identification, with a precision rate of 81.91%, a recall rate of 84.07% and an overall accuracy of 87.28%. A total of 3782 shallow landslides were detected, showing a distinct clustered distribution pattern. The performance of the Mask R-CNN, Faster R-CNN, U-Net and YOLOv3 models in landslide identification was further compared, and the effects of setting the rotation angle and constraints on the identification results of the Mask R-CNN model were investigated. The results show that each model improves the evaluation indices, but the Mask R-CNN model has the best detection performance; the rotation angle can effectively improve the generalization ability and robustness of the model, and the landslide-inducing factor data and texture feature sample data are the most effective for landslide identification. The research results provide valuable references and technical support for deepening our understanding of the distribution patterns of rainfall-triggered shallow and clustered landslides in the Beijiang River Basin.
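The precision, recall and overall accuracy figures reported above follow the standard confusion-count definitions; a minimal sketch (function name and argument order are illustrative, not from the paper):

```python
def detection_metrics(tp, fp, fn, tn):
    """Precision, recall and overall accuracy from confusion counts
    (true/false positives, false/true negatives)."""
    precision = tp / (tp + fp)   # fraction of detections that are real landslides
    recall = tp / (tp + fn)      # fraction of real landslides that were detected
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy
```

For example, 8 true positives with 2 false positives and 2 misses gives 80% on all three metrics.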

21 pages, 2222 KiB  
Article
Long-Tailed Object Detection for Multimodal Remote Sensing Images
by Jiaxin Yang, Miaomiao Yu, Shuohao Li, Jun Zhang and Shengze Hu
Remote Sens. 2023, 15(18), 4539; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15184539 - 15 Sep 2023
Viewed by 956
Abstract
With the rapid development of remote sensing technology, the application of convolutional neural networks in remote sensing object detection has become very widespread, and some multimodal feature fusion networks have also been proposed in recent years. However, these methods generally do not consider the long-tailed problem that is widely present in remote sensing images, which limits further improvement of model detection performance. To solve this problem, we propose a novel long-tailed object detection method for multimodal remote sensing images, which can effectively fuse the complementary information of visible light and infrared images and adapt to the imbalance between positive and negative samples of different categories. Firstly, the dynamic feature fusion module (DFF), based on image entropy, can dynamically adjust the fusion coefficient according to the information content of different source images, retaining more key feature information for subsequent object detection. Secondly, the instance-balanced mosaic (IBM) data augmentation method balances instance sampling during data augmentation, providing more sample features for the model and alleviating the negative impact of data distribution imbalance. Finally, class-balanced BCE loss (CBB) not only considers the learning difficulty of specific instances but also balances the learning difficulty between categories, thereby improving the model's detection accuracy for tail instances. Experimental results on three public benchmark datasets show that our proposed method achieves state-of-the-art performance; in particular, the optimization of the long-tailed problem enables the model to meet various application scenarios of remote sensing image detection.
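The entropy-weighted fusion idea behind DFF can be sketched as follows: compute the Shannon entropy of each source image and weight the fusion toward the more informative one. This is a rough illustration under assumed details (histogram entropy, linear blending); the paper's exact formulation may differ:

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (bits) of an image's intensity histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def dynamic_fusion(visible, infrared):
    """Blend two registered source images, weighting each by its
    relative entropy so the more informative source dominates."""
    e_v, e_i = image_entropy(visible), image_entropy(infrared)
    w_v = e_v / (e_v + e_i)
    return w_v * visible + (1.0 - w_v) * infrared
```

A constant (zero-entropy) image thus receives zero weight and the fused result equals the other source.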

23 pages, 9286 KiB  
Article
Infrared Small Target Detection Based on a Temporally-Aware Fully Convolutional Neural Network
by Lei Zhang, Peng Han, Jiahua Xi and Zhengrong Zuo
Remote Sens. 2023, 15(17), 4198; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15174198 - 26 Aug 2023
Cited by 1 | Viewed by 1034
Abstract
In the field of computer vision, the detection of infrared small targets (IRSTD) is a crucial research area that plays an important role in space exploration, infrared warning systems, and other applications. However, existing IRSTD methods are prone to generating more false alarms and failing to accurately locate the target, especially in scenarios with a low signal-to-noise ratio or high noise interference. To address this issue, we propose a fully convolutional small target detection algorithm (FCST). The algorithm builds on the anchor-free detection method FCOS and adds a focus structure and a single aggregation approach to design a lightweight feature extraction network that efficiently extracts features for small targets. Furthermore, we propose a feature refinement mechanism to emphasize the target and suppress conflicting information at multiple scales, enhancing the detection of infrared small targets. Experimental results demonstrate that the proposed algorithm achieves a detection rate of 95% and a false alarm rate of 2.32% for IRSTD tasks. To tackle even more complex scenarios, we propose a temporally-aware fully convolutional infrared small target detection (TFCST) algorithm that leverages both spatial and temporal information from sequence images. Building on a single-frame detection network, the algorithm incorporates ConvLSTM units to extract spatiotemporal contextual information from the sequence images, boosting the detection of infrared small targets. The proposed algorithm shows fast detection speed and achieves a 2.73% improvement in detection rate and an 8.13% reduction in false alarm rate relative to the baseline single-frame detection networks.
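The ConvLSTM units mentioned above carry a hidden state and cell state across frames. A minimal sketch of one update step, simplified to 1×1 kernels so each gate reduces to a per-pixel linear map over channels (real ConvLSTM units use full spatial kernels; variable names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def convlstm_step(x, h, c, W, U, b):
    """One ConvLSTM update with 1x1 kernels.
    x: input (C_in, H, W); h, c: hidden/cell state (C_hid, H, W);
    W: (4*C_hid, C_in), U: (4*C_hid, C_hid), b: (4*C_hid,)."""
    # Compute all four gate pre-activations at once, per pixel.
    z = (np.einsum('oc,chw->ohw', W, x)
         + np.einsum('oc,chw->ohw', U, h)
         + b[:, None, None])
    ch = h.shape[0]
    i, f, o, g = z[:ch], z[ch:2*ch], z[2*ch:3*ch], z[3*ch:]
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # forget + write
    h_new = sigmoid(o) * np.tanh(c_new)               # gated output
    return h_new, c_new
```

Iterating this step over a frame sequence is what lets the detector accumulate temporal context for dim, small targets.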

24 pages, 696 KiB  
Article
Radar Active Jamming Recognition under Open World Setting
by Yupei Zhang, Zhijin Zhao and Yi Bu
Remote Sens. 2023, 15(16), 4107; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15164107 - 21 Aug 2023
Viewed by 835
Abstract
To address the issue that conventional methods cannot recognize unknown patterns of radar jamming, this study adopts the idea of zero-shot learning (ZSL) and proposes an open world recognition method, RCAE-OWR, based on residual convolutional autoencoders, which can implement the classification of known and unknown patterns. In the supervised training phase, a residual convolutional autoencoder network structure is first constructed to extract the semantic information from a training set consisting solely of known jamming patterns. By incorporating center loss and reconstruction loss into the softmax loss function, a joint loss function is constructed to minimize the intra-class distance and maximize the inter-class distance in the jamming features. Moving to the unsupervised classification phase, a test set containing both known and unknown patterns is fed into the trained encoder, and a distance-based recognition method is utilized to classify the jamming signals. The results demonstrate that the proposed model not only achieves sufficient learning and representation of known jamming patterns but also effectively identifies and classifies unknown jamming signals. When the jamming-to-noise ratio (JNR) exceeds 10 dB, the recognition rate for seven known jamming patterns and two unknown jamming patterns is more than 92%.
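The joint objective described above combines softmax cross-entropy with a center loss and a reconstruction loss. A minimal numpy sketch (the weighting coefficients `lam_center` and `lam_rec` are illustrative assumptions, not values from the paper):

```python
import numpy as np

def joint_loss(logits, labels, feats, centers, x, x_rec,
               lam_center=0.1, lam_rec=1.0):
    """Softmax cross-entropy + center loss + reconstruction loss.
    logits: (N, K); labels: (N,); feats: (N, D); centers: (K, D);
    x, x_rec: autoencoder input and reconstruction."""
    # Numerically stable softmax cross-entropy.
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_p[np.arange(len(labels)), labels].mean()
    # Center loss: pull each feature toward its class center.
    center = ((feats - centers[labels]) ** 2).sum(axis=1).mean()
    # Reconstruction loss of the autoencoder branch.
    rec = ((x - x_rec) ** 2).mean()
    return ce + lam_center * center + lam_rec * rec
```

When predictions are confident and correct, features sit on their centers, and reconstruction is exact, the loss approaches zero, which is the regime the supervised phase drives toward.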

23 pages, 26356 KiB  
Article
National-Standards- and Deep-Learning-Oriented Raster and Vector Benchmark Dataset (RVBD) for Land-Use/Land-Cover Mapping in the Yangtze River Basin
by Pengfei Zhang, Yijin Wu, Chang Li, Renhua Li, He Yao, Yong Zhang, Genlin Zhang and Dehua Li
Remote Sens. 2023, 15(15), 3907; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15153907 - 07 Aug 2023
Cited by 1 | Viewed by 969
Abstract
A high-quality remote sensing interpretation dataset has become crucial for driving intelligent models, i.e., deep learning (DL), to produce land-use/land-cover (LULC) products. Existing remote sensing datasets face the following issues: (1) they lack object-oriented fine-grained information; (2) they cannot meet national standards; (3) they lack field surveys for labeling samples; and (4) they cannot directly serve geographic engineering applications. To address these gaps, the national-standards- and DL-oriented raster and vector benchmark dataset (RVBD) is the first to be established to map LULC for conducting soil water erosion assessment (SWEA). RVBD makes the following contributions: (1) it is the first second-level object- and DL-oriented dataset with raster and vector data for LULC mapping; (2) its classification system conforms to the national industry standards of the Ministry of Water Resources of the People's Republic of China; (3) it has high LULC interpretation accuracy, assisted by field surveys rather than indoor visual interpretation alone; and (4) it can serve SWEA. The dataset is constructed as follows: (1) spatio-temporal-spectral information is utilized to perform automatic vectorization and to label LULC attributes conforming to the national standards; and (2) several remarkable DL networks (DenseNet161, HorNet, EfficientNetB7, Vision Transformer, and Swin Transformer) are chosen as baselines to train on the dataset, with five evaluation metrics for quantitative evaluation. Experimental results verify the reliability and effectiveness of RVBD. Each network achieves a minimum overall accuracy of 0.81 and a minimum Kappa of 0.80, and Vision Transformer achieves the best classification performance, with an overall accuracy of 0.87 and a Kappa of 0.86. These results indicate that RVBD is a significant benchmark that could lay a foundation for intelligent interpretation of geographic research related to SWEA in the Yangtze River Basin and promote artificial intelligence technology for enriching geographical theories and methods.
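The Kappa values quoted above are Cohen's kappa, which corrects overall accuracy for chance agreement. A minimal sketch of the computation from a confusion matrix:

```python
import numpy as np

def kappa(confusion):
    """Cohen's kappa from a confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                       # observed agreement (= overall accuracy)
    pe = (cm.sum(0) * cm.sum(1)).sum() / n**2   # agreement expected by chance
    return (po - pe) / (1.0 - pe)
```

A perfectly diagonal matrix yields kappa = 1; a balanced two-class matrix with 80% accuracy yields kappa = 0.6, illustrating the chance correction.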

21 pages, 5441 KiB  
Article
Discarding–Recovering and Co-Evolution Mechanisms Based Evolutionary Algorithm for Hyperspectral Feature Selection
by Bowen Liao, Yangxincan Li, Wei Liu, Xianjun Gao and Mingwei Wang
Remote Sens. 2023, 15(15), 3788; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15153788 - 30 Jul 2023
Viewed by 700
Abstract
With the improvement of spectral resolution, the redundant information in hyperspectral imaging (HSI) datasets brings computational, analytical, and storage complexities. Feature selection is a combinatorial optimization problem, which selects a subset of feasible features to reduce the dimensionality of data and decrease the noise information. In recent years, the evolutionary algorithm (EA) has been widely used in feature selection, but the population often lacks agent diversity, which leads to premature convergence. In this paper, a feature selection method based on discarding–recovering and co-evolution mechanisms is proposed with the aim of obtaining an effective feature combination in HSI datasets. The feature discarding mechanism is introduced to remove redundant information by roughly filtering the feature space. To further enhance agent diversity, reliable information interaction is also designed into the co-evolution mechanism, and if stagnation is detected, a subset of discarded features is recovered using adaptive weights. Experimental results demonstrate that the proposed method performs well on three public datasets, achieving overall accuracies of 92.07%, 92.36%, and 98.01%, respectively, with the number of selected features between 15% and 25% of the total.
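The recovering mechanism can be sketched as: when the best fitness has not improved over a window of generations, re-admit some discarded features into the candidate mask. This is a hypothetical illustration; the paper's adaptive weighting scheme is not specified here, so uniform weights stand in as a placeholder:

```python
import numpy as np

def recover_on_stagnation(mask, discarded, history, window=5, k=2, rng=None):
    """If best fitness has not improved over `window` generations,
    re-admit up to k discarded feature indices into the boolean mask.
    `weights` is a uniform placeholder for the paper's adaptive weights."""
    rng = rng or np.random.default_rng()
    stagnant = (len(history) >= window
                and max(history[-window:]) <= history[-window])
    if stagnant:
        weights = np.ones(len(discarded)) / len(discarded)
        chosen = rng.choice(discarded, size=min(k, len(discarded)),
                            replace=False, p=weights)
        mask = mask.copy()
        mask[chosen] = True   # recovered features rejoin the search
    return mask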

22 pages, 5969 KiB  
Article
UCDnet: Double U-Shaped Segmentation Network Cascade Centroid Map Prediction for Infrared Weak Small Target Detection
by Xiangdong Xu, Jiarong Wang, Ming Zhu, Haijiang Sun, Zhenyuan Wu, Yao Wang, Shenyi Cao and Sanzai Liu
Remote Sens. 2023, 15(15), 3736; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15153736 - 27 Jul 2023
Viewed by 866
Abstract
In recent years, the development of deep learning has brought great convenience to the work of target detection, semantic segmentation, and object recognition. In the field of infrared weak small target detection (e.g., surveillance and reconnaissance), it is not only necessary to accurately detect targets but also to perform precise segmentation and sub-pixel-level centroid localization for infrared small targets with a low signal-to-noise ratio and weak texture information. To address these issues, we propose UCDnet (Double U-shaped Segmentation Network Cascade Centroid Map Prediction for Infrared Weak Small Target Detection) in this paper, which completes "end-to-end" training and prediction by cascading the centroid localization subnet with the semantic segmentation subnet. We propose a novel double U-shaped feature extraction network for point target fine segmentation, as well as the concept and method of centroid map prediction for point target localization, together with the corresponding Com loss function and a new centroid localization evaluation metric. The experiments show that our method achieves target detection, semantic segmentation, and sub-pixel-level centroid localization. When the target signal-to-noise ratio is greater than 0.4, the IoU of our semantic segmentation results reaches 0.9186, and the average centroid localization precision reaches 0.3371 pixels. On our simulated dataset of infrared weak small targets, the proposed algorithm performs better than existing state-of-the-art networks in terms of semantic segmentation and centroid localization.
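Sub-pixel centroid localization from a predicted response map is commonly done with an intensity-weighted centroid; a minimal sketch (this is the generic technique, not necessarily UCDnet's exact decoding of its centroid map):

```python
import numpy as np

def subpixel_centroid(centroid_map):
    """Intensity-weighted centroid of a non-negative response map,
    returning sub-pixel (row, col) coordinates."""
    m = np.asarray(centroid_map, dtype=float)
    rows, cols = np.indices(m.shape)
    total = m.sum()
    return (rows * m).sum() / total, (cols * m).sum() / total
```

A single-pixel response returns that pixel's integer coordinates; responses spread over several pixels interpolate between them, which is what enables precision well below one pixel.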

23 pages, 2451 KiB  
Article
Unmanned Aerial Vehicle Perspective Small Target Recognition Algorithm Based on Improved YOLOv5
by He Xu, Wenlong Zheng, Fengxuan Liu, Peng Li and Ruchuan Wang
Remote Sens. 2023, 15(14), 3583; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15143583 - 17 Jul 2023
Cited by 4 | Viewed by 1985
Abstract
Small target detection has been widely used in applications that are relevant to everyday life and have many real-time requirements, such as road patrols and security surveillance. Although object detection methods based on deep learning have achieved great success in recent years, they are not effective in small target detection. In order to solve the problem of low recognition rate caused by factors such as the low resolution of UAV viewpoint images and little valid information, this paper proposes an improved algorithm based on the YOLOv5s model, called YOLOv5s-pp. Firstly, to better suppress interference from complex backgrounds and negative samples in images, we add a CA attention module, which can better focus on task-specific important channels while weakening the influence of irrelevant channels. Secondly, we improve the forward propagation and generalisation of the network using the Meta-ACON activation function, which adaptively learns to adjust the degree of linearity or nonlinearity of the activation function based on the input data. Thirdly, the SPD-Conv module is incorporated into the network model to address the problems of reduced learning efficiency and loss of fine-grained information due to cross-layer convolution in the model. Finally, the detection head is improved by using smaller detection heads suited to smaller targets to reduce missed detections. We evaluated the algorithm on the VisDrone2019-DET and UAVDT datasets and compared it with other state-of-the-art algorithms. Compared to YOLOv5s, mAP@0.5 improved by 7.4% and 6.5% on the VisDrone2019-DET and UAVDT datasets, respectively, and compared to YOLOv8s, mAP@0.5 improved by 0.8% and 2.1%, respectively. Improving the performance of UAV-side small target detection will help enhance the reliability and safety of UAVs in critical missions such as military reconnaissance, road patrol and security surveillance.
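The SPD-Conv idea is to replace strided downsampling with a space-to-depth rearrangement, so resolution is reduced without discarding fine-grained pixels. A minimal sketch of the rearrangement step (the subsequent non-strided convolution is omitted):

```python
import numpy as np

def space_to_depth(x, scale=2):
    """SPD-style rearrangement: move scale x scale spatial blocks into
    channels, (C, H, W) -> (C*scale*scale, H/scale, W/scale), so
    downsampling loses no information."""
    c, h, w = x.shape
    assert h % scale == 0 and w % scale == 0
    x = x.reshape(c, h // scale, scale, w // scale, scale)
    return x.transpose(0, 2, 4, 1, 3).reshape(
        c * scale * scale, h // scale, w // scale)
```

Every input value survives in some output channel, in contrast to a stride-2 convolution, which samples only a quarter of the positions.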

20 pages, 5265 KiB  
Article
Improving YOLOv7-Tiny for Infrared and Visible Light Image Object Detection on Drones
by Shuming Hu, Fei Zhao, Huanzhang Lu, Yingjie Deng, Jinming Du and Xinglin Shen
Remote Sens. 2023, 15(13), 3214; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15133214 - 21 Jun 2023
Cited by 4 | Viewed by 2934
Abstract
To address the phenomenon of many small and hard-to-detect objects in drone images, this study proposes an improved algorithm based on the YOLOv7-tiny model. The proposed algorithm assigns anchor boxes according to the aspect ratio of ground truth boxes to provide prior information on object shape for the network and uses a hard sample mining loss function (HSM Loss) to guide the network to enhance learning from hard samples. This study finds that the aspect ratio difference of vehicle objects from the drone perspective is more obvious than the scale difference, so anchor boxes assigned by aspect ratio can provide more effective prior information for the network than those assigned by size. This study evaluates the algorithm on a drone image dataset (DroneVehicle) and compares it with other state-of-the-art algorithms. The experimental results show that the proposed algorithm achieves superior average precision values on both infrared and visible light images while remaining lightweight.
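Assigning anchors by aspect ratio rather than size can be sketched as a nearest-ratio match in log space (the log-space distance is an assumption of this sketch; the paper's matching rule may differ):

```python
import numpy as np

def assign_by_aspect_ratio(gt_wh, anchor_ratios):
    """Assign each ground-truth box (w, h) to the anchor whose aspect
    ratio w/h is closest in log space, so 4:1 and 1:4 are treated
    symmetrically. Returns the anchor index per box."""
    gt_r = np.log(gt_wh[:, 0] / gt_wh[:, 1])
    return np.abs(gt_r[:, None] - np.log(anchor_ratios)[None, :]).argmin(axis=1)
```

For anchors with ratios {0.25, 1, 4}, a wide 4:1 box matches the 4.0 anchor and a tall 1:4 box the 0.25 anchor, regardless of their absolute size.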

21 pages, 6471 KiB  
Article
SIVED: A SAR Image Dataset for Vehicle Detection Based on Rotatable Bounding Box
by Xin Lin, Bo Zhang, Fan Wu, Chao Wang, Yali Yang and Huiqin Chen
Remote Sens. 2023, 15(11), 2825; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15112825 - 29 May 2023
Viewed by 1985
Abstract
The research and development of deep learning methods are heavily reliant on large datasets, and there is currently a lack of scene-rich datasets for synthetic aperture radar (SAR) image vehicle detection. To address this issue and promote the development of SAR vehicle detection algorithms, we constructed the SAR Image dataset for VEhicle Detection (SIVED) using Ka-, Ku-, and X-band data. Rotatable bounding box annotations were employed to improve positioning accuracy, and an automatic annotation algorithm was proposed to improve efficiency. The dataset exhibits three crucial properties: richness, stability, and challenge. It comprises 1044 chips and 12,013 vehicle instances, most of which are situated in complex backgrounds. To construct a baseline, eight detection algorithms were evaluated on SIVED. The experimental results show that all detectors achieved high mean average precision (mAP) on the test set, highlighting the dataset's stability. However, there is still room for improvement in accuracy with respect to the complexity of the background. In summary, SIVED fills the gap in SAR image vehicle detection datasets and demonstrates good adaptability for the development of deep learning algorithms.
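A rotatable bounding box is typically parameterized as (cx, cy, w, h, angle); converting it to its four corners is the basic operation behind such annotations. A minimal sketch (the (cx, cy, w, h, θ) convention is a common one, not necessarily SIVED's exact annotation format):

```python
import numpy as np

def rbox_to_corners(cx, cy, w, h, theta):
    """Corner coordinates of a rotatable bounding box given its center
    (cx, cy), size (w, h) and rotation angle theta in radians.
    Returns a (4, 2) array of (x, y) corners in a fixed order."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])          # 2D rotation matrix
    local = np.array([[-w/2, -h/2], [ w/2, -h/2],
                      [ w/2,  h/2], [-w/2,  h/2]])
    return local @ rot.T + np.array([cx, cy])  # rotate, then translate
```

With θ = 0 this reduces to the familiar axis-aligned box, which makes the parameterization easy to sanity-check.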

20 pages, 6118 KiB  
Article
Cross-Viewpoint Template Matching Based on Heterogeneous Feature Alignment and Pixel-Wise Consensus for Air- and Space-Based Platforms
by Tian Hui, Yuelei Xu, Qing Zhou, Chaofeng Yuan and Jarhinbek Rasol
Remote Sens. 2023, 15(9), 2426; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15092426 - 05 May 2023
Viewed by 1363
Abstract
Template matching is a fundamental task in remote sensing image processing for air- and space-based platforms. Due to heterogeneous image sources, different scales and different viewpoints, realizing a general end-to-end matching model is still challenging. Considering these problems, we propose a cross-view remote sensing image matching method. Firstly, a spatial attention map is proposed to address the domain gap; it is produced by a two-dimensional Gaussian distribution and reduces the distance between the heterogeneous feature distributions. Secondly, in order to perform matching at different flight altitudes, a multi-scale matching method is proposed that matches on three down-sampling scales in turn and selects the optimal result. Thirdly, to improve adaptability to viewpoint changes, a pixel-wise consensus method based on a correlation layer is applied. Finally, we train the proposed model with weakly supervised learning, which does not require extensive annotation but only labels one pair of feature points in the template image and search image. The robustness and effectiveness of the proposed methods were demonstrated by evaluation on various datasets. Our method accommodates three types of template matching with different viewpoints: SAR to RGB, infrared to RGB, and RGB to RGB.
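A spatial attention map produced by a two-dimensional Gaussian can be sketched as follows; the isotropic parameterization (single sigma, peak value 1 at the center) is an assumption of this sketch:

```python
import numpy as np

def gaussian_attention(h, w, center, sigma):
    """2D isotropic Gaussian attention map of shape (h, w) peaking at
    `center` = (row, col), for weighting feature maps."""
    rows, cols = np.indices((h, w))
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

Multiplying feature maps by such a mask emphasizes responses near the expected match location and down-weights the periphery.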
23 pages, 35922 KiB  
Article
MUREN: MUltistage Recursive Enhanced Network for Coal-Fired Power Plant Detection
by Shuai Yuan, Juepeng Zheng, Lixian Zhang, Runmin Dong, Ray C. C. Cheung and Haohuan Fu
Remote Sens. 2023, 15(8), 2200; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15082200 - 21 Apr 2023
Cited by 1 | Viewed by 993
Abstract
The accurate detection of coal-fired power plants (CFPPs) is meaningful for environmental protection, yet challenging. A CFPP is a complex combination of multiple components with varying layouts, unlike clearly defined single objects such as vehicles. CFPPs are typically located in industrial districts with similar backgrounds, further complicating the detection task. To address this issue, we propose a MUltistage Recursive Enhanced Detection Network (MUREN) for accurate and efficient CFPP detection. The effectiveness of MUREN lies in the following: First, we design a symmetrically enhanced module, including a spatial-enhanced subnetwork (SEN) and a channel-enhanced subnetwork (CEN). SEN learns spatial relationships to obtain spatial context information. CEN provides adaptive channel recalibration, restraining noise disturbance and highlighting CFPP features. Second, we use a recursive construction set on top of feature pyramid networks to receive features more than once, strengthening feature learning for relatively small CFPPs. We conduct comparative and ablation experiments on two datasets and apply MUREN to the Pearl River Delta region in Guangdong province for CFPP detection. The comparative experiment results show that MUREN improves the mAP by 5.98% compared with the baseline method and outperforms existing cutting-edge detection methods by 4.57–21.38%, which indicates the promising potential of MUREN in large-scale CFPP detection scenarios. Full article
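The abstract describes CEN only as "adaptive channel recalibration". The actual subnetwork is surely more elaborate, but such recalibration typically follows the squeeze-and-excite pattern, sketched below with plain lists (all names here are hypothetical):

```python
import math

def channel_recalibrate(feature):
    """SE-style channel recalibration sketch: squeeze each channel to its
    global average, pass it through a sigmoid gate, and rescale the channel
    by the gate, suppressing uninformative channels."""
    gates = []
    for ch in feature:                      # feature: [channels][rows][cols]
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature, gates)]
```

Channels whose global response is weak receive a gate below 0.5 and are attenuated, which matches the abstract's "restraining noise disturbance and highlighting CFPP features".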
14 pages, 9545 KiB  
Article
An Effective Task Sampling Strategy Based on Category Generation for Fine-Grained Few-Shot Object Recognition
by Shifan Liu, Ailong Ma, Shaoming Pan and Yanfei Zhong
Remote Sens. 2023, 15(6), 1552; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15061552 - 12 Mar 2023
Cited by 2 | Viewed by 1246
Abstract
The recognition of fine-grained objects is crucial for future remote sensing applications, but this task is faced with the few-shot problem due to limited labeled data. In addition, the existing few-shot learning methods do not consider the unique characteristics of remote sensing objects, i.e., the complex backgrounds and the difficulty of extracting fine-grained features, leading to suboptimal performance. In this study, we developed an improved task sampling strategy for few-shot learning that optimizes the target distribution. The proposed approach incorporates broad category information, where each sample is assigned both a broad and fine category label and converts the target task distribution into a fine-grained distribution. This ensures that the model focuses on extracting fine-grained features for the corresponding broad category. We also introduce a category generation method that ensures the same number of fine-grained categories in each task to improve the model accuracy. The experimental results demonstrate that the proposed strategy outperforms the existing object recognition methods. We believe that this strategy has the potential to be applied to fine-grained few-shot object recognition, thus contributing to the development of high-precision remote sensing applications. Full article
17 pages, 23081 KiB  
Article
Semantic Segmentation of Mesoscale Eddies in the Arabian Sea: A Deep Learning Approach
by Mohamad Abed El Rahman Hammoud, Peng Zhan, Omar Hakla, Omar Knio and Ibrahim Hoteit
Remote Sens. 2023, 15(6), 1525; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15061525 - 10 Mar 2023
Viewed by 1243
Abstract
Detecting mesoscale ocean eddies provides a better understanding of the oceanic processes that govern the transport of salt, heat, and carbon. Established eddy detection techniques rely on physical or geometric criteria, and they notoriously fail to predict eddies that are neither circular nor elliptical in shape. Recently, deep learning techniques have been applied for semantic segmentation of mesoscale eddies, relying on the outputs of traditional eddy detection algorithms to supervise the training of the neural network. However, this approach limits the network’s predictions because the available annotations are either circular or elliptical. Moreover, current approaches depend on the sea-surface height, temperature, or currents as inputs to the network, and these data may not provide all the information necessary to accurately segment eddies. In the present work, we have trained a neural network for the semantic segmentation of eddies using human-based—and expert-validated—annotations of eddies in the Arabian Sea. Training with human-annotated datasets enables the network predictions to include more complex geometries, which occur commonly in the real ocean. We then examine the impact of different combinations of input surface variables on the segmentation performance of the network. The results indicate that providing additional surface variables as inputs to the network improves the accuracy of the predictions by approximately 5%. We have further fine-tuned another pre-trained neural network to segment eddies and achieved a reduced overall training time and higher accuracy compared to the results from a network trained from scratch. Full article
17 pages, 2466 KiB  
Article
Contrastive Domain Adaptation-Based Sparse SAR Target Classification under Few-Shot Cases
by Hui Bi, Zehao Liu, Jiarui Deng, Zhongyuan Ji and Jingjing Zhang
Remote Sens. 2023, 15(2), 469; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15020469 - 13 Jan 2023
Cited by 3 | Viewed by 2264
Abstract
Due to the imaging mechanism of synthetic aperture radar (SAR), it is difficult and costly to acquire abundant labeled SAR images. Moreover, a typical matched filtering (MF) based image faces the problems of serious noise, sidelobes, and clutters, which will bring down the accuracy of SAR target classification. Different from the MF-based result, a sparse image shows better quality with less noise and higher image signal-to-noise ratio (SNR). Therefore, theoretically using it for target classification will achieve better performance. In this paper, a novel contrastive domain adaptation (CDA) based sparse SAR target classification method is proposed to solve the problem of insufficient samples. In the proposed method, we firstly construct a sparse SAR image dataset by using the complex image based iterative soft thresholding (BiIST) algorithm. Then, the simulated and real SAR datasets are simultaneously sent into an unsupervised domain adaptation framework to reduce the distribution difference and obtain the reconstructed simulated SAR images for subsequent target classification. Finally, the reconstructed simulated images are manually labeled and fed into a shallow convolutional neural network (CNN) for target classification along with a small number of real sparse SAR images. Since the current definition of the number of small samples is still vague and inconsistent, this paper defines few-shot as less than 20 per class. Experimental results based on MSTAR under standard operating conditions (SOC) and extended operating conditions (EOC) show that the reconstructed simulated SAR dataset makes up for the insufficient information from limited real data. Compared with other typical deep learning methods based on limited samples, our method is able to achieve higher accuracy especially under the conditions of few shots. Full article
27 pages, 6203 KiB  
Article
Improved Neural Network with Spatial Pyramid Pooling and Online Datasets Preprocessing for Underwater Target Detection Based on Side Scan Sonar Imagery
by Jinrui Li, Libin Chen, Jian Shen, Xiongwu Xiao, Xiaosong Liu, Xin Sun, Xiao Wang and Deren Li
Remote Sens. 2023, 15(2), 440; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15020440 - 11 Jan 2023
Cited by 14 | Viewed by 3612
Abstract
Fast and high-accuracy detection of underwater targets based on side scan sonar images has great potential for marine fisheries, underwater security, marine mapping, underwater engineering and other applications. The following problems, however, must be addressed when using low-resolution side scan sonar images for underwater target detection: (1) the detection performance is limited due to the restriction on the input of multi-scale images; (2) the widely used deep learning algorithms have low detection performance due to their complex convolution layer structures; (3) the detection performance is limited due to insufficient model complexity in the training process; and (4) the number of samples is insufficient because of poor dataset preprocessing methods. To solve these problems, an improved neural network for underwater target detection is proposed; it is based on side scan sonar images and fully utilizes spatial pyramid pooling and online dataset preprocessing built on the You Only Look Once version three (YOLO V3) algorithm. The methodology of the proposed approach is as follows: (1) the AlexNet, GoogleNet, VGGNet and ResNet networks and an adopted YOLO V3 algorithm were the backbone networks. The structure of the YOLO V3 model is more mature and compact and has higher target detection accuracy and better detection efficiency than the other models; (2) spatial pyramid pooling was added at the end of the convolution layers to improve detection performance. Spatial pyramid pooling removes the scale restrictions on input images and improves feature extraction, enabling the backbone network to learn faster at high accuracy; and (3) online dataset preprocessing based on YOLO V3 with spatial pyramid pooling increases the number of samples and improves the complexity of the model to further improve detection performance. Three side scan sonar imagery datasets were used for training and testing in the experiments.
The quantitative evaluation using Accuracy, Recall, Precision, mAP and F1-Score metrics indicates that, for the AlexNet, GoogleNet, VGGNet and ResNet algorithms, adding spatial pyramid pooling to their backbone networks improved the average detection accuracy on the three sets of data by 2%, 4%, 2% and 2%, respectively, compared to their original formulations. Compared with the original YOLO V3 model, the proposed ODP+YOLO V3+SPP underwater target detection model has improved detection performance: the mAP evaluation index increased by 6%, the Precision evaluation index increased by 13%, and the detection efficiency increased by 9.34%. These results demonstrate that adding spatial pyramid pooling and online dataset preprocessing can improve the target detection accuracy of these commonly used algorithms. The proposed improved neural network with spatial pyramid pooling and online dataset preprocessing based on YOLO V3 achieves the highest scores for underwater target detection of sunken ships, fish flocks and seafloor topography, with mAP scores of 98%, 91% and 96% on the above three kinds of datasets, respectively. Full article
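The spatial pyramid pooling idea described above can be sketched independently of any framework: a feature map of arbitrary size is max-pooled into fixed grids and the bin maxima concatenated, so the output length never varies. A minimal single-channel sketch, not the authors' implementation:

```python
def spp_max_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a single-channel feature map of any size into fixed 1x1, 2x2
    and 4x4 grids and concatenate the bin maxima, so the output always has
    1 + 4 + 16 = 21 values regardless of the input size."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                y0, y1 = i * h // n, max((i + 1) * h // n, i * h // n + 1)
                x0, x1 = j * w // n, max((j + 1) * w // n, j * w // n + 1)
                pooled.append(max(fmap[y][x]
                                  for y in range(y0, y1)
                                  for x in range(x0, x1)))
    return pooled
```

Because the output length is fixed, the fully connected detection head no longer constrains the input image size, which is exactly the "scale restriction" the abstract says SPP removes.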
23 pages, 13718 KiB  
Article
A Spatial Cross-Scale Attention Network and Global Average Accuracy Loss for SAR Ship Detection
by Lili Zhang, Yuxuan Liu, Lele Qu, Jiannan Cai and Junpeng Fang
Remote Sens. 2023, 15(2), 350; https://0-doi-org.brum.beds.ac.uk/10.3390/rs15020350 - 06 Jan 2023
Cited by 3 | Viewed by 1702
Abstract
Neural network-based object detection algorithms have the advantages of high accuracy and end-to-end processing, and they have been widely used in synthetic aperture radar (SAR) ship detection. However, the multi-scale variation of ship targets, the complex background of near-shore scenes, and the dense arrangement of some ships make it difficult to improve detection accuracy. To solve these problems, in this paper, a spatial cross-scale attention network (SCSA-Net) for SAR image ship detection is proposed, which includes a novel spatial cross-scale attention (SCSA) module for eliminating the interference of land background. The SCSA module uses the features at each scale output from the backbone to calculate where the network needs attention in space, and enhances the features of the feature pyramid network (FPN) output to eliminate interference from noise and complex land backgrounds. In addition, this paper analyzes the reasons for the “score shift” problem caused by average precision loss (AP loss) and proposes the global average precision loss (GAP loss) to solve it. GAP loss enables the network to distinguish positive samples from negative samples faster than focal loss and AP loss, and to achieve higher accuracy. Finally, we validate and illustrate the effectiveness of the proposed method by evaluating it on the SAR Ship Detection Dataset (SSDD), the SAR-ship-dataset, and the High-Resolution SAR Images Dataset (HRSID). The experimental results show that the proposed method can significantly reduce the interference of background noise on the ship detection results, improve the detection accuracy, and achieve superior results to the existing methods. Full article
19 pages, 16723 KiB  
Article
Object Counting in Remote Sensing via Triple Attention and Scale-Aware Network
by Xiangyu Guo, Marco Anisetti, Mingliang Gao and Gwanggil Jeon
Remote Sens. 2022, 14(24), 6363; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14246363 - 15 Dec 2022
Cited by 8 | Viewed by 1525
Abstract
Object counting is a fundamental task in remote sensing analysis. Nevertheless, it has been barely studied compared with object counting in natural images due to the challenging factors, e.g., background clutter and scale variation. This paper proposes a triple attention and scale-aware network (TASNet). Specifically, a triple view attention (TVA) module is adopted to remedy the background clutter, which executes three-dimension attention operations on the input tensor. In this case, it can capture the interaction dependencies between three dimensions to distinguish the object region. Meanwhile, a pyramid feature aggregation (PFA) module is employed to relieve the scale variation. The PFA module is built in a four-branch architecture, and each branch has a similar structure composed of dilated convolution layers to enlarge the receptive field. Furthermore, a scale transmit connection is introduced to enable the lower branch to acquire the upper branch’s scale, increasing the output’s scale diversity. Experimental results on remote sensing datasets prove that the proposed model can address the issues of background clutter and scale variation. Moreover, it outperforms the state-of-the-art (SOTA) competitors subjectively and objectively. Full article
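The PFA module's dilated-convolution branches rely on a standard trick: spacing kernel taps `dilation` samples apart enlarges the receptive field without adding parameters. A minimal 1-D illustration (not the authors' code; TASNet uses 2-D convolutions):

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D dilated convolution: kernel taps are spaced `dilation`
    apart, so the receptive field is (len(kernel) - 1) * dilation + 1 samples
    wide with no extra parameters."""
    span = (len(kernel) - 1) * dilation
    return [sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
            for i in range(len(signal) - span)]
```

With a 3-tap kernel and dilation 4, one output sample already covers 9 input samples, which is why stacking branches with different dilation rates yields the scale diversity the abstract describes.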
19 pages, 34619 KiB  
Article
OrtDet: An Orientation Robust Detector via Transformer for Object Detection in Aerial Images
by Ling Zhao, Tianhua Liu, Shuchun Xie, Haoze Huang and Ji Qi
Remote Sens. 2022, 14(24), 6329; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14246329 - 14 Dec 2022
Cited by 2 | Viewed by 1576
Abstract
The detection of arbitrarily rotated objects in aerial images is challenging due to the highly complex backgrounds and the multiple angles of objects. Existing detectors are not robust relative to the varying angle of objects because the CNNs do not explicitly model the orientation’s variation. In this paper, we propose an Orientation Robust Detector (OrtDet) to solve this problem, which aims to learn features that change accordingly with the object’s rotation (i.e., rotation-equivariant features). Specifically, we introduce a vision transformer as the backbone to capture its remote contextual associations via the degree of feature similarities. By capturing the features of each part of the object and their relative spatial distribution, OrtDet can learn features that have a complete response to any direction of the object. In addition, we use the tokens concatenation layer (TCL) strategy, which generates a pyramidal feature hierarchy for addressing vastly different scales of objects. To avoid the confusion of angle regression, we predict the relative gliding offsets of the vertices in each corresponding side of the horizontal bounding boxes (HBBs) to represent the oriented bounding boxes (OBBs). To intuitively reflect the robustness of the detector, a new metric, the mean rotation precision (mRP), is proposed to quantitatively measure the model’s learning ability for a rotation-equivariant feature. Experiments on the DOTA-v1.0, DOTA-v1.5, and HRSC2016 datasets show that our method improves the mAP by 0.5, 1.1, and 2.2 and reduces mRP detection fluctuations by 0.74, 0.56, and 0.52, respectively. Full article
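The HBB-plus-gliding-offsets representation described above can be sketched as follows. OrtDet's exact parameterization is not given in the abstract, so the ratio convention here (one vertex sliding along each side of the horizontal box) is illustrative only:

```python
def hbb_to_obb(x1, y1, x2, y2, ratios):
    """Gliding-vertex sketch: starting from the horizontal box (x1, y1)-(x2, y2),
    slide one vertex along each side by the given ratio of that side's length;
    the four slid points form the oriented bounding box."""
    a1, a2, a3, a4 = ratios
    w, h = x2 - x1, y2 - y1
    return [(x1 + a1 * w, y1),   # vertex gliding along the top side
            (x2, y1 + a2 * h),   # right side
            (x2 - a3 * w, y2),   # bottom side
            (x1, y2 - a4 * h)]   # left side
```

Predicting four bounded ratios instead of an angle avoids the periodic discontinuity that makes direct angle regression ambiguous.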
25 pages, 10182 KiB  
Article
MQANet: Multi-Task Quadruple Attention Network of Multi-Object Semantic Segmentation from Remote Sensing Images
by Yuxia Li, Yu Si, Zhonggui Tong, Lei He, Jinglin Zhang, Shiyu Luo and Yushu Gong
Remote Sens. 2022, 14(24), 6256; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14246256 - 10 Dec 2022
Cited by 7 | Viewed by 1446
Abstract
Multi-object semantic segmentation from remote sensing images has gained significant attention in land resource surveying, global change monitoring, and disaster detection. Compared to other application scenarios, the objects in the remote sensing field are larger and more widely distributed. In addition, some similar targets, such as roads and concrete-roofed buildings, are easily misjudged. However, existing convolutional neural networks operate only in the local receptive field, and this limits their capacity to represent the potential association between different objects and surrounding features. This paper develops a Multi-task Quadruple Attention Network (MQANet) to address the above-mentioned issues and increase segmentation accuracy. The MQANet contains four attention modules: a position attention module (PAM), channel attention module (CAM), label attention module (LAM), and edge attention module (EAM). The quadruple attention modules obtain global features by expanding the receptive fields of the network and introducing spatial context information from the labels. Then, a multi-tasking mechanism, which splits a multi-category segmentation task into several binary-classification segmentation tasks, is introduced to improve the ability to identify similar objects. The proposed MQANet was applied to the Potsdam dataset, the Vaihingen dataset and self-annotated images from Chongzhou and Wuzhen (CZ-WZ), representative cities in China. MQANet outperforms the baseline network by a large margin of +6.33 OA and +7.05 Mean F1-score on the Vaihingen dataset, +3.57 OA and +2.83 Mean F1-score on the Potsdam dataset, and +3.88 OA and +8.65 Mean F1-score on the self-annotated CZ-WZ dataset. In addition, the per-image execution time of the MQANet model is reduced by 66.6 ms compared to UNet. Moreover, the effectiveness of MQANet was also proven by comparative experiments with other studies. Full article
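At the label level, splitting one multi-category segmentation task into binary tasks is a one-vs-rest decomposition, which can be sketched in a few lines (function name hypothetical; MQANet's actual mechanism operates on network heads, not just labels):

```python
def split_to_binary_masks(label_map, classes):
    """One-vs-rest split: turn a multi-class label map into one binary mask
    per class, so each class can be segmented by its own binary task."""
    return {c: [[1 if v == c else 0 for v in row] for row in label_map]
            for c in classes}
```

Each binary task then only has to separate its class from everything else, which is how the multi-tasking mechanism reduces confusion between similar targets such as roads and concrete-roofed buildings.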
23 pages, 13371 KiB  
Article
A Defect Detection Method Based on BC-YOLO for Transmission Line Components in UAV Remote Sensing Images
by Wenxia Bao, Xiang Du, Nian Wang, Mu Yuan and Xianjun Yang
Remote Sens. 2022, 14(20), 5176; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14205176 - 16 Oct 2022
Cited by 28 | Viewed by 2622
Abstract
Vibration dampers and insulators are important components of transmission lines, and it is therefore important for the normal operation of transmission lines to detect defects in these components in a timely manner. In this paper, we provide an automatic detection method for component defects through patrolling inspection by an unmanned aerial vehicle (UAV). We constructed a dataset of vibration dampers and insulators (DVDI) on transmission lines in images obtained by the UAV. It is difficult to detect defects in vibration dampers and insulators from UAV images, as these components and their defective parts are very small parts of the images, and the components vary greatly in terms of their shape and color and are easily confused with the background. In view of this, we use the end-to-end coordinate attention and bidirectional feature pyramid network “you only look once” (BC-YOLO) to detect component defects. To make the network focus on the features of vibration dampers and insulators rather than the complex backgrounds, we added the coordinate attention (CA) module to YOLOv5. CA encodes each channel separately along the vertical and horizontal directions, which allows the attention module to simultaneously capture remote spatial interactions with precise location information and helps the network locate targets of interest more accurately. In the multiscale feature fusion stage, different input features have different resolutions, and their contributions to the fused output features are usually unequal. However, PANet treats each input feature equally and simply sums them up without distinction. In this paper, we replace the original PANet feature fusion framework in YOLOv5 with a bidirectional feature pyramid network (BiFPN). BiFPN introduces learnable weights to learn the importance of different features, which can make the network focus more on the feature mapping that contributes more to the output features. 
To verify the effectiveness of our method, we conducted a test on DVDI, and its mAP@0.5 reached 89.1%, a value 2.7% higher than that of YOLOv5. Full article
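BiFPN's learnable weighted fusion, which the abstract contrasts with PANet's unweighted sum, is commonly implemented as "fast normalized fusion". A minimal sketch with flat lists standing in for feature maps (not the authors' code):

```python
def bifpn_fast_fuse(features, weights, eps=1e-4):
    """BiFPN fast normalized fusion sketch: each input feature map gets a
    learnable scalar weight; ReLU keeps weights non-negative and the
    normalization makes them sum to ~1, so informative inputs dominate."""
    w = [max(0.0, wi) for wi in weights]
    total = sum(w) + eps
    return [sum(wi * f[k] for wi, f in zip(w, features)) / total
            for k in range(len(features[0]))]
```

Unlike a plain sum, the learned weights let the network discount a low-resolution input whose contribution to the fused output should be small.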
19 pages, 16785 KiB  
Article
MSCNet: A Multilevel Stacked Context Network for Oriented Object Detection in Optical Remote Sensing Images
by Rui Zhang, Xinxin Zhang, Yuchao Zheng, Dahan Wang and Lizhong Hua
Remote Sens. 2022, 14(20), 5066; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14205066 - 11 Oct 2022
Cited by 4 | Viewed by 1696
Abstract
Oriented object detection has recently become a hot research topic in remote sensing because it provides a better spatial expression of oriented target objects. Although research has made considerable progress in this field, the multiscale nature and arbitrary orientations of targets still pose great challenges for oriented object detection tasks. In this paper, a multilevel stacked context network (MSCNet) is proposed to enhance target detection accuracy by aggregating the semantic relationships between different objects and contexts in remote sensing images. Additionally, to alleviate the impact of the defects of the traditional oriented bounding box representation, the feasibility of using a Gaussian distribution instead of the traditional representation is discussed in this paper. Finally, we verified the performance of our work on two common remote sensing datasets, and the results show that our proposed network improves on the baseline. Full article
21 pages, 51353 KiB  
Article
Oriented Ship Detection Based on Intersecting Circle and Deformable RoI in Remote Sensing Images
by Jun Zhang, Ruofei Huang, Yan Li and Bin Pan
Remote Sens. 2022, 14(19), 4749; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14194749 - 22 Sep 2022
Cited by 3 | Viewed by 1479
Abstract
Ship detection is an important topic in the task of understanding remote sensing images. One of the challenges for ship detection is the large length–width ratio of ships, which may weaken the feature extraction ability. Simultaneously, ships inclining in any direction is also a challenge for ship detection in remote sensing images. In this paper, a novel Oriented Ship detection method is proposed based on an intersecting Circle and Deformable region of interest (OSCD-Net), which aims at describing the characteristics of a large length–width ratio and arbitrary direction. OSCD-Net is composed of two modules: an intersecting circle rotated detection head (ICR-head) and a deformable region of interest (DRoI). The ICR-head detects a horizontal bounding box and an intersecting circle to obtain an oriented bounding box. DRoI performs three RoIAlign with different pooled sizes for each feature candidate region. In addition, the DRoI module uses transformation and deformation operations to pay attention to ship feature information and align feature shapes. OSCD-Net shows promising performance on public remote sensing image datasets. Full article
21 pages, 11361 KiB  
Article
A Multiscale and Multitask Deep Learning Framework for Automatic Building Extraction
by Jichong Yin, Fang Wu, Yue Qiu, Anping Li, Chengyi Liu and Xianyong Gong
Remote Sens. 2022, 14(19), 4744; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14194744 - 22 Sep 2022
Cited by 9 | Viewed by 1752
Abstract
Detecting buildings, segmenting building footprints, and extracting building edges from high-resolution remote sensing images are vital in applications such as urban planning, change detection, smart cities, and map-making and updating. The tasks of building detection, footprint segmentation, and edge extraction affect each other to a certain extent. However, most previous works have focused on one of these three tasks and have lacked a multitask learning framework that can simultaneously solve the tasks of building detection, footprint segmentation and edge extraction, making it difficult to obtain smooth and complete buildings. This study proposes a novel multiscale and multitask deep learning framework to consider the dependencies among building detection, footprint segmentation, and edge extraction while completing all three tasks. In addition, a multitask feature fusion module is introduced into the deep learning framework to increase the robustness of feature extraction. A multitask loss function is also introduced to balance the training losses among the various tasks to obtain the best training results. Finally, the proposed method is applied to open-source building datasets and large-scale high-resolution remote sensing images and compared with other advanced building extraction methods. To verify the effectiveness of multitask learning, the performance of multitask learning and single-task training is compared in ablation experiments. The experimental results show that the proposed method has certain advantages over other methods and that multitask learning can effectively improve single-task performance. Full article
19 pages, 2228 KiB  
Article
Spatial–Spectral Cross-Correlation Embedded Dual-Transfer Network for Object Tracking Using Hyperspectral Videos
by Jie Lei, Pan Liu, Weiying Xie, Long Gao, Yunsong Li and Qian Du
Remote Sens. 2022, 14(15), 3512; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14153512 - 22 Jul 2022
Cited by 11 | Viewed by 1441
Abstract
Hyperspectral (HS) videos can describe objects at the material level due to their rich spectral bands, which makes them more conducive to object tracking than color videos. However, the existing HS object trackers cannot make good use of deep-learning models to mine their semantic information due to limited annotated data samples. Moreover, the high-dimensional characteristics of HS videos make the training of a deep-learning model challenging. To address the above problems, this paper proposes a spatial–spectral cross-correlation embedded dual-transfer network (SSDT-Net). Specifically, we first propose to use transfer learning to transfer the knowledge of traditional color videos to the HS tracking task and develop a dual-transfer strategy to gauge the similarity between the source and target domains. In addition, a spectral weighted fusion method is introduced to obtain the inputs of the Siamese network, and we propose a spatial–spectral cross-correlation module to better embed the spatial and material information between the two branches of the Siamese network for classification and regression. The experimental results demonstrate that, compared to the state of the art, the proposed SSDT-Net tracker offers more satisfactory performance at a speed similar to that of traditional color trackers. Full article
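The "spectral weighted fusion" step is only named in the abstract. As an illustration of the idea, the sketch below collapses hyperspectral bands into a single plane with a normalized weighted sum, using fixed weights where SSDT-Net would learn them; the function and its signature are assumptions, not the paper's method:

```python
def spectral_weighted_fusion(bands, weights):
    """Collapse B hyperspectral band images into one plane via a normalized
    weighted sum, reducing the input dimensionality before it enters a
    Siamese tracking network. Illustrative only."""
    total = sum(weights)
    h, w = len(bands[0]), len(bands[0][0])
    return [[sum(wt * band[y][x] for wt, band in zip(weights, bands)) / total
             for x in range(w)]
            for y in range(h)]
```

Weighting bands before fusion preserves the most discriminative spectral information while keeping the network input compact.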

19 pages, 9695 KiB  
Article
A Context Feature Enhancement Network for Building Extraction from High-Resolution Remote Sensing Imagery
by Jinzhi Chen, Dejun Zhang, Yiqi Wu, Yilin Chen and Xiaohu Yan
Remote Sens. 2022, 14(9), 2276; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14092276 - 09 May 2022
Abstract
The complexity and diversity of buildings make it challenging for deep neural networks to extract low-level and high-level features with strong feature representation in building extraction tasks. Meanwhile, deep neural network-based methods have many network parameters, which consume considerable memory and time during training and testing. We propose a novel fully convolutional neural network called the Context Feature Enhancement Network (CFENet) to address these issues. CFENet comprises three modules: the spatial fusion module, the focus enhancement module, and the feature decoder module. First, the spatial fusion module aggregates the spatial information of low-level features to capture building outlines and edges. Secondly, the focus enhancement module fully aggregates the semantic information of high-level features to filter the information of building-related attribute categories. Finally, the feature decoder module decodes the output of the above two modules to segment the buildings more accurately. In a series of experiments on the WHU Building Dataset and the Massachusetts Building Dataset, CFENet balances efficiency and accuracy better than the four methods we compared against and achieves the best results on all five evaluation metrics: PA, PC, F1, IoU, and FWIoU. This indicates that CFENet can effectively enhance and fuse buildings' low-level and high-level features, improving building extraction accuracy.
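The five metrics reported for CFENet all follow from a binary confusion matrix over building/background pixels. A minimal reference sketch, assuming "PC" denotes per-class precision (an assumption, since the abstract does not expand the acronym):

```python
# Sketch: PA, PC (precision, assumed), F1, IoU, and FWIoU for binary masks.

def segmentation_metrics(pred, gt):
    """pred, gt: flat lists of 0/1 labels of equal length."""
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    tn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 0)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    n = len(gt)
    pa = (tp + tn) / n                                  # pixel accuracy
    pc = tp / (tp + fp) if tp + fp else 0.0             # precision
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pc * recall / (pc + recall) if pc + recall else 0.0
    iou_fg = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    iou_bg = tn / (tn + fp + fn) if tn + fp + fn else 0.0
    # frequency-weighted IoU: per-class IoU weighted by class frequency
    fwiou = ((tp + fn) / n) * iou_fg + ((tn + fp) / n) * iou_bg
    return {"PA": pa, "PC": pc, "F1": f1, "IoU": iou_fg, "FWIoU": fwiou}
```

This makes explicit how FWIoU differs from plain IoU: each class's IoU is scaled by how often that class actually occurs in the ground truth.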

24 pages, 11125 KiB  
Article
A Study on the Dynamic Effects and Ecological Stress of Eco-Environment in the Headwaters of the Yangtze River Based on Improved DeepLab V3+ Network
by Chunsheng Wang, Rui Zhang and Lili Chang
Remote Sens. 2022, 14(9), 2225; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14092225 - 06 May 2022
Abstract
The headwaters of the Yangtze River are a complicated system composed of different eco-environment elements. The abnormal moisture and energy exchanges between the atmosphere and earth systems caused by global climate change are predicted to produce drastic changes in these eco-environment elements. In order to study the dynamic effects and ecological stress in the eco-environment, we adapted the Double Attention Mechanism (DAM) to improve the performance of the DeepLab V3+ network in large-scale semantic segmentation. We proposed Elements Fragmentation (EF) and Elements Information Content (EIC) to quantitatively analyze the spatial distribution characteristics and spatial relationships of eco-environment elements. In this paper, the following conclusions were drawn: (1) we established sample sets based on "Sentinel-2" remote sensing images using the interpretation signs of eco-environment elements; (2) the mAP, mIoU, and Kappa of the improved DeepLab V3+ method were 0.639, 0.778, and 0.825, respectively, which demonstrates a good ability to distinguish the eco-environment elements; (3) between 2015 and 2021, EF gradually increased from 0.2234 to 0.2394, and EIC increased from 23.80 to 25.32, which shows that the eco-environment is oriented toward complex, heterogeneous, and discontinuous processes; (4) the headwaters of the Yangtze River are a community of life, and thus we should build a multifunctional ecological management system with which to implement well-organized and efficient scientific ecological rehabilitation projects.
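The abstract does not give the formulas for EF and EIC. As a hedged illustration only, landscape-ecology analogues of these two indices might be sketched as patch density (fragmentation) and a Shannon-style diversity of element-class proportions (information content); both forms are assumptions, not the paper's definitions, and the constants will not reproduce the reported values:

```python
# Sketch: two common landscape-metric forms that EF/EIC could resemble.
import math

def elements_fragmentation(patch_count, total_area):
    """Fragmentation as patch density: patches per unit area (assumed form)."""
    return patch_count / total_area

def elements_information_content(class_areas):
    """Shannon-style diversity over element-class area shares (assumed form)."""
    total = sum(class_areas)
    h = 0.0
    for a in class_areas:
        if a > 0:
            p = a / total
            h -= p * math.log2(p)  # each class contributes -p*log2(p)
    return h
```

Under these forms, fragmentation rises as one landscape splits into more patches, and information content rises as element classes become more evenly mixed, which is consistent with the trend the abstract reports.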

Other


15 pages, 1138 KiB  
Technical Note
Multi-Prior Twin Least-Square Network for Anomaly Detection of Hyperspectral Imagery
by Jiaping Zhong, Yunsong Li, Weiying Xie, Jie Lei and Xiuping Jia
Remote Sens. 2022, 14(12), 2859; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14122859 - 15 Jun 2022
Abstract
Anomaly detection of hyperspectral imagery (HSI) identifies the very few samples that do not conform to an intricate background, without priors. Despite the extensive success of hyperspectral interpretation techniques based on generative adversarial networks (GANs), applying trained GAN models to hyperspectral anomaly detection remains promising but challenging. Previous generative models can accurately learn the complex background distribution of HSI and typically convert the high-dimensional data back to the latent space to extract features to detect anomalies. However, both the background-modeling and feature-extraction methods fall short of the ideal in terms of modeling power and reconstruction-consistency capability. In this work, we present a multi-prior-based network (MPN) that incorporates well-trained GANs as effective priors for a general anomaly-detection task. In particular, we introduce multi-scale covariance maps (MCMs) of precise second-order statistics to construct multi-scale priors. The MCM strategy implicitly bridges the spectral- and spatial-specific information and fully represents multi-scale, enhanced information. Thus, we reliably and adaptively estimate the HSI labels to alleviate the problem of insufficient priors. Moreover, a twin least-square loss is imposed to improve the generative ability and training stability in the feature and image domains, as well as to overcome the gradient-vanishing problem. Finally, the network, enforced with a new anomaly-rejection loss, establishes a pure and discriminative background estimation.
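The multi-scale covariance maps at the heart of MPN are second-order statistics of spectral vectors taken over local windows of increasing size. A minimal sketch of that computation, assuming square windows of half-width `s` centred on a pixel (function names and the exact windowing are illustrative, not the paper's implementation):

```python
# Sketch: one covariance matrix per window scale around a centre pixel.

def covariance_matrix(vectors):
    """Sample covariance of a list of equal-length spectral vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for v in vectors:
        c = [v[j] - mean[j] for j in range(d)]  # centred spectrum
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov

def multiscale_covariance_maps(image, row, col, scales):
    """image: 2-D grid (list of rows) of spectral vectors.
    Returns one d x d covariance matrix per window half-width in scales,
    with windows clipped at the image border."""
    maps = []
    for s in scales:
        window = [image[r][c]
                  for r in range(max(0, row - s), min(len(image), row + s + 1))
                  for c in range(max(0, col - s), min(len(image[0]), col + s + 1))]
        maps.append(covariance_matrix(window))
    return maps
```

At scale 0 the window is the pixel itself and the covariance degenerates to zero; larger scales progressively mix spatial context into the spectral statistics, which is the "bridging" the abstract describes.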
