
GeoAI: Integration of Artificial Intelligence, Machine Learning and Deep Learning with Remote Sensing

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: closed (9 March 2021) | Viewed by 72,927

Special Issue Editors


Guest Editor
Department of Geoinformatics, University of Salzburg, 5020 Salzburg, Austria
Interests: artificial intelligence for remote sensing (AI4RS); artificial intelligence for natural hazards (AI4NH); land surface monitoring and change detection

Special Issue Information

Dear Colleagues,

This Special Issue focuses on advances in, and innovative methods and solutions for, Artificial Intelligence (AI) in remote sensing (RS) and Earth observation (EO). In particular, we call for contributions that describe methods and ongoing research, including algorithm development, data-training strategies, and implementations.

Recent advances in hardware and high-performance computing platforms have enabled the development and implementation of several state-of-the-art machine learning approaches (e.g., decision tree learning, reinforcement learning, inductive logic programming, Bayesian networks, and clustering) that can be applied to satellite image analysis. In particular, deep learning methods have become a fast-growing trend in RS applications; above all, supervised deep convolutional neural networks have attracted considerable interest in the computer vision and image processing communities.

These developments are driven by an increasing need to mine the large amounts of data generated by a new generation of satellites, including, for example, the European Copernicus system with its Sentinel satellites and the many satellites recently launched by China. The sheer volume of data generated today all but necessitates AI for the exploration of such big data.

Still, many AI algorithms are in their infancy when it comes to scientific explanation. For instance, CNNs are often constructed by trial and error: how many layers should really be used? Researchers have access to a massive pool of AI algorithms, but AI needs to be combined with physical principles and scientific interpretation.

This Special Issue seeks to clarify how AI methods can be selected and used in a way that makes them practicable and appropriate for RS applications. The performance of these choices may depend on the application, the theory behind the AI algorithms, and how the algorithms and architectures are developed and trained. Moreover, the capabilities of novel and hybrid AI algorithms have not yet been investigated equally across fields; the performance of standalone and hybrid approaches in satellite image analysis still needs to be determined.

To highlight new AI solutions for RS image understanding tasks and problems, we encourage manuscript submissions on a broad range of related topics, which may include but are not limited to the following:

- Big data

- Data fusion

- Satellite images

- Image processing and classification

- Superpixels

- Multiscale and multisensor data calibration

- The hierarchical feature learning process

- Data augmentation strategies

- Feature representation

- Patch-wise semantic segmentation

- Data processing from UAVs

- Hyperspectral imagery

- Scale issues and hierarchical analysis

- Scale parameter estimation

- Training/testing data collection

- Multiresolution segmentation

- Semantic segmentation

- Classifiers

- Object detection and instance segmentation

- Change detection and monitoring

- Natural hazard monitoring and susceptibility mapping (e.g., landslide, flood, wildfire, soil erosion)

- Disaster assessment, mapping, and quantification

- Humanitarian operations

- Scene recognition

- Urban land use classification

- Land use/land cover

- Complex ecosystem dynamics, e.g., wetland and coastal mapping

- Agriculture and crop mapping

- Vegetation monitoring

- Time series analysis

Mr. Omid Ghorbanzadeh
Dr. Omid Rahmati
Prof. Thomas Blaschke
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Remote sensing
  • Pixel-based classification
  • Object-based image analysis (OBIA)
  • Artificial intelligence
  • Machine learning
  • Deep learning
  • Convolutional neural networks (CNNs)
  • Integrated architectures

Published Papers (11 papers)


Research


22 pages, 10431 KiB  
Article
Semi-Supervised Multi-Temporal Deep Representation Fusion Network for Landslide Mapping from Aerial Orthophotos
by Xiaokang Zhang, Man-On Pun and Ming Liu
Remote Sens. 2021, 13(4), 548; https://doi.org/10.3390/rs13040548 - 03 Feb 2021
Cited by 14 | Viewed by 2405
Abstract
Using remote sensing techniques to monitor landslides and their resultant land cover changes is fundamentally important for risk assessment and hazard prevention. Despite enormous efforts in developing intelligent landslide mapping (LM) approaches, LM remains challenging owing to high spectral heterogeneity of very-high-resolution (VHR) images and the daunting labeling efforts. To this end, a deep learning model based on semi-supervised multi-temporal deep representation fusion network, namely SMDRF-Net, is proposed for reliable and efficient LM. In comparison with previous methods, the SMDRF-Net possesses three distinct properties. (1) Unsupervised deep representation learning at the pixel- and object-level is performed by transfer learning using the Wasserstein generative adversarial network with gradient penalty to learn discriminative deep features and retain precise outlines of landslide objects in the high-level feature space. (2) Attention-based adaptive fusion of multi-temporal and multi-level deep representations is developed to exploit the spatio-temporal dependencies of deep representations and enhance the feature representation capability of the network. (3) The network is optimized using limited samples with pseudo-labels that are automatically generated based on a comprehensive uncertainty index. Experimental results from the analysis of VHR aerial orthophotos demonstrate the reliability and robustness of the proposed approach for LM in comparison with state-of-the-art methods. Full article
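The pseudo-labeling step described above relies on an uncertainty index to select reliable samples. As a minimal illustration of the idea (a simple entropy-based selector, not the authors' comprehensive index), in NumPy:

```python
import numpy as np

def entropy_uncertainty(probs):
    """Per-pixel predictive entropy, normalized to [0, 1].
    probs: (H, W, C) class-probability map."""
    eps = 1e-12
    ent = -np.sum(probs * np.log(probs + eps), axis=-1)
    return ent / np.log(probs.shape[-1])

def select_pseudo_labels(probs, threshold=0.2):
    """Keep only confident pixels: argmax label where the normalized
    uncertainty is below the threshold, -1 (ignore) elsewhere."""
    unc = entropy_uncertainty(probs)
    labels = probs.argmax(axis=-1)
    labels[unc >= threshold] = -1
    return labels
```

Pixels marked -1 would simply be excluded from the loss when the network is optimized on the pseudo-labels.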

25 pages, 7036 KiB  
Article
Landscape Similarity Analysis Using Texture Encoded Deep-Learning Features on Unclassified Remote Sensing Imagery
by Karim Malik and Colin Robertson
Remote Sens. 2021, 13(3), 492; https://doi.org/10.3390/rs13030492 - 30 Jan 2021
Cited by 7 | Viewed by 4330
Abstract
Convolutional neural networks (CNNs) are known for their ability to learn shape and texture descriptors useful for object detection, pattern recognition, and classification problems. Deeper layer filters of CNN generally learn global image information vital for whole-scene or object discrimination. In landscape pattern comparison, however, dense localized information encoded in shallow layers can contain discriminative information for characterizing changes across image local regions but are often lost in the deeper and non-spatial fully connected layers. Such localized features hold potential for identifying, as well as characterizing, process–pattern change across space and time. In this paper, we propose a simple yet effective texture-based CNN (Tex-CNN) via a feature concatenation framework which results in capturing and learning texture descriptors. The traditional CNN architecture was adopted as a baseline for assessing the performance of Tex-CNN. We utilized 75% and 25% of the image data for model training and validation, respectively. To test the models’ generalization, we used a separate set of imagery from the Aerial Imagery Dataset (AID) and Sentinel-2 for model development and independent validation. The classical CNN and the Tex-CNN classification accuracies in the AID were 91.67% and 96.33%, respectively. Tex-CNN accuracy was either on par with or outcompeted state-of-the-art methods. Independent validation on Sentinel-2 data had good performance for most scene types but had difficulty discriminating farm scenes, likely due to geometric generalization of discriminative features at the coarser scale. In both datasets, the Tex-CNN outperformed the classical CNN architecture. Using the Tex-CNN, gradient-based spatial attention maps (feature maps) which contain discriminative pattern information are extracted and subsequently employed for mapping landscape similarity. 
To enhance the discriminative capacity of the feature maps, we further perform spatial filtering using PCA and select the eigenmaps with the top eigenvalues. We show that CNN feature maps provide descriptors capable of characterizing and quantifying landscape (dis)similarity. Using histograms of oriented gradients computed from the feature maps and comparing them with the Earth Mover's Distance (EMD), our method effectively identified similar landscape types: over 60% of target-reference scene comparisons showed a small EMD (e.g., 0.01), while different landscape types tended to show a large EMD (e.g., 0.05) in the benchmark AID. We hope this proposal will inspire further research into the use of CNN layer feature maps in landscape similarity assessment, as well as in change detection. Full article
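The comparison step can be sketched in a few lines: build a gradient-orientation histogram per scene (a crude stand-in for HOG descriptors of CNN feature maps) and compare histograms with the one-dimensional Earth Mover's Distance. Function names and parameters here are illustrative, not from the paper:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def orientation_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # fold into [0, pi)
    hist, edges = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    s = hist.sum()
    return (hist / s if s > 0 else hist), 0.5 * (edges[:-1] + edges[1:])

def landscape_emd(img_a, img_b, bins=9):
    """1-D EMD between the orientation histograms of two scenes."""
    ha, centers = orientation_histogram(img_a, bins)
    hb, _ = orientation_histogram(img_b, bins)
    return wasserstein_distance(centers, centers, u_weights=ha, v_weights=hb)
```

A small EMD means the two scenes share similar texture statistics; identical scenes score zero.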

18 pages, 56579 KiB  
Article
Raindrop-Aware GAN: Unsupervised Learning for Raindrop-Contaminated Coastal Video Enhancement
by Jinah Kim, Dong Huh, Taekyung Kim, Jaeil Kim, Jeseon Yoo and Jae-Seol Shim
Remote Sens. 2020, 12(20), 3461; https://doi.org/10.3390/rs12203461 - 21 Oct 2020
Cited by 3 | Viewed by 3164
Abstract
We propose an unsupervised network with adversarial learning, the Raindrop-aware GAN, which enhances the quality of coastal video images contaminated by raindrops. Raindrop removal from coastal videos faces two main difficulties: converting the degraded image into a clean one by visually removing the raindrops, and restoring the background coastal wave information in the raindrop regions. The components of the proposed network, a generator and a discriminator for adversarial learning, are trained on unpaired images degraded by raindrops and clean images free from raindrops. By creating raindrop masks and background-restored images, the generator restores the background information in the raindrop regions alone, preserving the rest of the input as much as possible. The proposed network was trained and tested on an open-access dataset and a dataset collected directly from the coastal area, and was then evaluated by three metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and a naturalness image quality evaluator. These metrics are 8.2% (+2.012), 0.2% (+0.002), and 1.6% (−0.196) better than the state-of-the-art method, respectively. In the visual assessment of the enhanced video image quality, our method restored the image patterns of steep and breaking wave crests better than the other methods. In both quantitative and qualitative experiments, the proposed method removed the raindrops in coastal video and recovered the damaged background wave information more effectively than state-of-the-art methods. Full article
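Of the three evaluation metrics, PSNR is the simplest to reproduce; a minimal NumPy version (assuming 8-bit imagery, so a data range of 255) is:

```python
import numpy as np

def psnr(reference, restored, data_range=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference.astype(float) - restored.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

The reported +2.012 improvement is in these dB units, where each +3 dB roughly halves the mean squared error.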

21 pages, 4398 KiB  
Article
A Hybrid Data Balancing Method for Classification of Imbalanced Training Data within Google Earth Engine: Case Studies from Mountainous Regions
by Amin Naboureh, Ainong Li, Jinhu Bian, Guangbin Lei and Meisam Amani
Remote Sens. 2020, 12(20), 3301; https://doi.org/10.3390/rs12203301 - 11 Oct 2020
Cited by 21 | Viewed by 3596
Abstract
Distribution of Land Cover (LC) classes is mostly imbalanced with some majority LC classes dominating against minority classes in mountainous areas. Although standard Machine Learning (ML) classifiers can achieve high accuracies for majority classes, they largely fail to provide reasonable accuracies for minority classes. This is mainly due to the class imbalance problem. In this study, a hybrid data balancing method, called the Partial Random Over-Sampling and Random Under-Sampling (PROSRUS), was proposed to resolve the class imbalance issue. Unlike most data balancing techniques which seek to fully balance datasets, PROSRUS uses a partial balancing approach with hundreds of fractions for majority and minority classes to balance datasets. For this, time-series of Landsat-8 and SRTM topographic data along with various spectral indices and topographic data were used over three mountainous sites within the Google Earth Engine (GEE) cloud platform. It was observed that PROSRUS had better performance than several other balancing methods and increased the accuracy of minority classes without a reduction in overall classification accuracy. Furthermore, adopting complementary information, particularly topographic data, considerably increased the accuracy of minority classes in mountainous areas. Finally, the obtained results from PROSRUS indicated that every imbalanced dataset requires a specific fraction(s) for addressing the class imbalance problem, because different datasets contain various characteristics. Full article
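The partial-balancing idea can be sketched independently of GEE: interpolate each class's sample count between its original size and the fully balanced size by a fraction, over-sampling minority classes with replacement and under-sampling majority classes without. This is an illustrative reconstruction, not the authors' PROSRUS code:

```python
import numpy as np

def partial_balance(labels, fraction=0.5, rng=None):
    """Return indices of a partially rebalanced sample.
    fraction=0 keeps the original class distribution; fraction=1 fully
    balances it; values in between give a partial balance."""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target_each = labels.size / classes.size  # fully balanced count per class
    out = []
    for c, n in zip(classes, counts):
        target = int(round(n + fraction * (target_each - n)))
        idx = np.flatnonzero(labels == c)
        replace = target > n  # over-sample minority, under-sample majority
        out.append(rng.choice(idx, size=target, replace=replace))
    return np.concatenate(out)
```

Sweeping `fraction` per class, as the paper does with hundreds of fractions, lets each imbalanced dataset find its own best compromise.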

25 pages, 18306 KiB  
Article
A Comparison of Three Temporal Smoothing Algorithms to Improve Land Cover Classification: A Case Study from NEPAL
by Nishanta Khanal, Mir Abdul Matin, Kabir Uddin, Ate Poortinga, Farrukh Chishtie, Karis Tenneson and David Saah
Remote Sens. 2020, 12(18), 2888; https://doi.org/10.3390/rs12182888 - 06 Sep 2020
Cited by 22 | Viewed by 7507
Abstract
Time series land cover statistics often fluctuate abruptly due to seasonal effects and other noise in the input imagery. Temporal smoothing techniques reduce this noise in the time series data used for land cover mapping, but their effect may vary with the smoothing method and land cover category. In this study, we compared the performance of Fourier-transform smoothing, the Whittaker smoother, and a linear-fit averaging smoother on yearly composites from Landsat 5, 7, and 8 to classify land cover in Province No. 1 of Nepal. Each smoother was tested both when applied to the image composites and when applied to land cover primitives generated with the random forest machine learning method. The land cover data spanned the years 2000 to 2018. Probability distributions were examined to check the quality of the primitives, and the accuracy of the final land cover maps was assessed. The Whittaker smoother gave the best results for stable classes and Fourier smoothing for the other classes. The results also show that classification using a properly selected smoothing algorithm outperforms classification based on the unsmoothed data set. The final land cover map, generated by combining the best results from the different smoothing approaches, increased the overall accuracy from 79.18% to 83.44%. This study shows that smoothing can substantially increase the quality of the results and that the smoothing approach should be carefully considered for each land cover class. Full article
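The Whittaker smoother compared above penalizes roughness via second-order differences and reduces to a single sparse linear solve; a compact SciPy sketch (the smoothing parameter `lam` is illustrative):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def whittaker_smooth(y, lam=10.0):
    """Whittaker smoother with second-order differences: finds z
    minimizing ||y - z||^2 + lam * ||D2 z||^2 by solving
    (I + lam * D2'D2) z = y as a sparse banded system."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # D2 maps z to its second differences: z[i] - 2 z[i+1] + z[i+2]
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n), format="csc")
    A = sparse.eye(n, format="csc") + lam * (D.T @ D)
    return spsolve(A, y)
```

Because the penalty is on second differences, straight-line trends pass through unchanged while high-frequency noise is damped.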

19 pages, 4188 KiB  
Article
Compact Cloud Detection with Bidirectional Self-Attention Knowledge Distillation
by Yajie Chai, Kun Fu, Xian Sun, Wenhui Diao, Zhiyuan Yan, Yingchao Feng and Lei Wang
Remote Sens. 2020, 12(17), 2770; https://doi.org/10.3390/rs12172770 - 26 Aug 2020
Cited by 16 | Viewed by 3057
Abstract
The deep convolutional neural network has made significant progress in cloud detection. However, the compromise between having a compact model and high accuracy has always been a challenging task in cloud detection for large-scale remote sensing imagery. A promising method to tackle this problem is knowledge distillation, which usually lets the compact model mimic the cumbersome model’s output to get better generalization. However, vanilla knowledge distillation methods cannot properly distill the characteristics of clouds in remote sensing images. In this paper, we propose a novel self-attention knowledge distillation approach for compact and accurate cloud detection, named Bidirectional Self-Attention Distillation (Bi-SAD). Bi-SAD lets a model learn from itself without adding additional parameters or supervision. With bidirectional layer-wise features learning, the model can get a better representation of the cloud’s textural information and semantic information, so that the cloud’s boundaries become more detailed and the predictions become more reliable. Experiments on a dataset acquired by GaoFen-1 satellite show that our Bi-SAD has a great balance between compactness and accuracy, and outperforms vanilla distillation methods. Compared with state-of-the-art cloud detection models, the parameter size and FLOPs are reduced by 100 times and 400 times, respectively, with a small drop in accuracy. Full article
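Bi-SAD builds on, and is compared against, vanilla knowledge distillation; that baseline soft-target loss with temperature scaling can be written in NumPy as follows (a generic sketch of the vanilla objective, not the Bi-SAD loss itself):

```python
import numpy as np

def softmax(z, t=1.0, axis=-1):
    """Temperature-scaled softmax, stabilized by subtracting the max."""
    z = z / t
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_kl(teacher_logits, student_logits, t=4.0):
    """Vanilla KD soft-target loss: KL(teacher || student) at temperature t,
    scaled by t^2 so gradients keep the same magnitude across temperatures."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (t ** 2) * kl.mean()
```

Self-distillation approaches such as Bi-SAD replace the separate teacher with the model's own deeper (or later) layers, so no extra parameters are added.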

24 pages, 8764 KiB  
Article
Multi-Hazard Exposure Mapping Using Machine Learning for the State of Salzburg, Austria
by Thimmaiah Gudiyangada Nachappa, Omid Ghorbanzadeh, Khalil Gholamnia and Thomas Blaschke
Remote Sens. 2020, 12(17), 2757; https://doi.org/10.3390/rs12172757 - 25 Aug 2020
Cited by 47 | Viewed by 6687
Abstract
We live in a world of unpredictable and multifaceted landscapes, where risks arising from multiple hazards are omnipresent. Floods and landslides are widespread and recurring hazards that have occurred at an alarming rate in recent years. This study produces multi-hazard exposure maps for flooding and landslides for the federal State of Salzburg, Austria, using two machine learning (ML) approaches: support vector machine (SVM) and random forest (RF). The exposure maps were established on thirteen influencing factors for floods and landslides: elevation, slope, aspect, topographic wetness index (TWI), stream power index (SPI), normalized difference vegetation index (NDVI), geology, lithology, rainfall, land cover, distance to roads, distance to faults, and distance to drainage. We split the flood and landslide inventory data into training and validation sets using the widely used ratio of 70% of the locations for training and 30% for validation. The accuracy of the exposure maps was assessed through the receiver operating characteristic (ROC) curve and the R-Index (relative density). RF yielded better results for both hazards, with 0.87 for flood and 0.90 for landslide exposure, compared with 0.87 for flood and 0.89 for landslides using SVM. The resulting multi-hazard exposure map for the State of Salzburg enables planners and managers to plan better for the risk regions affected by both floods and landslides. Full article
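The ROC-based accuracy assessment reduces to the area under the curve, which equals the Mann–Whitney rank statistic; a dependency-free sketch (assuming no tied scores, where average ranks would be needed):

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum formulation:
    the probability that a random positive outranks a random negative."""
    labels = np.asarray(labels, bool)
    order = np.argsort(scores)
    ranks = np.empty(labels.size, dtype=float)
    ranks[order] = np.arange(1, labels.size + 1)  # 1-based ranks by score
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Fed with validation labels (hazard / no hazard) and the model's susceptibility scores, this returns the AUC values quoted above (e.g., 0.90 for RF landslide exposure).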

17 pages, 5685 KiB  
Article
Ship-Iceberg Classification in SAR and Multispectral Satellite Images with Neural Networks
by Henning Heiselberg
Remote Sens. 2020, 12(15), 2353; https://doi.org/10.3390/rs12152353 - 22 Jul 2020
Cited by 19 | Viewed by 3437
Abstract
Classification of ships and icebergs in the Arctic in satellite images is an important problem. We study how to train deep neural networks for improving the discrimination of ships and icebergs in multispectral satellite images. We also analyze synthetic-aperture radar (SAR) images for comparison. The annotated datasets of ships and icebergs are collected from multispectral Sentinel-2 data and taken from the C-CORE dataset of Sentinel-1 SAR images. Convolutional Neural Networks with a range of hyperparameters are tested and optimized. Classification accuracies are considerably better for deep neural networks than for support vector machines. Deeper neural nets improve the accuracy per epoch but at the cost of longer processing time. Extending the datasets with semi-supervised data from Greenland improves the accuracy considerably whereas data augmentation by rotating and flipping the images has little effect. The resulting classification accuracies for ships and icebergs are 86% for the SAR data and 96% for the MSI data due to the better resolution and more multispectral bands. The size and quality of the datasets are essential for training the deep neural networks, and methods to improve them are discussed. The reduced false alarm rates and exploitation of multisensory data are important for Arctic search and rescue services. Full article
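The rotation-and-flip augmentation tested above enumerates the eight dihedral variants of an image chip; a minimal version:

```python
import numpy as np

def augment_rot_flip(img):
    """The 8 dihedral variants of a chip: 4 rotations, each optionally flipped."""
    variants = []
    for k in range(4):
        r = np.rot90(img, k)
        variants.append(r)
        variants.append(np.fliplr(r))
    return variants
```

For ship/iceberg chips this eightfold expansion had little effect here, which is consistent with targets that already appear at arbitrary orientations in the training data.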

22 pages, 17604 KiB  
Article
Multi-Scale and Occlusion Aware Network for Vehicle Detection and Segmentation on UAV Aerial Images
by Wang Zhang, Chunsheng Liu, Faliang Chang and Ye Song
Remote Sens. 2020, 12(11), 1760; https://doi.org/10.3390/rs12111760 - 29 May 2020
Cited by 29 | Viewed by 4693
Abstract
With the advantage of high maneuverability, Unmanned Aerial Vehicles (UAVs) have been widely deployed for vehicle monitoring and control. However, extracting vehicle information from UAV-captured images is hindered by several challenges, including arbitrary orientations, huge scale variations, and partial occlusion. In seeking to address these challenges, we propose a novel Multi-Scale and Occlusion Aware Network (MSOA-Net) for UAV-based vehicle segmentation, which consists of two parts: a Multi-Scale Feature Adaptive Fusion Network (MSFAF-Net) and a Regional Attention based Triple Head Network (RATH-Net). In MSFAF-Net, a self-adaptive feature fusion module is proposed that adaptively aggregates hierarchical feature maps from multiple levels to help the Feature Pyramid Network (FPN) deal with the scale change of vehicles. The RATH-Net, with a self-attention mechanism, guides the location-sensitive sub-networks to enhance the vehicle of interest and suppress background noise caused by occlusions. In this study, we release a large, comprehensive UAV-based vehicle segmentation dataset (UVSD), the first public dataset for UAV-based vehicle detection and segmentation. Experiments on the challenging UVSD dataset show that the proposed method detects and segments vehicles efficiently and outperforms the compared state-of-the-art works. Full article
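Segmentation quality in such experiments is typically scored with intersection-over-union (IoU) between predicted and ground-truth masks; the metric itself is only a few lines (a generic sketch, not tied to the UVSD evaluation code):

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean segmentation masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty masks are identical
    return np.logical_and(a, b).sum() / union
```

Instance-segmentation benchmarks then average precision over IoU thresholds (e.g., 0.5 to 0.95) per detected vehicle.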

Review


33 pages, 29791 KiB  
Review
Geographic Object-Based Image Analysis: A Primer and Future Directions
by Maja Kucharczyk, Geoffrey J. Hay, Salar Ghaffarian and Chris H. Hugenholtz
Remote Sens. 2020, 12(12), 2012; https://doi.org/10.3390/rs12122012 - 23 Jun 2020
Cited by 52 | Viewed by 10167
Abstract
Geographic object-based image analysis (GEOBIA) is a remote sensing image analysis paradigm that defines and examines image-objects: groups of neighboring pixels that represent real-world geographic objects. Recent reviews have examined methodological considerations and highlighted how GEOBIA improves upon the 30+ year pixel-based approach, particularly for H-resolution imagery. However, the literature also exposes an opportunity to improve guidance on the application of GEOBIA for novice practitioners. In this paper, we describe the theoretical foundations of GEOBIA and provide a comprehensive overview of the methodological workflow, including: (i) software-specific approaches (open-source and commercial); (ii) best practices informed by research; and (iii) the current status of methodological research. Building on this foundation, we then review recent research on the convergence of GEOBIA with deep convolutional neural networks, which we suggest is a new form of GEOBIA. Specifically, we discuss general integrative approaches and offer recommendations for future research. Overall, this paper describes the past, present, and anticipated future of GEOBIA in a novice-accessible format, while providing innovation and depth to experienced practitioners. Full article

34 pages, 2021 KiB  
Review
Review: Deep Learning on 3D Point Clouds
by Saifullahi Aminu Bello, Shangshu Yu, Cheng Wang, Jibril Muhmmad Adam and Jonathan Li
Remote Sens. 2020, 12(11), 1729; https://doi.org/10.3390/rs12111729 - 28 May 2020
Cited by 187 | Viewed by 19957
Abstract
A point cloud is a set of points defined in a 3D metric space. Point clouds have become one of the most significant data formats for 3D representation and are gaining increased popularity as a result of the increased availability of acquisition devices, as well as seeing increased application in areas such as robotics, autonomous driving, and augmented and virtual reality. Deep learning is now the most powerful tool for data processing in computer vision and is becoming the most preferred technique for tasks such as classification, segmentation, and detection. While deep learning techniques are mainly applied to data with a structured grid, the point cloud, on the other hand, is unstructured. The unstructuredness of point clouds makes the use of deep learning for their direct processing very challenging. This paper contains a review of the recent state-of-the-art deep learning techniques, mainly focusing on raw point cloud data. The initial work on deep learning directly with raw point cloud data did not model local regions; therefore, subsequent approaches model local regions through sampling and grouping. More recently, several approaches have been proposed that not only model the local regions but also explore the correlation between points in the local regions. From the survey, we conclude that approaches that model local regions and take into account the correlation between points in the local regions perform better. Contrary to existing reviews, this paper provides a general structure for learning with raw point clouds, and various methods were compared based on the general structure. This work also introduces the popular 3D point cloud benchmark datasets and discusses the application of deep learning in popular 3D vision tasks, including classification, segmentation, and detection. Full article
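The sampling-and-grouping pipeline the review describes usually starts with farthest-point sampling to pick well-spread centroids before grouping neighbors around them; a greedy NumPy sketch (parameter names illustrative):

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy farthest-point sampling: pick k well-spread points from an
    (N, 3) cloud by repeatedly taking the point farthest from all chosen."""
    chosen = np.empty(k, dtype=int)
    chosen[0] = seed
    dist = np.linalg.norm(points - points[seed], axis=1)
    for i in range(1, k):
        chosen[i] = int(dist.argmax())          # farthest remaining point
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i]], axis=1))
    return chosen
```

Each selected centroid then anchors a local region (e.g., a radius or k-NN ball) whose points are processed together, which is how later methods recover the local structure the earliest raw-point networks ignored.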
