Erratum published on 3 February 2021, see Remote Sens. 2021, 13(4), 539.
Article

Comparison of Masking Algorithms for Sentinel-2 Imagery

1 DLR, German Aerospace Center, D-82234 Wessling, Germany
2 DLR, German Aerospace Center, D-12489 Berlin, Germany
3 Telespazio France, 31023 Toulouse, France
4 Earth Observation Lab, Geography Department, Humboldt-Universität zu Berlin, D-10099 Berlin, Germany
* Author to whom correspondence should be addressed.
Submission received: 1 December 2020 / Revised: 25 December 2020 / Accepted: 27 December 2020 / Published: 4 January 2021
(This article belongs to the Section Atmospheric Remote Sensing)

Abstract:
Masking of clouds, cloud shadow, water and snow/ice in optical satellite imagery is an important step in automated processing chains. We compare the performance of the masks produced by Fmask (“Function of mask”, implemented in FORCE), ATCOR (“Atmospheric Correction”) and Sen2Cor (“Sentinel-2 Correction”) on a set of 20 Sentinel-2 scenes distributed over the globe, covering a wide variety of environments and climates. All three methods use rules based on physical properties (Top of Atmosphere reflectance, TOA) to separate clear pixels from potential cloud pixels, but they differ in their rules and class-specific thresholds, and they can yield different results because of different dilation buffer sizes for the classes cloud, cloud shadow and snow. Classification results are compared to the assessment of an expert human interpreter based on at least 50 polygons per class randomly selected for each image. The class assignment of the human interpreter is considered as reference or “truth”. The interpreter carefully assigned a class label based on visual assessment of the true color and infrared false color images and additionally on the bottom of atmosphere (BOA) reflectance spectra. The most important part of the comparison concerns the difference area of the three classifications, i.e., the part of the classification images where the results of Fmask, ATCOR and Sen2Cor disagree. Results on the difference area reveal the strengths and weaknesses of a classification more clearly than results on the complete image. The overall accuracy of Fmask, ATCOR, and Sen2Cor for the difference areas of the selected scenes is 45%, 56%, and 62%, respectively. User’s and producer’s accuracies are strongly class- and scene-dependent, typically varying between 30% and 90%. The comparison of the difference area is complemented by an evaluation of the area where all three classifications give the same result. The overall accuracy for this “same area” is 97%, which yields overall accuracies for the complete classification of 89%, 91% and 92% for Fmask, ATCOR and Sen2Cor, respectively.

1. Introduction

The Sentinel-2 mission consists of two polar-orbiting satellites, Sentinel-2A and Sentinel-2B, providing a five-day revisit time at the equator. The swath width of a Sentinel-2 scene is 290 km and data are acquired in 13 bands with spatial resolutions of 10 m, 20 m, and 60 m [1] (see Table 1). Sentinel-2 images are open access data, offer high-quality radiometric measurements and include a dedicated cirrus detection band. The free data access, frequent coverage of territories, wide swath and many spectral bands are reasons for the widespread use of this kind of data in many applications. Satellite imagery is frequently contaminated by low- and medium-altitude water clouds as well as by high-altitude cirrus clouds in the upper troposphere and in the stratosphere. Many operations require clear sky pixels as input, such as agriculture-related products [2,3], the retrieval of surface reflectance within atmospheric correction [4,5] and the coregistration with other images [6,7].
Atmospheric correction and land cover classification depend on an accurate cloud map [8,9,10]. In addition, maps of water and snow/ice are also indispensable in many applications, i.e., mapping of glaciers [11] and water bodies [12].
Cloud screening is applied to the data in order to retrieve accurate atmospheric and surface parameters as input for further processing steps, either the Atmospheric Correction (AC) itself or higher-level processing such as compositing, time-series analysis or estimation of biogeophysical parameters.
However, a fully automatic detection of these classes is not an easy task due to the high reflectance variability of earth surfaces. For instance, bright desert surfaces or urban structures can be misclassified as clouds, and shadowed surfaces as water. The class assignment for mixed pixels (e.g., semitransparent cloud over snow) can be problematic because they do not have a spectral signature that clearly belongs to a single class. These effects decrease the classification accuracy and show the need for a performance assessment of classification algorithms.
The Cloud Masking Intercomparison Exercise (CMIX) [13] was a recent state-of-the-art intercomparison of cloud detection algorithms for Sentinel-2 and Landsat-8, representative for sensors in the 10–30 m range. However, CMIX only differentiated cloudy and cloudless pixels. Reference [14] is likewise limited to valid and invalid pixels: valid pixels are cloudless pixels such as land, water and snow, and invalid pixels are clouds and cloud shadows. In that study, cloud masks from the multi-temporal MACCS-ATCOR Joint Algorithm (MAJA) are compared with the monotemporal classifications of Sen2Cor and Fmask [15]. The comparison in reference [14] is done twice: once for the cloud masks of all three processors dilated around clouds, and once for all processors with nondilated cloud masks. This means that there is no comparison on the original processor outputs. Overall accuracies for all three algorithms are close, at 90–93% for the nondilated cloud masks. For the dilated masks, the monotemporal Fmask gave a classification performance equivalent to the multitemporal MAJA, while Sen2Cor was on average 6% worse. However, dilation of the Sen2Cor cloud mask is not recommended for the processor version used, because of a known issue that many bright objects in urban areas are misclassified as clouds, which leads to a commission of clouds that dilation further amplifies. In contrast, this paper evaluates the original masking outputs, and not only for valid and invalid pixels but in more detail for the six consolidated classes given below. This gives more insight into the strengths and weaknesses of the masking algorithms.
As opposed to radiometric validation, the validation of masking is limited by the lack of suitable reference datasets. Image-based reference data are required, which can only be generated through image interpretation or semiautomated methods, as done in [14]. CMIX is based on four classification reference databases for Sentinel-2 data. These existing reference data are either not publicly available or do not fulfill the requirements of this study, e.g., 20 m resolution and a distinction of all defined classes.
In this study we evaluate the performance of three widely used monotemporal masking codes on Sentinel-2 imagery.
Our first masking code is Function of mask (Fmask) [16]. It was originally designed for Landsat imagery but later extended to Sentinel-2 data [15]. Here, we use the Fmask version implemented in FORCE [17], which is able to separate clouds from bright surfaces by exploiting parallax effects: the individual detectors of the MSI sensor have slightly different viewing directions, alternating between forward view and backward view for adjacent detectors. In FORCE, the cloud masking is integrated into a processing workflow which also includes coregistration [18], radiometric correction [19], resolution merging [20] and datacube generation [21]. The second code is the latest version of ATCOR (v 9.3.0), which contains a masking algorithm [22] as a necessary preprocessing step before the atmospheric correction; masking in ATCOR 9.3.0 was improved relative to previous versions. The third is the scene classification of Sen2Cor (version 2.8.0). Sen2Cor is an atmospheric correction processor for Sentinel-2 (S2) data provided by the European Space Agency (ESA), which contains a preprocessing scene classification step preceding the atmospheric correction [23]. Whereas the atmospheric correction module of Sen2Cor was developed in the heritage of ATCOR, the scene classification is completely independent. The scene classification of Sen2Cor makes use of external auxiliary data from the Climate Change Initiative [24]. Note that Fmask uses a 300 m dilation buffer for cloud and 60 m for cloud shadow, while ATCOR uses 100 m and 220 m, respectively, and Sen2Cor (version 2.8.0) uses no dilation buffers. Fmask additionally applies a 1-pixel buffer for snow. The reader is referred to the given references for a detailed description of the three methods and the different threshold values used, as this is outside the scope of this paper.
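The dilation buffers mentioned above (e.g., Fmask’s 300 m cloud and 60 m cloud shadow buffers versus ATCOR’s 100 m and 220 m) can be reproduced with a simple morphological dilation on a 20 m grid. A minimal NumPy sketch, using a square structuring element as an assumption (the processors’ actual kernel shapes may differ):

```python
import numpy as np

def dilate_mask(mask: np.ndarray, buffer_m: float, pixel_size_m: float = 20.0) -> np.ndarray:
    """Binary dilation of a boolean mask by a metric buffer.

    The buffer is converted to a pixel radius and applied as a square
    window; the real processors may use different kernel shapes.
    """
    r = int(round(buffer_m / pixel_size_m))
    if r == 0:
        return mask.copy()
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(-r, r + 1):          # OR together all shifted copies
        for dx in range(-r, r + 1):      # within the buffer radius
            out |= padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out

# Example: a single cloud pixel buffered by 60 m (3 pixels at 20 m)
cloud = np.zeros((11, 11), dtype=bool)
cloud[5, 5] = True
buffered = dilate_mask(cloud, buffer_m=60)  # 7x7 block of True pixels
```

Because the buffer is applied to the binary mask after detection, a larger buffer directly trades omission of cloud edges against commission of clear pixels, which is visible in the per-class statistics later in the paper.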
This paper is organized as follows: Section 2 presents an overview over the S2 scenes used for the exercise. Section 3 describes the approach to define the reference (“truth”) mask (validation procedure). Section 4 presents the classification results in terms of user’s, producer’s and overall accuracy [25], and Section 5 provides a discussion of the critical issues. The conclusion and possible further improvements are given at the end of the paper.

2. Methods (Processors) and Data

Twenty S2 scenes are processed with the three codes. A list of the investigated Sentinel-2 scenes is given in Table 2. The scenes were selected to cover all continents, different climates, seasons, weather conditions, and land cover classes (Figure 1). They represent flat and mountainous sites with cloud cover from 1% to 62% and include the presence of cumulus, thin and thick cirrus clouds and snow cover. Additionally, the scenes represent different land cover types such as desert, urban, cropland, grass, forest, wetlands, sand, coastal areas and glaciers. The range of solar zenith angles is from 18° to 62°. For the scene classification validation, all S2 bands with 10 m and 60 m resolution are resampled to a common 20 m pixel size. All processors used Digital Elevation Models (DEMs), usually from SRTM (90 m) (downloaded from the USGS website (https://earthexplorer.usgs.gov/)), except for scenes 1, 6 and 16, which used the Planet DEM [26].
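The resampling of all bands to a common 20 m grid can be illustrated with a simple block-aggregation/replication scheme. The sketch below (plain NumPy; mean aggregation for 10 m bands and nearest-neighbor replication for 60 m bands are assumptions, not necessarily the resampling methods the processors use):

```python
import numpy as np

def to_20m(band: np.ndarray, native_res_m: int) -> np.ndarray:
    """Bring a Sentinel-2 band to 20 m pixel size.

    10 m bands are aggregated by 2x2 block averaging, 60 m bands are
    replicated 3x3 (nearest neighbor). 20 m bands pass through unchanged.
    """
    if native_res_m == 20:
        return band
    if native_res_m == 10:
        h, w = band.shape
        return band.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    if native_res_m == 60:
        return np.repeat(np.repeat(band, 3, axis=0), 3, axis=1)
    raise ValueError(f"unsupported resolution: {native_res_m} m")
```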
The classification of ATCOR provides a map with 22 classes which is used in the subsequent atmospheric correction module [22]. For this investigation, a compact map with seven classes (clear, semitransparent cloud, cloud, cloud shadow, water, snow/ice, topographic shadow) is derived from the detailed map at 20 m spatial resolution. A potential shadow mask is defined as reference shadow and the cloud height of the cloud mask is iterated until the projected cloud mask for the given solar geometry matches the shadow mask. Topographic shadow is calculated with a ray tracing algorithm using the DEM and the solar geometry. The classes “cloud shadow” and “water” are often difficult to distinguish and in case of cloud shadow over water the class assignment is arbitrary. Therefore, misclassifications can happen, because only one label can be assigned in this method. Semitransparent cloud can be thin cirrus or another cloud type of low optical thickness.
The SCL algorithm of Sen2Cor aims to detect clouds with their shadows and to generate a scene classification map. The latter raster map consists of 12 classes, including 2 classes for cloud probabilities (medium, and high), thin cirrus, cloud shadows, vegetated pixels, nonvegetated pixels, water, snow, dark feature pixels, unclassified, saturated or defective pixels and no data. This map is used internally in Sen2Cor in the atmospheric correction module to distinguish between cloudy pixels, clear land pixels and water pixels, and it does not constitute a land cover classification map in a strict sense [27]. The scene classification map is delivered at 60 m and 20 m spatial resolution, with associated Quality Indicators (QI) for cloud and snow probabilities. The QIs provide the probability measure (0–100%) that the Earth surface is obstructed either by clouds or by snow. Class dark area pixels can contain dark features like burned area, topographic shadows or cast shadows but also very dark water bodies and vegetation. Thin cirrus may also be other transparent cloud and the transition from medium to high probability cloud is impossible to validate. Pixels assigned to unclassified are mostly pixels with low probability of clouds or mixed pixels, which do not fit into any of the other classes.

3. Validation Procedure

Validation of masks comprises verification of the mask classification accuracy to clarify the uncertainties of the masking products for their applications. Comparison of different mask classification algorithms first requires mapping all individual masking outputs to a common set of labels. Table 3 shows the seven classes used as a common set for Fmask, ATCOR and Sen2Cor.
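Such a consolidation amounts to a lookup table from each processor’s native codes to the common labels. A sketch for the Sen2Cor SCL output (the code-to-label assignments follow the class descriptions in the text, e.g., SCL dark area to topographic shadow; mapping the unclassified class to clear is our assumption, and the paper’s Table 3 is not reproduced here):

```python
import numpy as np

# Consolidated labels (ordering is arbitrary in this sketch)
CLEAR, SEMI, CLOUD, CLOUD_SHADOW, WATER, SNOW, TOPO_SHADOW, NODATA = range(8)

# Sen2Cor SCL codes -> consolidated classes. "Dark area" (2) goes to
# topographic shadow per the text; "unclassified" (7) -> clear is assumed.
SCL_TO_CONSOLIDATED = {
    0: NODATA, 1: NODATA, 2: TOPO_SHADOW, 3: CLOUD_SHADOW,
    4: CLEAR, 5: CLEAR, 6: WATER, 7: CLEAR,
    8: CLOUD, 9: CLOUD, 10: SEMI, 11: SNOW,
}

def consolidate_scl(scl: np.ndarray) -> np.ndarray:
    """Map a Sen2Cor SCL raster to the consolidated class set via a LUT."""
    lut = np.full(12, NODATA, dtype=np.uint8)
    for code, label in SCL_TO_CONSOLIDATED.items():
        lut[code] = label
    return lut[scl]
```

Equivalent tables are needed for the Fmask/FORCE and ATCOR outputs before any pixel-wise comparison can be made.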
Semitransparent cloud is defined as optically thin cirrus cloud, thin lower-altitude cloud, haze or smoke. To detect thin cirrus clouds, the reference mask generation uses the TOA reflectance in the cirrus band 10, lying below 0.04 but above 0.01. The lower threshold is used to avoid classifying all pixels as semitransparent. The label cloud comprises optically thick (opaque) water cloud and also cirrus cloud with ρ(TOA, band 10) > 0.04.
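The band 10 rule can be written as a pair of threshold tests (a sketch; the exact treatment of the 0.04 boundary is our assumption):

```python
import numpy as np

def cirrus_class(rho_b10: np.ndarray) -> np.ndarray:
    """Label pixels from TOA reflectance in the 1.38 um cirrus band (B10):
    0.01 < rho <= 0.04 -> semitransparent, rho > 0.04 -> (opaque) cloud,
    otherwise no cirrus indication from this band alone."""
    labels = np.full(rho_b10.shape, "none", dtype=object)
    labels[(rho_b10 > 0.01) & (rho_b10 <= 0.04)] = "semitransparent"
    labels[rho_b10 > 0.04] = "cloud"
    return labels
```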
The focus of the present paper is not only the validation of the scene classification provided by the three processors but also their comparison. The comparison is done by generating two reference maps which complement each other: one for the “difference area” and another for the “same area”. The difference area is the part of the classification images where the classification maps provided by the three processors disagree. Validation statistics over the difference area enable a relative comparison between processors and highlight strengths and weaknesses much more sharply than statistics over an entire image, which are often fairly similar. The same area is the remaining part of the images where all three classifications give the same result. Combining the validation statistics over the same area and the difference area makes it possible to assess the absolute classification performance of the processors. This requires that the ratio of labeled pixels in the difference area to the labeled pixels in the same area equals the ratio of the size of the difference area to the size of the same area. The challenge for the validation of the scene classification is the generation of high-quality reference maps which give the “truth”. The generation of the reference maps for the performed comparison of the Fmask, ATCOR and Sen2Cor outputs relies on visual inspection, supplemented by meteorological data if available. The following procedure was repeated for each image of the validation dataset.
First, stratified random sampling [25] is applied to the difference mask between the three processors to get the sample points for visual labeling. Stratification serves to balance the sample size between all classes present in the image, thus guaranteeing statistical consistency and avoiding the exclusion of spatially limited classes from the validation. We aim for 1000 randomly selected samples per image, with a minimum of 50 samples for the smallest class (see [28,29,30,31]). Visual inspection by the human interpreter results in labeling either one pixel only or, alternatively, a polygon drawn around an adjacent area of pixels of the same class, to assign the correct class and create the reference (“truth”) map. All labeled pixels are used to create the reference classification image, typically resulting in an average of 5000 pixels per scene. Figure 2 presents an overview of the generation of the classification reference mask. It begins (left part) with selected L1C channel combinations (4-3-2; 8A-6-5; 10; spectral TOA reflectance profiles, etc.) and continues with the consolidation, stratified random sampling and visual labeling to create the reference image. This image (right part) is masked and compared to the consolidated images to obtain the corresponding pixels of the classification images and perform the accuracy assessment.
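The stratified sampling step can be sketched as follows (NumPy; the equal per-class budget is a simplification, since the paper does not fully specify the allocation rule):

```python
import numpy as np

def stratified_sample(class_map, n_total=1000, n_min=50, seed=0):
    """Draw stratified random pixel positions from a class map.

    Each class present in the map gets an equal share of n_total, but at
    least n_min samples (capped at the class size). Returns a dict
    class -> (row indices, column indices).
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(class_map, return_counts=True)
    per_class = max(n_min, n_total // len(classes))
    samples = {}
    for c, count in zip(classes, counts):
        rows, cols = np.nonzero(class_map == c)
        k = min(per_class, count)          # cap at available pixels
        pick = rng.choice(count, size=k, replace=False)
        samples[int(c)] = (rows[pick], cols[pick])
    return samples
```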
Visual inspection by the expert human interpreter was supported by:
  • Visual checks of the TOA true color image (bands 4, 3, 2), TOA near infrared false color (bands 8A, 6, 5), and TOA short-wave infrared false color (bands 12, 11, 8A).
  • Check of L1C cirrus (band 10) concerning semitransparent cirrus regions.
  • Check of BOA reflectance spectral profiles from Level-2 Sen2Cor products.
  • Comparison with the imagery archive from Google Earth™.
The created reference classification map is finally compared to the consolidated classification maps from Fmask, ATCOR and Sen2Cor and a confusion matrix is obtained for each classification. Finally, classification accuracy statistics are computed from confusion matrices. After completing analysis for disagreement area, the same procedure is repeated for the same area to allow computation of absolute classification accuracy statistics of the three classifications.
Figure 3 shows an example of a true color (RGB = Red, Green, Blue = bands 4, 3, 2) composite of scene 19 (Davos) of Table 2, a false color composite using RGB = (SWIR1, NIR, red) and some typical BOA reflectance spectra of snow, clouds, clear (vegetation) and water. Obviously, snow/ice and clouds cannot easily be discerned in the true color image. Therefore, the human interpreter also uses other band combinations, in this case with band 11 (SWIR1), where snow/ice (colored blue) is clearly recognized. In addition, BOA reflectance spectra are evaluated for a polygon if a class assignment is not obvious.
The procedure applied for the generation of our reference classification map is similar to the way the references for the Hollstein and PixBox datasets [10] were created. The new aspect is that we split the validation into creating one reference for the difference area and another for the same area in order to compare the classification tools. Please note that the obtained reference maps are not perfect: the manual labeling includes some amount of subjectivity, and above all the visual interpretation and labeling of transparent clouds is challenging. The subjectivity of the method was tested with four people creating a reference map for the same two products. The test revealed quite stable results, with 5–6% differences in overall accuracy (OA) when using the reference maps for the computation of classification accuracy statistics. Another limitation of our classification reference maps comes from the stratified random sampling. The stratification between classes has to be based on one classification, which was Sen2Cor in our case. If the Sen2Cor classification fails, the reference map becomes imbalanced; even if this is not the case, the reference maps are not perfectly balanced for the other classifications. The potential bias could be investigated by creating another stratified random sampling based on a different set, but such a sensitivity study is outside the scope of this paper.
Classification accuracy statistics are represented by three parameters calculated from the confusion matrix [25,29,30]. If the number of classes is n, then the confusion matrix C is an n × n matrix, and the user’s accuracy of class i (percentage of the area mapped as class i that has reference class i) is defined as
UA(i) = 100 C(i,i) / Σ_{j=1}^{n} C(i,j)    (j = column index)
The second parameter is the producer’s accuracy of class i (percentage of the area of reference class i that is mapped as class i)
PA(i) = 100 C(i,i) / Σ_{j=1}^{n} C(j,i)    (j = row index)
The last is the overall accuracy:
OA = 100 Σ_{j=1}^{n} C(j,j) / Σ_{i=1}^{n} Σ_{j=1}^{n} C(i,j)
The OA can be calculated for the total area of an image, i.e., the absolute OA but also for the difference and same area of each scene and masking code.
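The three equations translate directly into a few NumPy lines (rows = mapped class, columns = reference class, following the definitions above):

```python
import numpy as np

def accuracy_stats(C):
    """User's, producer's and overall accuracy (in %) from an n x n
    confusion matrix C with rows = mapped class, columns = reference class."""
    C = np.asarray(C, dtype=float)
    ua = 100.0 * np.diag(C) / C.sum(axis=1)   # UA(i): divide by row sum
    pa = 100.0 * np.diag(C) / C.sum(axis=0)   # PA(i): divide by column sum
    oa = 100.0 * np.trace(C) / C.sum()        # OA: correct / total
    return ua, pa, oa

# Example: two classes, 17 of 20 samples correct
ua, pa, oa = accuracy_stats([[8, 2], [1, 9]])
```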
Besides the OA, UA and PA measures, a detailed visual inspection supported the analysis of the confusion within and between classes per processor. The comparison was performed per processor and class over the difference area, including recognition rates and misclassification rates of a particular class as well as its confusion potential with other classes (the proportion of one class mistaken for another).

4. Results

The validation results consist of confusion matrices with the number of correctly classified pixels in the validation set. The confusion matrix is the basis for the computation of the UA, PA and OA of a classification. Table 4 provides results for the difference area and shows a summary of the UA and PA per class, i.e., the average over all 20 scenes. Table 5 provides results for the absolute validation of the classifications, comparable to results in the literature, and contains the OA per scene. Boldface numbers indicate the method with the best performance; if the values differ by less than about 1%, two methods are marked correspondingly.
For space reasons, we cannot present detailed results for each scene. Scene 4 (Bolivia, Puerto Siles) in Figure 4 serves as an example to demonstrate the difference mask validation. The image contains no clouds but water with varying color and sediment load, bright soil and burned area. This scene has the smallest difference area, i.e., the largest agreement between the Fmask, ATCOR and Sen2Cor classifications. This is also underlined by high absolute OA values over the complete image of 99%, 98% and 99%. There is only a small difference between the classifications in PA over the complete image for the class clear land, with 100%, 98% and 99%, reflecting what is visible in Figure 4: a different amount of burned area is classified as water. The user’s accuracy of class water for the total image is 99% for all masking codes, hiding differences that are clearly visible in the figure. Statistics over the difference area give a much more detailed insight into the classification performance. The OA over the difference area is 90%, 22% and 58% for Fmask, ATCOR and Sen2Cor. Differences in PA for class clear land are now much more pronounced, with values of 97%, 18% and 57%. The user’s accuracy of class water for the difference area now differs between Fmask and Sen2Cor, with 74% and 80%, respectively. Whereas Fmask identifies 97% of clear pixels in the difference area as clear, ATCOR and Sen2Cor do so for less than 60% of pixels. ATCOR largely misclassifies burned area as water. The misclassification of clear land as topographic shadow by Sen2Cor has its origin in the transformation of the Sen2Cor classification outputs to the consolidated mask. The consolidated class topographic shadow corresponds to the Sen2Cor class dark area, which can contain dark features like burned area, topographic shadows or cast shadows but also very dark water bodies and vegetation. A planned update of the definition of the Sen2Cor class dark area to only topographic or cast shadow will resolve this confusion.
To furthermore compare the classification performance of Fmask, ATCOR and Sen2Cor, details are given for three selected cases: the best and worst case scenarios and an average case.
Figure 5 shows the best-case scenario (highest absolute overall accuracy) of all analyzed scenes from Table 5. It is scene number 18 from Spain (Barrax), taken on 19 March 2017 with a solar zenith angle of 22.0° and an azimuth angle of 143.2°. A subset of scene ID 18 is shown in Figure 6. It nicely illustrates the differences between Fmask, ATCOR and Sen2Cor: the cloud percentage is overestimated by Fmask due to mask dilation, while the ATCOR and Sen2Cor classifications are very similar and close to the reference.
The overall worst-case scenario (lowest absolute OA) of the 20 scenes analyzed is illustrated in Figure 7. This scene from Switzerland (Davos) was acquired on 4 April 2019 at a solar zenith and azimuth angle of 37.7° and 158.5°, respectively. This scene is difficult to classify correctly for all processors due to the high reflectivity of the snow and the complex topography; the snow is often misclassified as cloud. The lower overall accuracy of Fmask compared to ATCOR and Sen2Cor is again connected with the cloud dilation.
Figure 7 shows a subset of scene ID 19. As in the previous case (scene ID 18 from Spain) Fmask overestimates the percentage of cloud coverage at the expense of snow cover, which may or may not be problematic depending on the application. ATCOR and Sen2Cor show a more accurate cloud mask. An inspection of a zoom area (see Figure 8) reveals that Sen2Cor sometimes falsely classifies cloud shadows as water.
A scene showing an average-case scenario (i.e., no complex topography, a small percentage of cloud cover and bright objects) for all classification methods is the one from the USA (Rimrock). It was taken on 12 May 2018 at a solar zenith angle of 30.4° and an azimuth angle of 153.5°. Figure 9 shows the entire area of the scene with the three different classification maps, whereas Figure 10 only illustrates a subset of scene ID 20. Most of the scene is clear, with some clouds and snow/ice in the southern part. Additionally, the river is accurately mapped by all processors.
The subset (Figure 10) demonstrates the difficulties Sen2Cor faces when distinguishing between urban areas or bright ground objects and clouds. ATCOR on the other hand misinterprets dark water for shadow. However, if both classes have about the same probability, then ATCOR’s preference is shadow.
As can be deduced from Table 4 for the difference area, up to 75.5% of clear pixels were correctly classified by Sen2Cor, whereas Fmask and ATCOR recognize 56.2% and 64.6% correctly. Clear pixels were most often misclassified as clouds by Fmask and Sen2Cor, and as semitransparent clouds by ATCOR. Semitransparent clouds were recognized at rates of 30.4% and 28.2% by ATCOR and Sen2Cor, respectively; the omitted pixels were mainly distributed between the classes clear and cloud by ATCOR, and clear and snow by Sen2Cor. Fmask classifies only 1.8% of semitransparent cloud pixels correctly and mostly misclassifies the omitted pixels as clouds. Fmask performs best for the classification of cloud pixels (84.5%), while ATCOR and Sen2Cor have recognition rates of 62.7% and 65.7%, respectively. Omitted cloud pixels were mostly assigned to the class clear by Fmask and Sen2Cor, and to the class cloud shadow by ATCOR. Cloud shadows have a low recognition rate (27.7%) and high confusion with the class clear in the case of Sen2Cor. Fmask and ATCOR have their lowest recognition for the class topographic shadow, with rates of 2.2% and 4.1%, respectively; Sen2Cor performs considerably better with 53.0%. The omitted topographic shadow pixels are distributed mainly between the classes clear and cloud shadow. The highest recognition rates (and lowest confusion with other classes) were found for cloud (84.5%) and water (68.1%) for Fmask, clear (64.6%) and snow (75.7%) for ATCOR, and water (80.3%) and snow (85.7%) for Sen2Cor. Surprisingly, the proportion of snow pixels mistaken for clouds was low for ATCOR and Sen2Cor (12% and 8%, respectively), whereas Fmask misclassifies 47%. This is caused by the cloud buffer, as well as by the compilation of FORCE quality bits into the scene classification as employed in this study. In the original FORCE output, multiple flags can be set for one pixel, i.e., the snow and cloud flags can both be set.
During the reclassification process, clouds were given highest priority, thus snow detections were overruled by buffered cloud detections.
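The priority rule used when compiling the FORCE quality bits into a single class can be sketched as follows (hypothetical flag names; FORCE’s actual bit layout is documented in its references):

```python
import numpy as np

def compile_flags(cloud_flag: np.ndarray, snow_flag: np.ndarray) -> np.ndarray:
    """Collapse per-pixel quality flags into one label, with (buffered)
    cloud detections taking priority over snow; remaining pixels are clear."""
    return np.where(cloud_flag, "cloud",
                    np.where(snow_flag, "snow", "clear"))
```

With this rule, any snow pixel inside a dilated cloud region is reported as cloud, which explains the high snow-to-cloud confusion observed for Fmask.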
The confusion within and between classes can be additionally illustrated using the proportion of the individual class omissions for the difference area (Figure 11, Figure 12 and Figure 13).
Figure 11 illustrates spider diagrams for the omission and commission of the classes clear land, water and snow, representing valid, cloudless pixels. Fmask, ATCOR and Sen2Cor are represented by the colors green, blue and orange. Looking at the upper left plot of Figure 11, it can be noted that Fmask has a large omission of clear land towards clouds, which can clearly be attributed to cloud dilation. ATCOR has an omission of clear land to water, which is uncritical for pure cloud masking. Sen2Cor confuses most clear land pixels with topographic shadows due to the unfavorable definition of the class dark feature, which is mapped to topographic shadow. In the lower central plot of Figure 11 we see that ATCOR confuses most water pixels with clear land. All three masking codes show a commission of water pixels towards clear land, but with different amounts. The commission of snow shows that Fmask and ATCOR classify some clear pixels as snow, which is uncritical for a clear/cloud mask, while Sen2Cor classifies some semitransparent clouds as snow.
Figure 12 illustrates spider diagrams for the omission and commission of the classes cloud and semitransparent cloud for difference area. The upper left image shows that ATCOR and Sen2Cor have omission of cloud pixels towards the class clear. The commission of cloud is on the other hand different for all three masking codes. Fmask shows the largest commission of cloud pixels towards clear, semitransparent cloud and cloud shadow. Sen2Cor classifies some clear and semitransparent pixels as cloud and ATCOR shows a slight commission of semitransparent pixels towards cloud. For the class semitransparent cloud, the largest omission comes from Fmask, which confuses most semitransparent clouds as cloud. This perfectly corresponds to commission of cloudy pixels towards semitransparent clouds. ATCOR and Sen2Cor show a commission of semitransparent pixels towards the class clear.
Figure 13 illustrates spider diagrams for the omission and commission for difference area of the shadow classes cloud shadows and topographic shadows. From the left upper image of Figure 13 it can be noted that Fmask has the largest omission of cloud shadow towards the class cloud. ATCOR and Sen2Cor confuse cloud shadows mostly with clear pixels. All three masking codes show a similar direction of commission of cloud shadows towards clear pixels. Except for the class definition problem of Sen2Cor for topographic shadows, the upper and lower right images show good agreement between the processors and almost perfect performance. Sen2Cor shows a large commission of topographic shadow pixels towards the class clear, water and cloud shadow due to its definition of dark pixels.

5. Discussion

Since the reference and classified maps are based on the same dataset, i.e., a perfect match of geometry and acquisition time, the main uncertainty of the reference map classification is the use of a human interpreter [29]. Experiences with similar experiments using several human analysts report an average interpretation variability of ∼5–7% [16,32] for cloud masks. In order to reduce the influence of the interpreter, a reference polygon should have homogeneous BOA reflectance properties per class, i.e., heterogeneous areas with mixed pixels are excluded [30]. The area homogeneity can be checked visually per band and it also shows if pixel spectra of a polygon have a large dispersion, e.g., for cloud border regions or snow areas below semitransparent cloud. Although the variability within a polygon should be small, large differences can exist between different polygons of the same class, e.g., in the case of different cloud types or fresh and polluted snow.
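The homogeneity requirement for a reference polygon can be checked by the per-band dispersion of its BOA spectra. A minimal sketch with a relative standard deviation test (the 10% threshold is an illustrative assumption, not a value from the paper):

```python
import numpy as np

def polygon_is_homogeneous(spectra: np.ndarray, max_rel_std: float = 0.10) -> bool:
    """spectra: (n_pixels, n_bands) BOA reflectances of one candidate polygon.

    Accept the polygon only if every band's standard deviation stays below
    max_rel_std times the band mean; large dispersion indicates mixed
    pixels, e.g., cloud borders or snow under semitransparent cloud.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    std = spectra.std(axis=0)
    return bool(np.all(std <= max_rel_std * np.maximum(mean, 1e-6)))
```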
Table 4 presents the class-specific user's accuracy (UA) and producer's accuracy (PA) of the three methods, averaged over the 20 scenes for the difference area. High PA values (>80%) are achieved only for the classes cloud (Fmask) and snow/ice (Sen2Cor), indicating how difficult the classification is for all other classes. The low values for semitransparent cloud are most likely caused by the interpreter's visual assessment, which does not always agree with the physical criterion (0.01 < ρ(TOA, 1.38 μm) < 0.04) shared by the three methods. Another known classification problem is the distinction of water and cloud shadow when no external maps are included, since both classes can be spectrally very similar. Additionally, cloud shadow can occur over water, but because a pixel can belong to only one class in our evaluation, the setting of the preference rule adds another uncertainty.
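UA and PA follow directly from the error matrix: UA is the ratio of correctly labeled pixels to all pixels assigned to a map class (column total), PA the ratio to all reference pixels of that class (row total). A minimal sketch with made-up counts for the spectrally similar pair water/cloud shadow:

```python
def user_producer_accuracy(cm):
    """User's and producer's accuracy per class from a confusion matrix.

    cm[i][j] = number of pixels with reference class i classified as j.
    UA_j = correct / column total (reliability of the map class);
    PA_i = correct / row total  (share of the reference class found).
    """
    n = len(cm)
    ua = [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]
    pa = [cm[i][i] / sum(cm[i]) for i in range(n)]
    return ua, pa

# Two-class illustration (counts are invented): water vs. cloud shadow
cm = [[80, 20],   # reference: water
      [30, 70]]   # reference: cloud shadow
ua, pa = user_producer_accuracy(cm)
print(ua, pa)  # UA(water) = 80/110, PA(water) = 80/100
```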
Nevertheless, a comparison with S2 classification results obtained with the standard Fmask [15] (applied to seven scenes) demonstrates that, in our investigation, all three methods yield better overall accuracies than those reported in reference [15] (Figure 6). This is all the more remarkable because our approach uses six classes instead of four, and increasing the number of classes usually decreases the overall accuracy. One has to consider that the spatial resolution of the Sentinel-2 data is 20 m, versus 30 m for the Landsat data of reference [15], so the enhanced Sentinel-2 resolution may account, at least partly, for the better classification agreement. However, higher spatial resolution mainly helps with mixed pixels, and in our study heterogeneous areas with mixed pixels are excluded.
Table 5 allows a selection of the best method depending on location and cloud cover:
  • Fmask can best be applied for scenes in moderate climate, excluding arid and desert regions as well as areas with a large snow/ice coverage.
  • ATCOR can best be applied for urban (bright surfaces), arid and desert scenes.
  • Sen2Cor can best be applied for rural scenes in moderate climate and also in scenes with snow and cloud.
Again, a reminder: the Fmask results shown here pertain to the Fmask parallax version [17], not the standard version [15]. Furthermore, Sen2Cor uses the additional external ESACCI-LC data package, which improves the classification accuracy over water, urban and bare areas and enables better handling of falsely detected snow pixels [33]; Sen2Cor therefore has a certain advantage over Fmask and ATCOR. During this investigation we also found that the performance of Fmask (parallax version) improves if the current cloud buffer of 300 m is reduced to 100 m; in the meantime, the cloud buffer size has become a user-defined parameter. The performance of Sen2Cor (version 2.8.0) can be slightly improved with an additional 100 m cloud buffer (instead of no buffer), whereas an additional 100 m cloud buffer has almost no influence on the ATCOR performance.
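A cloud buffer of this kind is a morphological dilation of the binary cloud mask with a metric radius. The sketch below is a plain, illustrative implementation (production code would use a library dilation routine); at the 20 m resolution used here, a 300 m buffer corresponds to a 15-pixel radius and a 100 m buffer to 5 pixels:

```python
def dilate_mask(mask, buffer_m, pixel_size_m=20):
    """Grow a binary cloud mask by a metric buffer (simple dilation sketch).

    Every pixel within the Euclidean buffer radius of a cloud pixel is
    flagged, approximating a circular structuring element. Radii and pixel
    size are parameters; the 20 m default matches the resolution used here.
    """
    r = round(buffer_m / pixel_size_m)
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if mask[i][j]:
                for di in range(-r, r + 1):
                    for dj in range(-r, r + 1):
                        if di * di + dj * dj <= r * r:
                            y, x = i + di, j + dj
                            if 0 <= y < rows and 0 <= x < cols:
                                out[y][x] = 1
    return out

# A single cloud pixel buffered by 100 m at 20 m resolution (radius 5 px)
mask = [[0] * 11 for _ in range(11)]
mask[5][5] = 1
buffered = dilate_mask(mask, 100)
print(sum(map(sum, buffered)))  # pixels flagged after buffering
```

Shrinking the buffer from 300 m to 100 m trades a lower clear-pixel loss around clouds against a higher risk of leaking undetected cloud edges, which is exactly the trade-off observed for Fmask above.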
To sum up, the overall accuracy is very high and nearly the same for all three masking codes (89%, 91% and 92% for Fmask, ATCOR and Sen2Cor, respectively), and the balanced OA (the OA for the same area) is equal (97%). ATCOR finds the most valid pixels and has the highest PA and lowest UA for valid pixels. Sen2Cor finds fewer valid pixels because of its class definition of dark areas. Fmask finds the fewest valid pixels due to the dilation of its cloud masks, so its commission is not randomly distributed. In return, Fmask has the lowest cloud omission and clear commission, at the expense of higher cloud commission and clear omission. Depending on the application, losing a higher share of cloud-adjacent pixels may be far less severe than missing cloud pixels.
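The distinction between difference area and same area can be made concrete: pixels where all three masks agree form the same area, the remaining pixels form the difference area, and the OA values above are computed separately on each. A toy sketch (labels and maps are illustrative, not the study's data):

```python
def difference_area_oa(preds, ref):
    """OA restricted to pixels where the three classifications disagree.

    preds: dict of method name -> flat list of class labels per pixel;
    ref: reference labels. Pixels on which all methods agree are excluded,
    mirroring the difference-area evaluation used in this comparison.
    """
    names = list(preds)
    idx = [k for k in range(len(ref))
           if len({preds[n][k] for n in names}) > 1]
    return {n: sum(preds[n][k] == ref[k] for k in idx) / len(idx)
            for n in names}

# Four toy pixels: only pixels 2 and 3 are disputed between the methods
ref   = ["clear", "cloud", "cloud", "shadow"]
preds = {
    "Fmask":   ["clear", "cloud", "cloud", "cloud"],
    "ATCOR":   ["clear", "cloud", "clear", "shadow"],
    "Sen2Cor": ["clear", "cloud", "cloud", "shadow"],
}
print(difference_area_oa(preds, ref))
```

Because the agreeing pixels are overwhelmingly easy cases, the difference-area OA is much lower than the total-area OA and separates the methods far more sharply, as Table 5 shows.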

6. Conclusions

The performance of three classification methods, Fmask (parallax version), ATCOR and Sen2Cor, was evaluated on a set of 20 Sentinel-2 scenes covering all continents and a wide range of climates, seasons and environments. The reference maps with seven classes (clear, semitransparent cloud, cloud, cloud shadow, water, snow/ice and topographic shadow) were created by an experienced human interpreter. The average overall accuracy for the total area is 89%, 91% and 92% for Fmask, ATCOR and Sen2Cor, respectively. High producer's accuracies for the difference area (>80%) were achieved for cloud and snow/ice; the lower values for the other classes typically range between 30% and 70%. This study can serve as a guide to possible pitfalls and towards more accurate algorithms. Future improvements of the classification algorithms could involve texture measures and convolutional neural networks.

Author Contributions

V.Z.: validation, writing of article; M.M.-K.: concept, methodology, validation; J.L.: software, writing; D.F.: software, writing; R.R.: software, writing; K.A.: software; B.P.: concept, methodology, writing. All authors have read and agreed to the published version of the manuscript.

Funding

Part of this research was performed as part of the Copernicus Sentinel-2 Mission Performance Center activities, which are managed by ESA. This research received no other external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens. Environ. 2012, 120, 25–36.
  2. Immitzer, M.; Vuolo, F.; Atzberger, C. First experience with Sentinel-2 data for crop and tree species classifications in Central Europe. Remote Sens. 2016, 8, 166.
  3. Clevers, J.G.; Gitelson, A.A. Remote estimation of crop and grass chlorophyll and nitrogen content using red-edge bands on Sentinel-2 and -3. Int. J. Appl. Earth Obs. Geoinform. 2013, 23, 344–351.
  4. Hagolle, O.; Huc, M.; Pascual, D.V.; Dedieu, G. A multi-temporal method for cloud detection, applied to FORMOSAT-2, VENμS, LANDSAT and SENTINEL-2 images. Remote Sens. Environ. 2010, 114, 1747–1755.
  5. Muller-Wilm, U.; Louis, J.; Richter, R.; Gascon, F.; Niezette, M. Sentinel-2 level 2A prototype processor: Architecture, algorithms and first results. In Proceedings of the ESA Living Planet Symposium, Edinburgh, UK, 9–13 September 2013.
  6. Yan, L.; Roy, D.P.; Zhang, H.; Li, J.; Huang, H. An automated approach for sub-pixel registration of Landsat-8 Operational Land Imager (OLI) and Sentinel-2 Multi Spectral Instrument (MSI) imagery. Remote Sens. 2016, 8, 520.
  7. Reddy, B.S.; Chatterji, B.N. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Process. 1996, 5, 1266–1271.
  8. Baraldi, A.; Tiede, D. AutoCloud+, a “Universal” Physical and Statistical Model-Based 2D Spatial Topology-Preserving Software for Cloud/Cloud–Shadow Detection in Multi-Sensor Single-Date Earth Observation Multi-Spectral Imagery—Part 1: Systematic ESA EO Level 2 Product Generation at the Ground Segment as Broad Context. Int. J. Geo-Inf. 2018, 7, 457.
  9. Defourny, P.; Bontemps, S.; Bellemans, N.; Cara, C.; Dedieu, G.; Guzzonato, E.; Hagolle, O.; Inglada, J.; Nicola, L.; Savinaud, M.; et al. Near real-time agriculture monitoring at national scale at parcel resolution: Performance assessment of the Sen2-Agri automated system in various cropping systems around the world. Remote Sens. Environ. 2019, 221, 551–568.
  10. Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666.
  11. Paul, F.; Winsvold, S.H.; Kääb, A.; Nagler, T.; Schwaizer, G. Glacier remote sensing using Sentinel-2. Part II: Mapping glacier extents and surface facies, and comparison to Landsat 8. Remote Sens. 2016, 8, 575.
  12. Du, Y.; Zhang, Y.; Ling, F.; Wang, Q.; Li, W.; Li, X. Water bodies’ mapping from Sentinel-2 imagery with modified normalized difference water index at 10-m spatial resolution produced by sharpening the SWIR band. Remote Sens. 2016, 8, 354.
  13. Earth.esa.int. ACIX II—CMIX 2nd WS. 2020. Available online: https://earth.esa.int/web/sppa/meetings-workshops/hosted-and-co-sponsored-meetings/acix-ii-cmix-2nd-ws (accessed on 5 June 2020).
  14. Baetens, L.; Desjardins, C.; Hagolle, O. Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. Remote Sens. 2019, 11, 433.
  15. Zhu, Z.; Wang, S.; Woodcock, C.E. Improvement and expansion of the Fmask algorithm: Cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images. Remote Sens. Environ. 2015, 159, 269–277.
  16. Zhu, Z.; Woodcock, C.E. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sens. Environ. 2012, 118, 83–94.
  17. Frantz, D.; Hass, E.; Uhl, A.; Stoffels, J.; Hill, J. Improvement of the Fmask algorithm for Sentinel-2 images: Separating clouds from bright surfaces based on parallax effects. Remote Sens. Environ. 2018, 215, 471–481.
  18. Rufin, P.; Frantz, D.; Yan, L.; Hostert, P. Operational Coregistration of the Sentinel-2A/B Image Archive Using Multitemporal Landsat Spectral Averages. IEEE Geosci. Remote Sens. Lett. 2020, 1–5.
  19. Frantz, D.; Röder, A.; Stellmes, M.; Hill, J. An Operational Radiometric Landsat Preprocessing Framework for Large-Area Time Series Applications. IEEE Trans. Geosci. Remote Sens. 2016, 54, 3928–3943.
  20. Frantz, D.; Stellmes, M.; Röder, A.; Udelhoven, T.; Mader, S.; Hill, J. Improving the Spatial Resolution of Land Surface Phenology by Fusing Medium- and Coarse-Resolution Inputs. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4153–4164.
  21. Frantz, D. FORCE—Landsat + Sentinel-2 Analysis Ready Data and Beyond. Remote Sens. 2019, 11, 1124.
  22. Richter, R.; Schläpfer, D. ATCOR Theoretical Background Document; DLR Report DLR-IB 564-03/2019; German Aerospace Center (DLR): Wessling, Germany, 2019. Available online: https://www.rese-apps.com/software/atcor/manual-papers.html (accessed on 15 January 2020).
  23. Louis, J. Sentinel-2 MSI—Level 2A Product Definition. Issue 4.4, 12 August 2016. Available online: https://sentinel.esa.int/documents/247904/1848117/Sentinel-2-Level-2A-Product-Definition-Document.pdf (accessed on 15 January 2020).
  24. Hollmann, R.; Merchant, C.J.; Saunders, R.; Downy, C.; Buchwitz, M.; Cazenave, A.; Chuvieco, E.; Defourny, P.; de Leeuw, G.; Holzer-Popp, T.; et al. The ESA Climate Change Initiative. Satellite Data Records for Essential Climate Variables. Bull. Am. Meteorol. Soc. 2013, 94, 1541–1552.
  25. Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 1991, 37, 35–46.
  26. Planet DEM. Available online: https://www.planetobserver.com/products/planetdem/planetdem-30/ (accessed on 25 September 2018).
  27. Gascon, M.; Zijlema, W.; Vert, C.; White, M.; Nieuwenhuijsen, M. Outdoor blue spaces, human health and well-being: A systematic review of quantitative studies. Int. J. Hyg. Environ. Health 2017, 220, 1207–1221.
  28. Foody, G.M. Sample size determination for image classification accuracy assessment and comparison. Int. J. Remote Sens. 2009, 30, 5273–5291.
  29. Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 2014, 148, 42–57.
  30. Stehman, S.V. Sampling designs for accuracy assessment of land cover. Int. J. Remote Sens. 2009, 30, 5243–5272.
  31. Wagner, J.; Stehman, S. Optimizing sample size allocation to strata for estimating area and map accuracy. Remote Sens. Environ. 2015, 168, 126–133.
  32. Oreopoulos, L.; Wilson, M.J.; Varnai, T. Implementation on Landsat Data of a Simple Cloud-Mask Algorithm Developed for MODIS Land Bands. IEEE GRSL 2009, 30, 5243–5272.
  33. S2 MPC—Sen2Cor Configuration and User Manual. Ref. S2-PDGS-MPC-L2A-SUM-V2.8. Issue 2. 2019-02-05. Available online: http://step.esa.int/thirdparties/sen2cor/2.8.0/docs/S2-PDGS-MPC-L2A-SUM-V2.8.pdf2 (accessed on 14 February 2020).
Figure 1. Geographical distribution of 20 test sites selected for validation (orange squares).
Figure 2. Schema of classification reference mask generation, using the example of the Sen2Cor scene classification (SCL) product over the Barrax test site (Spain), acquired on 19 May 2017. This example contains varied topography (flat and rough) and land cover (vegetated, nonvegetated, water) as well as cloud cover dominated by cumulus clouds. The process starts on the left with L1C channel combinations and continues with consolidation, stratified random sampling and visual labeling to create the reference image. Red-circled image cubes: zoomed area.
Figure 3. Left to right: true color (RGB = 665,560,443 nm) composite of scene ID 19, SWIR1 (RGB = 1600,860,660 nm) composite and example spectra.
Figure 4. Difference-area validation using the example of scene 4 (Bolivia, Puerto Siles). Bottom row: Sentinel-2 scene; top row: zoom of a region with burned area. From left to right: natural color composite of bands 2, 3, 4; false color composite of bands 8a, 12, 3, helpful for discriminating dark classes, vegetation types and clouds; Fmask classification map; ATCOR classification map; Sen2Cor classification map; difference area map.
Figure 5. Top row: true color (RGB = 665,560,443 nm) composite of scene ID 18 (Barrax-2). Bottom row (left to right): Fmask, ATCOR and Sen2Cor classification maps.
Figure 6. Top row (left to right): CIR (RGB = 865,665,560 nm) composite and CIR subset of scene ID 18 (Barrax-2). Bottom row (left to right): Fmask, ATCOR and Sen2Cor classification maps of the subset.
Figure 7. Top row: SWIR1/NIR/red composite of scene ID 19 (Davos). Bottom row (left to right): Fmask, ATCOR and Sen2Cor classification maps.
Figure 8. Top row (left to right): SWIR1/NIR/red composite and CIR (RGB = 865,665,560 nm) subset of scene ID 19 (Davos). Bottom row (left to right): Fmask, ATCOR and Sen2Cor classification maps.
Figure 9. Top row: CIR (RGB = 865,665,560 nm) composite of scene ID 20 (USA Rimrock). Bottom row (left to right): Fmask, ATCOR and Sen2Cor classification maps.
Figure 10. Top row (left to right): CIR (RGB = 865,665,560 nm) composite and true color (RGB = 665,560,443 nm) subset of scene ID 20 (USA Rimrock). Bottom row (left to right): Fmask, ATCOR, and Sen2Cor classification maps.
Figure 11. Omission and commission per class for the difference area for the clear classes clear land, water and snow.
Figure 12. Omission and commission per class for the difference area for the cloud classes cloud and semitransparent cloud.
Figure 13. Omission and commission per class for the difference area for the shadow classes cloud shadow and topographic shadow.
Table 1. Sentinel-2 spectral bands and spatial resolution.

Sentinel-2 Band (μm)  | Resolution (m)
band 1 (0.433–0.453)  | 60
band 2 (0.458–0.523)  | 10
band 3 (0.543–0.578)  | 10
band 4 (0.650–0.680)  | 10
band 5 (0.698–0.713)  | 20
band 6 (0.733–0.748)  | 20
band 7 (0.765–0.785)  | 20
band 8 (0.785–0.900)  | 10
band 8a (0.855–0.875) | 20
band 9 (0.930–0.950)  | 60
band 10 (1.365–1.385) | 60
band 11 (1.565–1.655) | 20
band 12 (2.100–2.280) | 20
Table 2. Sentinel-2 level L1C test scenes (SZA = Solar Zenith Angle). Information on scene cloud cover, climate, main surface cover, rural/urban.
SceneLocationDateTileSZACloud CoverDesertIce/SnowNonvegVegWaterMountainsRuralUrban
1Antarctic2019/02/03T21EVK54.9 28% XX X
2Argentina, Buenos Aires2018/08/27T21HUC51.5 0% X XX
3Australia, Lake Lefroy2018/08/19T51JUF51.5 0% X
4Bolivia, Puerto Siles2018/09/06T19LHF30.6 0% XX X
5China, Dunhuang2018/01/22T46TFK62.3 24%XX X
6Estonia, Tallin2018/07/14T35VLG39.0 2% X XX
7Germany, Berlin2018/05/04T33UUU38.0 1% XXX XX
8Italy, Etna2017/03/09T33UUU45.1 7% X X XX
9Kazakhstan, Balkhash2018/07/30T43TFM30.7 7%X X X
10Mexico, Cancun2018/05/27T16QDJ18.4 7% XX X
11Morocco, Quarzazate2018/08/30T29RPQ27.2 2%X XXX
12Mosambique, Maputo2018/07/13T36JVS54.4 0% X XX
13Netherlands, Amsterdam2018/09/13T31UFU49.7 5% XX XX
14Philipines, Manila2018/03/19T51PTS27.4 1% X XX
15Russia, Sachalin2018/05/09T54UVC35.5 0% XX X
16Russia, Yakutsk2017/08/08T52VEP45.9 6% XXX
17Spain, Barrax-12017/05/09T30SWH24.1 18% X XX
18Spain, Barrax-22017/05/19T30SWH22.0 2% X XX
19Switzerland, Davos2019/04/17T32TNS37.7 25% XXXXX
20USA, Rimrock2018/05/12T11TMM30.4 1% X X XX
Table 3. Consolidation of individual masking outputs to common mask labels. Reference mask and corresponding mask of selected codes.

Label | Mask | Definition for Reference | Fmask | ATCOR | Sen2Cor
1 | Clear land | Clear pixels over land | Clear | Clear | Vegetation; not vegetated; unclassified
2 | Semitransparent cloud | 0.01 < TOA ρ(1.38 μm) < 0.04; also haze, smoke or any kind of cloud transparent enough to recognize the background features | Cloud | Semitransparent cloud | Thin cirrus
3 | Cloud | Cumulus cloud; thick clouds (also thin cirrus) | Cloud | Cloud | Cloud medium and high probability
4 | Cloud shadow | Shadow thrown by the clouds over land | Cloud shadow | Shadow | Cloud shadow
5 | Clear water | Clear pixels over water | Water | Water | Water
6 | Clear snow/ice | Clear pixels over snow and ice | Snow/Ice | Snow/ice | Snow and ice
7 | Topographic shadow | Self-shadow and/or cast shadows | - | Topographic shadow | Dark feature
0 | Background | - | Background | Geocoded background | No data; saturated or defective
Table 4. Summary of classification accuracy (percent) for the difference area (F = Fmask, A = ATCOR, S = Sen2Cor; bold face numbers indicate the best performances).

Class | UA (F) | UA (A) | UA (S) | PA (F) | PA (A) | PA (S)
clear | 79.8 | 69.0 | 80.8 | 56.2 | 64.6 | 75.5
semitransp. cloud | 78.2 | 36.4 | 67.1 | 1.8 | 30.4 | 28.2
cloud | 13.4 | 39.3 | 34.6 | 84.5 | 62.7 | 65.7
cloud shadow | 50.8 | 53.3 | 82.3 | 42.8 | 49.1 | 27.7
water | 94.1 | 44.8 | 70.0 | 68.1 | 52.9 | 80.3
snow/ice | 53.0 | 58.9 | 49.5 | 25.2 | 75.7 | 85.7
topographic shadows | 75.9 | 43.0 | 14.6 | 2.2 | 4.1 | 53.0
Table 5. Summary of overall accuracy (percent) (F = Fmask, A = ATCOR, S = Sen2Cor).

Scene ID | Location | OA Difference Area (F/A/S) | OA Same Area (F/A/S) | OA Total Area (F/A/S)
Avg | Average (all scenes) | 45/56/62 | 97/97/97 | 89/91/92
1 | Antarctic | 36/51/56 | 95/98/100 | 79/86/88
2 | Argentina, Buenos Aires | 90/59/59 | 98/98/98 | 98/96/96
3 | Australia, Lake Lefroy | 59/49/67 | 100/100/100 | 96/95/97
4 | Bolivia, Puerto Siles | 90/22/58 | 100/100/100 | 99/98/99
5 | China, Dunhuang | 56/40/74 | 97/97/100 | 73/64/85
6 | Estonia, Tallin | 40/71/78 | 98/98/99 | 90/94/96
7 | Germany, Berlin | 57/76/67 | 100/100/100 | 97/98/98
8 | Italy, Etna | 32/70/71 | 100/100/100 | 88/95/95
9 | Kazakhstan, Balkhash | 47/75/45 | 92/91/91 | 87/90/87
10 | Mexico, Cancun | 44/59/66 | 99/99/99 | 89/92/93
11 | Morocco, Quarzazate | 71/86/45 | 100/100/100 | 96/98/93
12 | Mozambique, Maputo | 75/35/45 | 85/85/85 | 84/83/83
13 | Netherlands, Amsterdam | 38/63/64 | 96/96/96 | 86/90/91
14 | Philippines, Manila | 46/69/67 | 98/98/98 | 92/95/94
15 | Russia, Sachalin | 85/51/63 | 99/98/98 | 98/93/94
16 | Russia, Yakutsk | 64/49/53 | 97/97/97 | 94/92/93
17 | Spain, Barrax-1 | 25/74/62 | 99/99/99 | 83/93/91
18 | Spain, Barrax-2 | 64/92/90 | 100/100/100 | 97/99/99
19 | Switzerland, Davos | 16/42/46 | 86/97/97 | 54/72/74
20 | USA, Rimrock | 50/52/63 | 99/99/99 | 98/99/98
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Zekoll, V.; Main-Knorn, M.; Alonso, K.; Louis, J.; Frantz, D.; Richter, R.; Pflug, B. Comparison of Masking Algorithms for Sentinel-2 Imagery. Remote Sens. 2021, 13, 137. https://0-doi-org.brum.beds.ac.uk/10.3390/rs13010137

