The dataset used in the present study included 100 images with shaded areas gathered from LabelMe [42]. Given an image under analysis, boundary pixels were selected through the Canny method [45] and, for each pixel, 36 features were evaluated through the proposed ADT filter (Section 3). A total of 389,856 36-dimensional feature vectors were taken into account (194,928 belonging to the shadow pixel class and 194,928 to the non-shadow pixel class) and used as input to the developed MLP, AE, 1D-CNN and SVM classifiers to perform the two-class pixel-based classification task: shadow vs. non-shadow. Standard metrics, i.e., Accuracy (A), Recall (R), Precision (P) and F-measure (FM), were employed to assess the effectiveness of the proposed classifiers, following their standard definitions:
where TP, FP, TN and FN represent the number of true positives, false positives, true negatives and false negatives, respectively. Furthermore, the k-fold cross-validation (with k = 7) procedure was applied to quantify the discrimination performance. In particular, for each class, the training set consisted of 70% of the instances and the test set of the remaining 30%. Hence, all performance figures are reported as mean ± standard deviation. It is worth noting that the proposed MLP, SVM and 1D-CNN are supervised learning approaches and use the class label information in the training procedure. In contrast, the AE is trained with unsupervised learning, hence the labels were not used during this phase: the features extracted from the unlabeled data were fed to a softmax layer for classification purposes, and the whole network was then fine-tuned to enhance performance.
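As an illustration, the following Python sketch outlines an evaluation protocol of this kind, i.e., repeated stratified 70/30 train/test splits with metrics reported as mean ± standard deviation. The feature matrix X (one 36-dimensional ADT vector per boundary pixel) and the binary labels y are assumed to be available, and the MLPClassifier hyperparameters are placeholders rather than the topologies adopted in this study.

```python
# Illustrative sketch of the evaluation protocol: 7 repetitions of a stratified
# 70/30 train/test split, with results reported as mean +/- standard deviation.
# X (n_samples x 36 ADT features) and y (0 = non-shadow, 1 = shadow) are assumed
# to be available; the MLPClassifier hyperparameters are placeholders only.
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(X, y, n_repeats=7, test_size=0.30, seed=0):
    splitter = StratifiedShuffleSplit(n_splits=n_repeats, test_size=test_size,
                                      random_state=seed)
    scores = {"A": [], "P": [], "R": [], "FM": []}
    for train_idx, test_idx in splitter.split(X, y):
        clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=500)
        clf.fit(X[train_idx], y[train_idx])
        y_pred = clf.predict(X[test_idx])
        scores["A"].append(accuracy_score(y[test_idx], y_pred))
        scores["P"].append(precision_score(y[test_idx], y_pred))
        scores["R"].append(recall_score(y[test_idx], y_pred))
        scores["FM"].append(f1_score(y[test_idx], y_pred))
    # report each metric as (mean, standard deviation) in percent
    return {m: (100 * np.mean(v), 100 * np.std(v)) for m, v in scores.items()}
```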
Table 1 reports the pixel detection performance (evaluated on the test sets) in terms of averaged precision, recall, F-measure and accuracy for the MLP, AE, 1D-CNN and SVM classifiers. Among the MLP classifiers, MLP1 outperformed MLP2 and MLP3, achieving F-measure and accuracy values of 85.05 ± 0.57% and 84.63 ± 0.63%, respectively. However, it is to be noted that high performance was also observed with MLP2 (F-measure of 82.39 ± 0.69%, accuracy of 81.71 ± 0.77%) and MLP3 (F-measure of 84.55 ± 0.53%, accuracy of 84.19 ± 0.56%). Among the AE classifiers, the model with two hidden layers, denoted as AE3, produced F-measure and accuracy rates up to 78.84 ± 0.66% and 77.91 ± 0.72%, respectively. Very good results were also achieved by AE1 and AE2; in particular, their average accuracies were 76.51 ± 1.04% and 77.51 ± 1.89%, respectively. As regards the proposed 1D-CNN (Figure 9), the following average performance was achieved: accuracy of 75.8 ± 1.2%, precision of 73.5 ± 1.91%, recall of 81.0 ± 2.3% and F-measure of 76.9 ± 1%. Finally, the SVM classifier achieved lower average performance: precision of 62.5 ± 3.88%, recall of 58.6 ± 9.64%, F-measure of 59.93 ± 3.85% and accuracy of 61.27 ± 2.18%. Hence, comparative simulation results showed that the proposed MLP1 classifier achieved the highest pixel-based detection performance (accuracy of 84.63 ± 0.63%) as compared to the MLP2, MLP3, AE1, AE2, AE3, 1D-CNN and SVM classifiers. In support of this result, the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) was estimated. Specifically,
Figure 10 shows the average AUC values and the related ROC curves evaluated for the developed MLP, AE, 1D-CNN and SVM classifiers. As can be seen, MLP1 outperformed all the other approaches, reporting an AUC of 92 ± 0.53%. It is worth noting that overtraining and overfitting issues were also studied to assess the effectiveness of the developed models. Specifically, in order to control these phenomena, the k-fold cross-validation (k = 7) technique was adopted and the training accuracies were compared with those achieved in the test phase, using the same data separation as above: 70% for training and 30% for testing. As can be seen from Figure 11, none of the proposed models suffered from overtraining or overfitting; all provided good generalization ability, reporting a small standard deviation and a maximum gap between average training and test accuracy of only 1%. Furthermore, the networks were trained until convergence of the cross-entropy function was observed. As an example, Figure 12 reports the training phase of the best classifier proposed in this study, i.e., MLP1. However, it is also worth mentioning that the topology of the classifiers and the learning and training parameters were set up by performing several experimental tests according to a trial-and-error approach.
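As a complement to Figure 10, the following sketch shows how a ROC curve and its AUC can be computed for one of the trained classifiers on a held-out test set; clf, X_test and y_test are assumed to come from the split shown earlier, and the classifier is assumed to expose class-membership probabilities (e.g., an SVM would need to be trained with probability estimates enabled).

```python
# Illustrative ROC/AUC computation for a single trained classifier, using the
# predicted probability of the positive (shadow) class on the held-out test set.
from sklearn.metrics import roc_curve, auc

y_score = clf.predict_proba(X_test)[:, 1]   # probability of the shadow class
fpr, tpr, _ = roc_curve(y_test, y_score)    # ROC operating points
roc_auc = auc(fpr, tpr)                      # area under the ROC curve
print(f"AUC = {100 * roc_auc:.2f}%")
```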
Hence, experimental results showed that the proposed MLP1 classifier achieved the best detection performance when compared with the other machine learning techniques (i.e., MLP2, MLP3, AE1, AE2, AE3, 1D-CNN, SVM) as well as with previous shadow detection approaches [9,11], reporting accuracy and AUC rates up to 84.63 ± 0.63% and 89 ± 0.8%, respectively. It is interesting to note that the MLP outperforms the AE and 1D-CNN classifiers, which belong to more advanced machine learning architectures [65]. This is likely due to the limited size of the input space (only 36 features). Indeed, DL-based approaches are typically applied to big data with a large input dimension. Here, the proposed AE may over-compress the features, and hence lose significant information, while the hierarchical learning representation of a standard CNN is too complex for the pixel-based classification task of this study. For these reasons, conventional machine learning algorithms with a simpler architecture (i.e., the MLP) achieved better results. It is worth noting that the developed AE and 1D-CNN also achieved very good results: in particular, AE3 reported a detection accuracy of 77.91 ± 0.72%, whereas the 1D-CNN reported an accuracy of 75.8 ± 1.2%.
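To make the over-compression argument concrete, the following PyTorch sketch outlines an AE-based pipeline of the kind described above: an encoder compresses the 36 ADT features into a small code, a softmax layer classifies that code, and the whole network is then fine-tuned with labels. The layer sizes and activation are illustrative placeholders, not the AE1/AE2/AE3 topologies used in this study.

```python
# Minimal sketch of an autoencoder-plus-softmax pipeline for 36-dimensional
# boundary-pixel features (layer sizes are illustrative, not the paper's).
import torch.nn as nn

class ShadowAE(nn.Module):
    def __init__(self, n_features=36, code_size=12):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, code_size), nn.Sigmoid())
        self.decoder = nn.Linear(code_size, n_features)
        self.classifier = nn.Linear(code_size, 2)  # scores for shadow / non-shadow

    def reconstruct(self, x):
        # unsupervised pre-training target: minimise the reconstruction error (no labels)
        return self.decoder(self.encoder(x))

    def forward(self, x):
        # supervised fine-tuning / inference: class scores for the softmax (cross-entropy) loss
        return self.classifier(self.encoder(x))
```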