The Shihmen Reservoir watershed is vital to the water supply of northern Taiwan, but the reservoir has been heavily impacted by sedimentation and soil erosion since 1964. The purpose of this study was to explore the capability of machine learning algorithms, such as decision tree and random forest, to predict soil erosion (sheet and rill erosion) depths in the Shihmen Reservoir watershed. The accuracy of the models was evaluated using the root mean squared error (RMSE), mean absolute error (MAE), and R². Moreover, the models were compared against multiple regression analysis, which is commonly used in statistical analysis. The predictors of these models were 14 environmental factors that influence soil erosion, whereas the target was the erosion depth measured at 550 erosion pins installed at 55 locations (on 55 slopes) and monitored over a period of approximately three years. The data sets for the models were split into 70% training data and 30% testing data, using both simple random sampling and stratified random sampling. The results show that the random forest algorithm performed the best of the three methods. Moreover, as anticipated, stratified random sampling gave better results than simple random sampling. For the random forest algorithm with stratified random sampling, the average error (RMSE relative to the 1:1 line) was 0.93 mm/yr for the training data and 1.75 mm/yr for the testing data. Finally, the random forest algorithm identified type of slope, slope direction, and sub-watershed as the three most important of the 14 environmental factors collected and used in this study for splits in the trees, indicating that they are the three most important factors affecting the depth of sheet and rill erosion in the Shihmen Reservoir watershed. The results of this study can be employed by decision-makers to improve soil conservation planning and watershed remediation.
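The evaluation protocol described above (a 70/30 stratified split, then RMSE, MAE, and R² on observed versus predicted erosion depths) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the record layout, and the choice of `sub_watershed` as the stratification variable are assumptions made for illustration, since the abstract does not state which variable was used for stratification. R² is computed here as the coefficient of determination, which may differ from the definition used in the study. The data are synthetic.

```python
import math
import random
from collections import defaultdict


def stratified_split(records, stratum_of, train_frac=0.7, seed=0):
    """Split records into train/test, drawing train_frac from every stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[stratum_of(rec)].append(rec)
    train, test = [], []
    for key in sorted(groups):
        members = groups[key][:]
        rng.shuffle(members)  # random order within the stratum
        cut = round(train_frac * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def rmse(observed, predicted):
    """Root mean squared error, i.e. scatter about the 1:1 line."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r2(observed, predicted):
    """Coefficient of determination (assumed definition of R²)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic example: 50 hypothetical records in 5 hypothetical strata.
records = [
    {"sub_watershed": f"SW{i % 5}", "depth_mm_per_yr": float(i)}
    for i in range(50)
]
train, test = stratified_split(records, lambda r: r["sub_watershed"])
print(len(train), len(test))  # 35 15
```

Because the split is made within each stratum, every stratum contributes the same 70/30 proportion to the training and testing sets, which is the property that distinguishes stratified from simple random sampling.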
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.