High-Resolution Digital Soil Maps of Forest Soil Nitrogen across South Korea Using Three Machine Learning Algorithms

An, Yoosoon; Shim, Woojin; Jeong, Gwanyong

doi:10.3390/f14061141

Open AccessArticle

High-Resolution Digital Soil Maps of Forest Soil Nitrogen across South Korea Using Three Machine Learning Algorithms

by

Yoosoon An

^1,2,3,4,

Woojin Shim

^2,3,*

and

Gwanyong Jeong

^3,*

¹

Institute for Korean Regional Studies, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea

²

Seoul National University Asia Center (SNUAC), Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea

³

Department of Geography, College of Social Science, Chonnam National University, 77 Youngbong-ro, Buk-gu, Gwangju 61186, Republic of Korea

⁴

Department of Geography, College of Social Science, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea

^*

Authors to whom correspondence should be addressed.

Forests 2023, 14(6), 1141; https://0-doi-org.brum.beds.ac.uk/10.3390/f14061141

Submission received: 6 April 2023 / Revised: 24 May 2023 / Accepted: 30 May 2023 / Published: 31 May 2023

(This article belongs to the Section Forest Soil)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

Reliable estimation of the forest soil nitrogen spatial distribution is necessary for effective forest ecosystem management. This study aimed to develop high-resolution digital soil maps of forest soil nitrogen across South Korea using three powerful machine learning methods to better understand the spatial variations of forest soil nitrogen and its environmental drivers. To achieve this, the study used national-level forest soil nitrogen data and environmental data to construct various geographic and environmental variables including geological, topographic, and vegetation factors for digital soil mapping. The results show that of the machine learning methods, the random forest model had the best performance at predicting total soil nitrogen in the A and B horizons, closely followed by the extreme gradient-boosting model. The most critical predictors were found to be geographic variables, quantitatively confirming the significant role of spatial autocorrelation in predicting soil nitrogen. The digital soil maps revealed that areas with high elevation, concave slopes, and deciduous forests had high nitrogen contents. This finding highlights the potential usefulness of digital soil maps in supporting forest management decision-making and identifying the environmental drivers of forest soil nitrogen distribution.

Keywords:

forest soil nitrogen; machine learning; geographic variable; digital soil mapping; spatial autocorrelation

1. Introduction

Nitrogen is one of the most important nutrients for the primary productivity of forest ecosystems [1]. During plant growth, nitrogen is required in large amounts, and when nitrogen availability becomes limited, it can affect the growth and productivity of plants, leading to reduced forest biomass and changes in species composition. This, in turn, can have cascading effects on forest-dependent organisms. Thus, nitrogen limitation can affect forest productivity, biodiversity, and community composition [2,3]. Soils in forest ecosystems contain large quantities of nitrogen [4], in a shallow layer supporting uncountable flora. Therefore, understanding the dynamics of soil nitrogen in forest ecosystems is crucial in maintaining forest health and productivity.

Reliable estimation of the forest soil nitrogen spatial distribution and site characteristics such as soil nitrogen availability is essential for effective forest ecosystem management [5]. Most forests in South Korea are located in mountainous areas, which present a heterogeneous environment [6], where the spatial variability of the forest soil is relatively high. Information on the spatial distribution of forest soils can be obtained from existing forest soil maps. However, polygon-based soil maps cannot describe the continuous local variations of forest soil properties. Additionally, updating forest soil maps based on past field works is expensive, and time-consuming, requiring many soil mappers.

A convenient and low-cost “predictive soil mapping” or “digital soil mapping (DSM)” methodology, i.e., the development of a numerical or statistical model of the relationship between soil properties and environmental predictors, is desirable. A key advantage of DSM is that the characteristics of soils at “unknown” locations can be predicted quantitatively onto a digital soil map, albeit with uncertainty, using the relationship between soil and environmental predictors at “known” locations under similar environmental conditions. Digital soil mapping has been introduced to estimate the distribution of soil properties across local, regional, and large-scale areas. This advanced tool could allow forest ecologists to understand spatial forest soil variations with greater insight.

Natural ecological processes influence the spatial distribution of forest soil nitrogen. The relationship between the nitrogen content of forest soils and the surrounding environmental variables is complex and dynamic, with multiple factors influencing its availability and its cycle. One important environmental variable that can affect soil nitrogen content is climate. A study by Giardina et al. [7] found that nitrogen availability in tropical forest soils was strongly influenced by temperature and precipitation, with warmer and wetter conditions leading to higher rates of nitrogen mineralization and uptake. Vegetation also affects forest soil nitrogen contents. Nave et al. [8] studied the changes to nitrogen content that occur during forest succession. In addition to climate and vegetation, other environmental variables, such as land use history, topography, and plant community composition, can also impact soil nitrogen contents. Zhu et al. [9] found that land use changes from forest to agricultural or forest to urban areas resulted in significant decreases in soil nitrogen contents. Organic matter, in particular, is directly related to forest soil nitrogen content.

In DSM, a number of environmental variables have been employed to map soil properties in forest ecosystems. Vegetation indices derived from satellite data have been used to assess spatial patterns of forest soil nitrogen. The spatial distribution of nitrogen in forest soils is also related to the topographical position, where soil nitrogen content tends to increase downslope [10]. Indeed, a lot of research has tried to explain the spatial variability of soil nitrogen using topographical variables. Taken together, it is clear that in order to understand the distribution of nitrogen in forest soils, it is crucial to consider a variety of environmental factors, such as geology, topography, and vegetation. Examining the importance of these factors can give deeper insights into the intricate workings of the nitrogen cycle in the forest ecosystem.

A variety of DSM techniques have been used to predict soil nitrogen distribution. Machine learning methods, such as support vector regression [11], random forest [12], and boosted regression trees [13], have shown especially good performance in forest areas with complex environments. Despite this, we still lack the knowledge to select the best models for predicting forest soil nutrients. In particular, there have been few studies using deep learning techniques for predicting soil nutrients in forest ecosystems. Therefore, evaluating the potential of deep learning and comparing it with other machine learning techniques would be worthwhile.

Many studies have used regression kriging as a DSM tool to account for soil spatial autocorrelation [14]. However, a novel approach has emerged that integrates the spatial autocorrelation of soil into the prediction model. This method utilizes geographic explanatory variables, producing results similar to those obtained through the regression kriging [15,16]. Geographic variables indicate the proximity or connectivity between the soil samples being predicted and can be employed as explanatory variables. With the incorporation of these geographic variables, spatial autocorrelation between soil samples can be taken into account, even when non-spatial prediction techniques such as machine learning are used, possibly enhancing the accuracy of soil nitrogen prediction models.

Access to comprehensive soil information at a national scale could be highly beneficial in supporting decision-making regarding forest management. Despite the need for such information, there have been few national digital soil maps created specifically for forest soil nitrogen [11]. Previous studies on forest soil nitrogen utilizing DSM methods were conducted mainly on the local and regional scales, with spatial ranges of approximately 2–3000 km² [5,17,18,19]. To address this gap, this study aims to develop high-resolution digital soil maps of forest soil nitrogen at a national level in South Korea with a view to revealing environmental drivers that contribute to the distribution of forest soil nitrogen. The primary objectives were to (1) predict forest soil nitrogen contents in the A and B horizons using three powerful machine learning methods, including Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Deep Neural Networks (DNNs); (2) explore the significant role played by environmental and geographic variables in determining variations in total soil nitrogen; and (3) describe and interpret spatial variations of forest soil nitrogen in relation to important environmental variables.

2. Materials and Methods

2.1. Environmental Information of Research Area

South Korea, located on the southern half of the Korean Peninsula, experiences a climate influenced by both the Asian monsoon and continental climate, with cooler temperatures in the northern inland areas and milder conditions in the southern and coastal areas. Precipitation is typically above 1000 mm annually in most regions and rarely less than 750 mm. The Korean Peninsula is characterized by mountains and hills, with the Baekdudaegan mountain range traditionally regarded as the backbone of the peninsula (Figure 1). This range starts from Mt. Baekdu and ends at Mt. Jiri and is an important ecological axis and a key forest area on the Korean Peninsula. South Korea has complex and diverse geology, with metamorphic rocks concentrated in the northwest region, sedimentary rocks mainly found in the southeast region, and limestone distributed between these two regions. Igneous rocks, particularly granite, are scattered across the country, while volcanic rocks are found in Jeju-do and northwest Gangwon-do. The distribution of limestone and basalt geology is significant for soil formation in South Korea. Forests cover 64.2% of the Korean Peninsula, with the forest types mainly composed of coniferous forests (36.9%, 2020), broad-leaved forests (31.8%), and mixed forests (26.45%). Over time, coniferous forests have decreased while broad-leaved forests have increased, and mixed forests have remained stable.

South Korea’s forest soil classification system is unique and incompatible with international standards. The most common types of forest soils in South Korea are brown forest soils, dark red forest soils, and volcanic ash forest soils. Brown forest soils are formed in the general geology of South Korea, such as gneiss and granite. These soils can be classified into Cambisols (mountain tops or ridges), Umbrisols (backslopes), and Acrisols (low-altitude forests) in the FAO/WRB system [20]. Dark red forest soils are formed in limestone geological environments, typically classified as Luvisol [20]. Volcanic ash forest soils are formed in volcanic rocks such as basalt, which is similar to Andosol in the international soil class [20]. However, many of these soils could be classified as immature soil, such as Inceptisols or Entisols in the USDA soil taxonomy, due to South Korea’s rapid afforestation [21]. Consequently, converting South Korea’s forest soil classification system to the international one is challenging, requiring further research to clarify the compatibility of South Korea’s forest soil classification. Based on the data obtained from the forest soil survey (refer to Section 2.2.1), the mean depth of the A horizon in South Korea’s forest soil is 8.31 cm, with a corresponding standard deviation of 7.48 cm. The mean depth and standard deviation of the B horizon in the forest soil are 44.79 cm and 18.45 cm, respectively.

2.2. Materials for Digital Soil Mapping

In order to conduct this study’s DSM, spatial data on forest soil nitrogen was obtained by analyzing soil samples and taking soil environmental and geographic variables into account.

2.2.1. Geospatial Data on Forest Soil Nitrogen

The total nitrogen contents in forest soil data were generated by combining the soil sample analysis data from the 1:5000 national forest soil map and the 7th National Forest Inventory and Forest Health Monitoring (NFI7‧FHM) survey.

The national forest soil map, which provides information on the total nitrogen content in the A and B horizons, includes data from field surveys and experiments conducted from 2009 to 2021. The soil sample analysis data used in this study were composed of 5457 samples from the A horizon and 6072 samples from the B horizon (Figure 2).

The NFI7·FHM data are a survey conducted on a nationwide grid of sample points, evenly distributed throughout the country, to assess changes in forest resources in the health of forest ecosystems. Details of this data can be found in the work of Lee et al. [22].

The NFI7·FHM soil properties data from 2016 to 2020 include soil texture (sand, silt, and clay content), organic carbon, acidity (pH), total nitrogen, and available phosphorus. To determine the soil nitrogen content, soil samples were collected and analyzed at 10 cm depth intervals between 0 and 30 cm. In this study, the nitrogen content from the 0–10 cm depth was used as the A horizon data. A total of 968 samples from the NFI7·FHM data were used.

Figure 2 illustrates the spatial distribution of total nitrogen contents in the forest soil for both the A and B horizons. The soil data in both horizons were somewhat spatially biased, but the A horizon data were relatively more uniform as it integrated NFI data collected from standard, evenly distributed points. It was observed that the areas with higher elevations and those located in the northeastern regions of South Korea tended to have higher soil nitrogen contents in both horizons.

2.2.2. Environmental and Geographic Variables

In DSM, it is crucial to identify and utilize the environmental correlations with soil properties. To accomplish this, the environmental and geographic variables suitable for South Korean forests were identified and constructed. The soil varies spatially and temporally depending on soil formation factors, such as climate, parent material, relief (topography), organisms, and time, so a variety of environmental variables are necessary.

Parent material variables

Parent material data were obtained from the digital geological map (1:50,000) provided by the Korea Institute of Geoscience and Mineral Resources and converted into variables. The rock types were classified into 16 lithological units (Table 1). Since parent material is a categorical variable, it was transformed into dummy variables to be incorporated into the model.

Topographic variables

The topographic variables used in this study were derived from a digital elevation model (DEM) produced by the National Geographic Information Institute, which originally had a resolution of 10 m. However, the model was resampled to a resolution of 30 m to reduce errors that may have been introduced during the DEM construction process, as recommended by previous studies [23]. From this, the topographic variables (Table 2) were constructed using SAGA GIS [24]. The hillslope position variables were obtained using a hillslope position classification technique based on the concept of catena proposed by Shim and Park [25]. This method classifies the hillslope position based on the upslope contributing area, which reflects the soil formation process and the movement of influencing substances (Figure 3).

Vegetation variables

The vegetation variables used in this study were obtained from the Korean Forest Service’s forest map (1:5000). The forest map includes information on forest type, age class, diameter class, and crown density. In this study, these data were transformed into dummy variables for use in the analysis (Table 3).

Geographic variables

To account for spatial autocorrelation in the forest soil information, geographic variables were used as per Behrens et al. [15]. The coordinates (X, Y) of the forest soil information and the Euclidean distances from five reference points—center, upper left, lower left, upper right, and lower right—were calculated and incorporated into the predictive model (Table 3).

2.3. Digital Soil Mapping Methods

The advancement of artificial intelligence has led to the widespread use of machine learning techniques such as support vector regression and back propagation neural network to predict soil properties. In this study, RF, XGBoost, and DNN techniques were utilized to predict the distribution of forest soil nitrogen.

2.3.1. Random Forest

Tree-based algorithms are commonly used in the DSM literature because they consist of nodes and leaves that partition the training dataset to maximize within-node homogeneity and between-node heterogeneity. The RF algorithm is based on decision trees proposed by Breiman [26] that combines multiple decision trees (learners) with different structures and performance to create an ensemble model. The RF method can represent the complex relationship between environmental factors and soil properties well and is widely used in the field of DSM [27].

The “caret” and “ranger” packages in R were used for RF model tuning. For RF modeling, there are three parameters, including the number of regression trees (ntree), minimum size of terminal nodes (min_node_size), and the number of randomly selected predictors (mtry). A grid search was used to evaluate the combinations of ntree (500), min_node_size (5), and mtry (21, 33, and 64) through the “caret” package.

2.3.2. Extreme Gradient Boosting

The XGBoost method is based on a tree-based algorithm used for regression and classification predictions. It utilizes a gradient-boosting framework to create a strong learner from multiple weak learners and employs many decision trees to make predictions. What sets XGBoost apart is the way it constructs new decision trees based on the prediction errors of the previous tree model, with the aim of minimizing the prediction error of the final prediction. This results in a final prediction that is an ensemble of multiple decision trees. The XGBoost algorithm is highly effective and widely used due to its ability to work with a weak set of predictions as an ensemble model [28,29].

The XGBoost method allows for model utilization tailored to the characteristics of the data and the purpose of the model through parameter tuning. The parameters can be broadly divided into three categories: General parameters can set the overall functionality of the model, including the type of booster and the number of threads used for parallel processing. Booster parameters control the settings of the tree boosters used in the model, allowing for control of the learning rate, tree depth, number of nodes per tree, and other aspects of the tree’s structure. Learning task parameters can be set to optimize the model’s performance, including the objective function and various settings depending on the data structure and the desired output variable to be extracted.

The R package “caret” was used for XGBoost model tuning. The fitting of the XGBoost model was controlled by seven parameters, including the learning rate (eta); maximum tree depth (max_depth); minimum loss reduction (gamma); subsample ratio of columns for each tree (colsample_bytree); restriction of the number of splits of each tree (min_child_weight); subsample ratio of the training instances, i.e., rows, to grow trees (subsample); and the number of trees and iterations (nrounds). These parameters were tested based on different combinations of eta (0.3 and 0.4), max_depth (1, 2, and 3), gamma (0), colsample_bytree (0.6 and 0.8), min_child_weight (1), subsample (0.5, 0.75, and 1), and nrounds (50, 100, and 150) via the “caret” package.

2.3.3. Deep Neural Networks

Deep Neural Networks were developed based on shallow neural networks, such as the Multilayer Perceptron [30]. Shallow neural networks typically only have one hidden layer. In contrast, DNN models often utilize multiple hidden layers between the input and output layers, giving them the ability to learn more complex non-linear relationships. This makes DNNs highly effective in handling complicated data and patterns. Some DSM studies have shown more accurate prediction results using DNNs than when using the RF method, a representative machine learning model, and DNNs show high potential for soil prediction [31,32,33].

This study constructed 18 models by training with a DNN architecture consisting of 3 hidden layers between the input and output layers. Among the model parameters, the number of units in the first and third hidden layers (3 options—32, 64, and 96) and the number of units in the second hidden layer (3 options—96, 128, and 256) were varied along with dropout (2 options—0.1 and 0.3) during training. The best model was selected from these 18 models. The rectified linear unit (ReLU) was used as the activation function to consider non-linearity. The batch size, which is the small sample size in which the entire data are divided to create small pieces for processing in the DNN model, was set to 128, and the number of iterations performed by the batch size, called the epoch, was set to 300. The RMSProp algorithm was used as the optimizer for model optimization, and the learning rate was set to the default value of 0.001. The “keras” and “tftuns” packages in R were used for DNN model analysis and training.

2.3.4. Model Validation and Uncertainty

To evaluate and train the models, 70% of the soil samples was used as training data and 30% as validation data. In the training data, three repetitions and 2-fold cross-validation were performed for the model tuning. Independent (external) validation was conducted using the 30% validation data. The RMSE and R² statistics were compared to assess model performance. To quantify the spatial prediction uncertainty, the soil training data were randomly split into two datasets using the resampling method. The first dataset, which accounted for 70% of the training data, was used to calibrate the model parameters by 3 repeated 2-fold cross-validation. This process was repeated 10 times, generating 10 calibrated models. The standard deviation of the cell-wise predictions from the 10 models can be used to estimate the spatial uncertainty.

3. Results

3.1. Description of Soil Nitrogen Data

The majority of the soil nitrogen data in the A and B horizons exhibited values within a range of 0%–1% (Figure 4), with only a few instances where they extended up to 4.8% (A horizon) or 2.6% (B horizon). The mean total nitrogen content of the A horizon was 0.22%, while that of the B horizon was 0.09%. The standard deviation was either similar to the mean for the A horizon or greater than the mean for the B horizon. Consequently, the coefficient of variation of the A horizon was 80.21%, while that of the B horizon was 125.88%. This indicated that the total nitrogen content values in mountainous soils on the Korean Peninsula exhibit high variance. Moreover, the results of the t-test indicate that there was a statistically significant difference in the distribution and variability of soil nitrogen between the A horizon and the B horizon (Figure 5).

3.2. Selecting the Optimal Prediction Machine Learning Methods

Table 4 displays the results of the accuracy evaluation of three machine learning methodologies used to predict the forest soil nitrogen distributions. The RMSE value for the A horizon ranged from 0.11 to 0.12, and the R² value was between 0.13 and 0.27. The RF algorithm demonstrated the best performance, followed by XGBoost, while DNN produced relatively poor results. It is worth noting that the RF and XGBoost methods produced similar results. Concerning the B horizon, the RMSE was between 0.054 and 0.057, and the R² value ranged from 0.37 to 0.45. The RF algorithm outperformed the others, with results comparable to those of the A horizon. Thus, there was little discernible difference between the performance of the RF and XGBoost methods, but the DNN model produced lower accuracy for both A and B horizon data.

As the RF method demonstrated superior performance in predicting the soil nitrogen contents of both horizons, a variable importance analysis was conducted for each horizon to identify the key variables used in the RF algorithm (Figure 6 and Figure 7). For the A horizon, the most important predictors were found to be geographic variables, such as coords.x2, DISTC4, and DISTC1. Elevation (ELEV) showed relatively high variable importance, followed by terrain openness (POPEN) and curvature (TPI05). Furthermore, the nitrogen content in the A horizon was affected by factors such as andesite as the parent material (GTYPE4), the broad-leaved forest type (FTYPE2), and the toeslope topographical position (CATENA5).

Similarly, the variable importance analysis revealed that geographic variables, such as coords.x2, coords.x1, and DISTC2, played a crucial role in predicting the soil nitrogen content in the B horizon (Figure 6). In addition, topographic variables such as terrain openness (POPEN), elevation (ELEV), and slope (SLOPE) were found to be critical predictors. Like the A horizon, the nitrogen content in the B horizon was associated with the broadleaved forest type (FTYPE2), but the coniferous forest type (FTYPE1) was also an important predictor. However, unlike the A horizon, the presence or absence of granite (GTYPE1) as the parent material was a key predictor of the nitrogen content in the B horizon.

3.3. Digital Soil Maps for Total Nitrogen in Forest Soils

Figure 8 displays the digital maps of total forest soil nitrogen for the A and B horizons, respectively, for South Korea. The nitrogen content in the A horizon exhibited high levels in mountainous areas, including the Baekdu Mountain Range and Yeongnam Alps in the eastern part of the Korean peninsula, the Jirisan Mountains in the southern part, and the Hallasan Mountains on Jeju Island. Additionally, areas, such as Inje-gun, Hongcheon-gun, Pyeongchang-gun, Yeongwol-gun, Taebaek-si, Bonghwa-gun, Cheongdo-gun, Gyeongju-si, MilYang-si, Yangsan-si, Ulju-gun, and Jeju Island, had high nitrogen contents in the A horizon. Elevation and convex slopes were also correlated with high nitrogen contents, with some exceptions such as in the Yeongnam Alps, Jirisan Mountains, and Jeju Island, where high nitrogen content was observed throughout the slope. The distribution of soil nitrogen content in the A horizon was generally greater in areas with andesite and high-elevation deciduous forests.

Similarly, the distribution of soil nitrogen content in the B horizon showed high nitrogen contents in mountainous areas but also was associated with areas, where granite parent material is present. Notably, there were high soil nitrogen contents in the southern part of Gangwon Province and the northern parts of Chungcheongbuk-do and Gyeongsangbuk-do, particularly along their boundary. This area included Pyeongchang-gun, Jeongseon-gun, Yeongwol-gun, Danyang-gun, Jecheon-si, Chungju-si, Goseong-gun, and Mungyeong-si. Jeju Island also had high nitrogen content in its B horizon soils. The variable importance analysis suggested that deciduous and coniferous forests have contributed significantly to the distribution of soil nitrogen contents in the B horizon, reflecting microspatial differences.

In Figure S3, the standard deviation of nitrogen contents in the A and B horizons is presented. It was observed that nitrogen content showed a higher standard deviation along the high ridges, the Baekdudaegan mountain range. The higher uncertainty for nitrogen was found in areas with higher nitrogen content values due to the soil data structure. When the soil data include a small proportion of extreme values, it becomes skewed, resulting in data sparsity and increased spatial uncertainty. Uncertainty can arise from data deficiencies, such as missing covariates and small or biased samples [34].

4. Discussion

Table 5 presents the total nitrogen content (%) in soil data from 2009 to 2021, along with the corresponding data on the nitrogen content of South Korean forest soils from the 1984 to 1990 period. By comparing the current research data with the previously gathered data, we can validate the reliability of the research findings and observe changes in the nitrogen content of forest soil on the Korean Peninsula over the past two decades.

For the A horizon, the average total nitrogen content of the research data was slightly higher than that of previous data, and the variability was also marginally greater. On the other hand, for the B horizon, the average nitrogen content of the research data was the same as the previous data, but the variability was relatively higher. This result demonstrated the consistency between past surveys and current research data. Moreover, the findings suggested that the total nitrogen content of the South Korean forest soils’ surface (A) horizon has slightly increased over the past 20 years, indicating an improvement in ecological forest health. However, due to the high variance, this trend represents a relatively significant difference from period to period, necessitating more precise verification. The intensification of agricultural practices and the growth of large industrial operations have resulted in a significant increase in atmospheric nitrogen inputs in South Korea. Specifically, between 2005 and 2010, the annual average wet input of nitrogen ranged from 12.9 to 24.9 kg/ha-year, which is substantially higher than pre-industrial levels [36]. While this trend is decreasing recently [37], it is possible that the input of atmospheric nitrogen enrichment during this period has led to an increase in soil nitrogen in forest ecosystems. Further research is necessary to obtain a better understanding of the changes in soil nitrogen.

The RF method showed the best performance considering its RMSE and R² value, while the DNN method showed rather worse results, based on its RMSE. The RF and XGBoost methods were better models than the DNN in our results. Joharestani et al. [38] employed RF, XGBoost, and DNN algorithms to predict PM2.5 concentration, with XGBoost exhibiting the best performance and a reasonable training time. However, they found that all three machine learning techniques had similar performances. A study conducted by Kampichler et al. [39] compared five machine learning techniques based on modeling effort, modeling performance, method intricacy, and classifier comprehensibility. The authors suggested using RF due to its strong modeling performance and the simplicity of constructing and interpreting models. Our findings also supported the use of RF as a reliable method for predicting forest soil nitrogen, as it produced accurate model results. In terms of ease of use, the RF model can outperform the XGBoost due to a need for fewer parameter adjustments.

The results assigned high importance to several variables for predicting forest soil nitrogen in the A horizon (in descending order): geographic variables (coords.x2, DISTC4, and DISTC1), elevation (ELEV), topographic openness (POPEN), and curvature (TPI05). Thus, the importance of spatial autocorrelation in forest soil nitrogen prediction was quantitatively confirmed by including the geographic variables in our models. The universal model of soil variation proposes that complex spatial variation is composed of three primary components: (i) the deterministic component (a function of environmental variables), (ii) the spatially correlated component (a function of geographic variables), and (iii) pure noise (measurement error and short-range spatial variation) [40]. However, most machine learning algorithms take a non-spatial approach, ignoring the spatial locations of observations and thus failing to account for any spatially correlated components not captured by the environmental variables. This could result in biased predictions, even leading to systematic over- or under-prediction, particularly when the target variable exhibits a high spatial autocorrelation [16]. This study confirms that incorporating geographic variables into forest soil prediction models can potentially enhance their accuracy. Geographic variables could also be convenient for users because they can be directly used in machine learning models [41].

Higher values of soil nitrogen content were found at higher altitudes in this study. Jeong et al. [6] also found that forest soil nitrogen in the A horizon was correlated with elevation. Additionally, positive relationships between soil nitrogen and elevation were reported by Kunkel et al. [18], Peng et al. [19], and Wang et al. [42]. Elevation might be an indicator of soil temperatures and decomposition rates that result in nitrogen accumulations [43]. The high-altitude areas of South Korea are mostly forest lands partly designated as natural reserves, national parks, or military zones. In these areas, high vegetation density might indicate low levels of disturbance.

This study found a strong spatial autocorrelation in the spatial distribution of soil nitrogen contents in both horizons, and terrain openness and curvature variables were important predictors in both horizons too. Specifically, concave areas were predicted to have high soil nitrogen contents. Slope configuration affects the lateral movement of water and nutrients, which results in high spatial heterogeneity of soil [44,45]. In the research areas composed of deciduous forests, the nitrogen contents were generally higher than in other areas. A number of studies have found significant relationships between soil nitrogen and the normalized difference vegetation index, which are measures of aboveground biomass not used in this study. Areas dominated by forests with higher vegetation density and deciduous trees might have higher soil productivity and higher nitrogen fixation [46].

5. Conclusions

Forest ecosystems rely heavily on the availability of nitrogen in the soil, and having reliable estimates of its spatial distribution can be crucial for effective forest ecosystem management. Digital soil mapping technology provides a convenient and cost-effective method for predicting forest soil nutrients. However, there have been few instances of national digital soil maps developed specifically for forest soil nitrogen. Therefore, this study aimed to develop high-resolution digital soil maps of forest soil nitrogen across South Korea to better comprehend the spatial variation of forest soil nitrogen and its environmental drivers. To achieve this, we constructed more than 20 independent variables by examining the effects of environmental and geographic factors that determine changes in the total soil nitrogen content. We then used three machine learning models (RF, XGBoost, and DNN) to predict the nitrogen content in horizons A and B.

The total soil nitrogen content varied greatly in different regions, with higher levels and lower variability in the A horizon compared to the B horizon. The RF method performed the best in predicting nitrogen content in both horizons, followed by the XGBoost. The prediction of soil nitrogen content in the A horizon was strongly associated with various geographic variables. Our results indicated a strong spatial autocorrelation in the forest soil nitrogen data, and elevation, topographic openness, and curvature (TPI05) were significant predictors. For the B horizon, the key variables were slightly different but followed a similar overall trend. Geographic variables, along with topographic variables, were the most important predictors. Environmental variables related to deciduous and coniferous forest types and granite parent material also played a critical role in the B horizon prediction model.

The study produced digital soil maps of forest soil nitrogen for the A and B horizons, revealing a high soil nitrogen content in the northeastern mountainous areas with high elevations and concave slopes. Moreover, areas with deciduous forests showed high levels of soil nitrogen content in both horizons. The results of this study suggest that soil information based on digital soil maps would be useful in supporting forest management decision-making and could also provide insights into the environmental factors controlling forest soil nitrogen distribution.

Supplementary Materials

The following supporting information can be downloaded at https://0-www-mdpi-com.brum.beds.ac.uk/article/10.3390/f14061141/s1, Figure S1: Extreme Gradient Boosting (XGBoost)-based digital soil map of total nitrogen in the A and B horizons; Figure S2: Deep Neural Networks (DNNs)-based digital soil map of total nitrogen in the A and B horizons; Figure S3: The standard deviation to estimate the spatial uncertainty (A and B horizons).

Author Contributions

Conceptualization, Y.A., W.S. and G.J.; methodology, G.J.; software, Y.A., W.S. and G.J.; validation, G.J.; formal analysis, G.J.; investigation, G.J.; resources, Y.A., W.S. and G.J.; data curation, G.J.; writing—original draft preparation, Y.A., W.S. and G.J.; writing—review and editing, Y.A., W.S. and G.J.; visualization, Y.A., W.S. and G.J.; supervision, G.J.; project administration, G.J.; funding acquisition, G.J. and W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2020S1A6A3A02065553).

Data Availability Statement

The data presented in this study are available from the corresponding author upon request.

Conflicts of Interest

We have no conflict of interest to disclose.

References

Binkley, D.; Fisher, R.F. Ecology and Management of Forest Soils; John Wiley & Sons: Hoboken, NY, USA, 2020. [Google Scholar]
Elser, J.J.; Bracken, M.E.S.; Cleland, E.E.; Gruner, D.S.; Harpole, W.S.; Hillebrand, H.; Ngai, J.T.; Seabloom, E.W.; Shurin, J.B.; Smith, J.E. Global analysis of nitrogen and phosphorus limitation of primary producers in freshwater, marine and terrestrial ecosystems. Ecol. Lett. 2007, 10, 1135–1142. [Google Scholar] [CrossRef]
LeBauer, D.S.; Treseder, K.K. Nitrogen limitation of net primary productivity in terrestrial ecosystems is globally distributed. Ecology 2008, 89, 371–379. [Google Scholar] [CrossRef] [PubMed]
Batjes, N.H. Total carbon and nitrogen in the soils of the world. Eur. J. Soil Sci. 1996, 47, 151–163. [Google Scholar] [CrossRef]
Li, Y.; Zhao, Z.; Wei, S.; Sun, D.; Yang, Q.; Ding, X. Prediction of Regional Forest Soil Nutrients Based on Gaofen-1 Remote Sensing Data. Forests 2021, 12, 1430. [Google Scholar] [CrossRef]
Jeong, G.; Choi, K.; Spohn, M.; Park, S.J.; Huwe, B.; Ließ, M. Environmental drivers of spatial patterns of topsoil nitrogen and phosphorus under monsoon conditions in a complex terrain of South Korea. PLoS ONE 2017, 12, e0183205. [Google Scholar] [CrossRef]
Giardina, C.P.; Litton, C.M.; Crow, S.E.; Asner, G.P. Warming-related increases in soil CO₂ efflux are explained by increased below-ground carbon flux. Nat. Clim. Change 2014, 4, 822–827. [Google Scholar] [CrossRef]
Nave, L.E.; Sparks, J.P.; Le Moine, J.; Hardiman, B.S.; Nadelhoffer, K.J.; Tallant, J.M.; Vogel, C.S.; Strahm, B.D.; Curtis, P.S. Changes in soil nitrogen cycling in a northern temperate forest ecosystem during succession. Biogeochemistry 2014, 121, 471–488. [Google Scholar] [CrossRef]
Zhou, T.; Geng, Y.; Ji, C.; Xu, X.; Wang, H.; Pan, J.; Bumberger, J.; Haase, D.; Lausch, A. Prediction of soil organic carbon and the C:N ratio on a national scale using machine learning and satellite data: A comparison between Sentinel-2, Sentinel-3 and Landsat-8 images. Sci. Total Environ. 2021, 755, 142661. [Google Scholar] [CrossRef]
Giesler, R.; Högberg, M.; Högberg, P. Soil chemistry and plants in Fennoscandian boreal forest as exemplified by a local gradient. Ecology 1998, 79, 119–137. [Google Scholar] [CrossRef]
Liu, Z.; Lei, H.; Lei, L.; Sheng, H. Spatial Prediction of Total Nitrogen in Soil Surface Layer Based on Machine Learning. Sustainability 2022, 14, 11998. [Google Scholar] [CrossRef]
Zhou, T.; Geng, Y.; Chen, J.; Pan, J.; Haase, D.; Lausch, A. High-resolution digital mapping of soil organic carbon and soil total nitrogen using DEM derivatives, Sentinel-1 and Sentinel-2 data based on machine learning algorithms. Sci. Total Environ. 2020, 729, 138244. [Google Scholar] [CrossRef] [PubMed]
Wang, S.; Zhuang, Q.; Jin, X.; Yang, Z.; Liu, H. Predicting Soil Organic Carbon and Soil Nitrogen Stocks in Topsoil of Forest Ecosystems in Northeastern China Using Remote Sensing Data. Remote Sens. 2020, 12, 1115. [Google Scholar] [CrossRef]
Keskin, H.; Grunwald, S. Regression kriging as a workhorse in the digital soil mapper’s toolbox. Geoderma 2018, 326, 22–41. [Google Scholar] [CrossRef]
Behrens, T.; Schmidt, K.; Viscarra Rossel, R.A.; Gries, P.; Scholten, T.; MacMillan, R.A. Spatial modelling with Euclidean distance fields and machine learning. Eur. J. Soil Sci. 2018, 69, 757–770. [Google Scholar] [CrossRef]
Hengl, T.; MacMillan, R.A. Predictive Soil Mapping with R; OpenGeoHub Foundation: Wageningen, The Netherlands, 2019. [Google Scholar]
Johnson, C.E.; Ruiz-Méndez, J.J.; Lawrence, G.B. Forest Soil Chemistry and Terrain Attributes in a Catskills Watershed. Soil Sci. Soc. Am. J. 2000, 64, 1804–1814. [Google Scholar] [CrossRef]
Kunkel, M.L.; Flores, A.N.; Smith, T.J.; McNamara, J.P.; Benner, S.G. A simplified approach for estimating soil carbon and nitrogen stocks in semi-arid complex terrain. Geoderma 2011, 165, 1–11. [Google Scholar] [CrossRef]
Peng, G.; Bing, W.; Guangpo, G.; Guangcan, Z. Spatial distribution of soil organic carbon and total nitrogen based on GIS and geostatistics in a small watershed in a hilly area of northern China. PLoS ONE 2013, 8, e83592. [Google Scholar] [CrossRef]
Kim, C.S. Methods for compatible between South Korea’s forest soil classification and soil taxonomy and world reference base for soil resources classification. For. Environ. 2020, 19, 24–36. [Google Scholar]
Park, S.J.; Sohn, Y.K.; Hong, S.Y.; Park, C.W.; Zhang, Y.S. Spatial Distribution of Major Soil Types in Korea and an Assessment of Soil Predictability Using Soil Forming Factors. J. Korean Geogr. Soc. 2010, 45, 95–118. [Google Scholar]
Lee, S.; Lee, S.; Shin, J.; Yim, J.; Kang, J. Assessing the Carbon Storage of Soil and Litter from National Forest Inventory Data in South Korea. Forests 2020, 11, 1318. [Google Scholar] [CrossRef]
Park, S.J.; Yu, K.B. The Optimal Grid Resolution to Interpret the Spatial Structure of Geomorphological Processes over the Landscape. J. Korean Geomorphol. Assoc. 2004, 11, 113–136. [Google Scholar]
Conrad, O.; Bechtel, B.; Bock, M.; Dietrich, H.; Fischer, E.; Gerlitz, L.; Wehberg, J.; Wichmann, V.; Böhner, J. System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geosci. Model Dev. 2015, 8, 1991–2007. [Google Scholar] [CrossRef]
Shim, W.J.; Park, S.J. Development and Evaluation of Hillslope Position Classification using Catena Concept. J. Korean Geogr. Soc. 2020, 55, 231–251. [Google Scholar] [CrossRef]
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Lamichhane, S.; Kumar, L.; Wilson, B. Digital soil mapping algorithms and covariates for soil organic carbon mapping and their implications: A review. Geoderma 2019, 352, 395–413. [Google Scholar] [CrossRef]
Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Zhou, T. Xgboost: EXtreme Gradient Boosting. R Package Version 2015. Available online: https://cran.r-project.org/web/packages/xgboost/index.html (accessed on 1 April 2023.).
Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. arXiv 2016, arXiv:1603.027542016. [Google Scholar]
Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef]
Behrens, T.; Schmidt, K.; MacMillan, R.A.; Viscarra Rossel, R.A. Multi-scale digital soil mapping with deep learning. Sci. Rep. 2018, 8, 15244. [Google Scholar] [CrossRef]
Wadoux, A.M.J.C. Using deep learning for multivariate mapping of soil with quantified uncertainty. Geoderma 2019, 351, 59–70. [Google Scholar] [CrossRef]
Wadoux, A.M.J.C.; Padarian, J.; Minasny, B. Multi-source data integration for soil mapping using deep learning. SOIL 2019, 5, 107–119. [Google Scholar] [CrossRef]
Barry, S.; Elith, J. Error and uncertainty in habitat models. J. Appl. Ecol. 2006, 43, 413–423. [Google Scholar] [CrossRef]
Jeong, G.H.; Koo, K.S.; Lee, C.H.; Kim, C.S. Physico-chemical properties of Korean forest soils by regions. J. Korean For. Soc. 2002, 91, 694–700. [Google Scholar]
Jang, S.K.; Sung, M.Y.; Shin, A.Y.; Choi, J.-S.; Son, J.S.; Ahn, J.-Y.; Kim, J.C.; Shin, E.S. A Study for Long-Term Trend of Acid Deposition in Korea. J. Korea Soc. Environ. Adm. 2011, 17, 183–192. [Google Scholar]
National Institute of Forest Science. 2020 Status of National Forest Soil Acidification; National Institute of Forest Science: Seoul, Republic of Korea, 2021. [Google Scholar]
Zamani Joharestani, M.; Cao, C.; Ni, X.; Bashir, B.; Talebiesfandarani, S. PM2.5 Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data. Atmosphere 2019, 10, 373. [Google Scholar] [CrossRef]
Kampichler, C.; Wieland, R.; Calmé, S.; Weissenberger, H.; Arriaga-Weiss, S. Classification in conservation biology: A comparison of five machine-learning methods. Ecol. Inform. 2010, 5, 441–450. [Google Scholar] [CrossRef]
Burrough, P.A.; McDonnell, R.A. Principles of Geographical Information Systems; Oxford University Press: Oxford, UK, 1998. [Google Scholar]
Hengl, T.; Nussbaum, M.; Wright, M.N.; Heuvelink, G.B.M.; Gräler, B. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ 2018, 6, e5518. [Google Scholar] [CrossRef]
Wang, K.; Zhang, C.; Li, W. Predictive mapping of soil total nitrogen at a regional scale: A comparison between geographically weighted regression and cokriging. Appl. Geogr. 2013, 42, 73–85. [Google Scholar] [CrossRef]
Osman, K.T. Forest Soils: Properties and Management; Springer: Cham, Switzerland, 2013. [Google Scholar]
Moore, I.D.; Gessler, P.E.; Nielsen, G.A.; Peterson, G.A. Soil Attribute Prediction Using Terrain Analysis. Soil Sci. Soc. Am. J. 1993, 57, 443–452. [Google Scholar] [CrossRef]
Park, S.J.; Burt, T.P. Identification and Characterization of Pedogeomorphological Processes on a Hillslope. Soil Sci. Soc. Am. J. 2002, 66, 1897–1910. [Google Scholar] [CrossRef]
Jeong, G.; Oeverdieck, H.; Park, S.J.; Huwe, B.; Ließ, M. Spatial soil nutrients prediction using three supervised learning methods for assessment of land potentials in complex terrain. CATENA 2017, 154, 73–84. [Google Scholar] [CrossRef]

Figure 1. The Republic of Korea administrative district map (a) and topographic map (b).

Figure 2. Distribution of forest soil nitrogen samples and nitrogen level (%) data for the A and B horizons. The A horizon data combine the national forest soil map data and NFI7·FHM data. The B horizon data are derived from the national forest soil map.

Figure 3. Flow chart of the hillslope position classification system using the Upslope Contributing Area Priority Decision Tree methodAdapted with permission from [25].;published by Journal of the Korean Geographical Society.

Figure 4. Histograms of the soil nitrogen data (%): (a) A horizon and (b) B horizon.

Figure 5. Boxplot for comparing the soil nitrogen levels of the A and B soil horizons.

Figure 6. Importance of the top 20 variables in predicting the total soil nitrogen content of the A horizon using the random forest algorithm.

Figure 7. Importance of the top 20 variables in predicting the total soil nitrogen content of the B horizon using the random forest algorithm.

Figure 8. Digital soil map of total nitrogen in the A and B horizons (a,b).

Table 1. Classification and codes of the parent material variables.

Rock Type	Variable Name	Code for This Study
Igneous Rock	Granite	GTYPE1
	Porphyry	GTYPE2
	Felsite	GTYPE3
	Andesite	GTYPE4
	Basalt	GTYPE5
	Diorite	GTYPE6
Sedimentary Rock	Limestone	GTYPE7
	Sandstone	GTYPE8
	Mudstone	GTYPE9
	Shale	GTYPE10
	Tuff	GTYPE11
	Conglomerate	GTYPE12
Metamorphic Rock	Gneiss	GTYPE13
	Schist	GTYPE14
	Phyllite	GTYPE15
	Slate	GTYPE16

Table 2. Classifications and codes of the topographic variables.

Variable Classification	Variable Name	Code for This Study
Terrain variable [24]	Elevation	ELEV
	Slope gradient	SLOPE
	Plan curvature	PLANCUR
	Profile curvature	PROFCUR
	Topographic position index	TPI05
	Wetness index	SWI
	Valley depth	VD
	Positive topographic openness	POPEN
	Negative topographic openness	NOPEN
	Insolation	INSOL
Hillslope position [25]	Summit	CATENA.1
	Shoulder	CATENA.2
	Backslope	CATENA.3
	Footslope	CATENA.4
	Toeslope	CATENA.5

Table 3. Type and code of the vegetation and geographic variables.

Variable Classification		Variable Name	Code for This Study
Vegetation	Forest type	Coniferous forest	FTYPE.1
		Broad-leaved forest	FTYPE.2
		Mixed forest	FTYPE.3
	Age class	Age class 1–9	FYEONG.1–9
	Diameter class	Young growth reproduction	FKYUNG.1
		Small diameter	FKYUNG.2
		Medium diameter	FKYUNG.3
		Large diameter	FKYUNG.4
	Crown density	Low canopy density	FCDEN.1
		Medium canopy density	FCDEN.2
		High canopy density	FCDEN.3
Geographic	Coordinates	X	coords.x1
	Coordinates	Y	coords.x2
	Euclidean distance	From center	DISTCC
		From upper left	DISTC1
		From lower left	DISTC2
		From upper right	DISTC3
		From lower right	DISTC4

Table 4. Machine learning results: the root mean square error (RMSE) and explanatory power (R²).

Horizon	RF ¹		DNNs ²		XGBoost ³
Horizon	RMSE	R²	RMSE	R²	RMSE	R²
A	0.11	0.27	0.122	0.13	0.113	0.24
B	0.054	0.45	0.057	0.37	0.054	0.43

¹ RF: Random Forest, ² DNNs: Deep Neural Networks, ³ XGBoost: Extreme Gradient Boosting.

Table 5. Descriptive statistics and comparison of total nitrogen content (%) in forest soils in South Korea: current data versus previous data.

Horizon	1984–1990 (Previous Data) *			2009–2021 (Data in This Study)
Horizon	Mean	SD ¹	CV ²	Mean	SD ¹	CV ²
A	0.19	0.14	73.68	0.22	0.18	80.21
B	0.09	0.09	100	0.09	0.12	125.88

* Source: [35], ¹ SD: Standard Deviation, ² CV: Coefficient of Variance (SD/Mean, %).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

An, Y.; Shim, W.; Jeong, G. High-Resolution Digital Soil Maps of Forest Soil Nitrogen across South Korea Using Three Machine Learning Algorithms. Forests 2023, 14, 1141. https://0-doi-org.brum.beds.ac.uk/10.3390/f14061141

AMA Style

An Y, Shim W, Jeong G. High-Resolution Digital Soil Maps of Forest Soil Nitrogen across South Korea Using Three Machine Learning Algorithms. Forests. 2023; 14(6):1141. https://0-doi-org.brum.beds.ac.uk/10.3390/f14061141

Chicago/Turabian Style

An, Yoosoon, Woojin Shim, and Gwanyong Jeong. 2023. "High-Resolution Digital Soil Maps of Forest Soil Nitrogen across South Korea Using Three Machine Learning Algorithms" Forests 14, no. 6: 1141. https://0-doi-org.brum.beds.ac.uk/10.3390/f14061141

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

High-Resolution Digital Soil Maps of Forest Soil Nitrogen across South Korea Using Three Machine Learning Algorithms

Abstract

1. Introduction

2. Materials and Methods

2.1. Environmental Information of Research Area

2.2. Materials for Digital Soil Mapping

2.2.1. Geospatial Data on Forest Soil Nitrogen

2.2.2. Environmental and Geographic Variables

2.3. Digital Soil Mapping Methods

2.3.1. Random Forest

2.3.2. Extreme Gradient Boosting

2.3.3. Deep Neural Networks

2.3.4. Model Validation and Uncertainty

3. Results

3.1. Description of Soil Nitrogen Data

3.2. Selecting the Optimal Prediction Machine Learning Methods

3.3. Digital Soil Maps for Total Nitrogen in Forest Soils

4. Discussion

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI