Article

A Prediction Study on Archaeological Sites Based on Geographical Variables and Logistic Regression—A Case Study of the Neolithic Era and the Bronze Age of Xiangyang

1 The Key Laboratory of GIS Application Research, School of Geography and Tourism, Chongqing Normal University, Chongqing 401331, China
2 School of History and Society, Chongqing Normal University, Chongqing 401331, China
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(23), 15675; https://0-doi-org.brum.beds.ac.uk/10.3390/su142315675
Submission received: 5 October 2022 / Revised: 8 November 2022 / Accepted: 23 November 2022 / Published: 25 November 2022

Abstract

Archaeological site predictive modeling is widely adopted in archaeological research and cultural resource management. It supports archaeological excavation and helps reveal the development of human societies. This paper focuses on Xiangyang City. We selected eight geographical variables as influencing variables: elevation, slope, aspect, micro-landform, slope position, plan curvature, profile curvature, and distance from water. Based on 260 excavated archaeological sites, we randomly generated 260 non-site points (a 1:1 ratio of site to non-site points), constructed a sample set of geospatial data, and built an archaeological predictive model based on logistic regression (LR). Using 10-fold cross-validation, we trained and tested the model to select the best samples, thereby establishing the quantitative relationship between the archaeological sites and the geographical variables. The Area Under the Curve (AUC) of the LR model is 0.797 and its accuracy is 0.897. A geographical detector shows that distance from water, elevation, and plan curvature are the three most influential variables. The LR-based archaeological predictive model is highly stable and accurate, and geographical variables are crucial to archaeological site prediction.

1. Introduction

Archaeological site predictive modelling predicts the probability of new sites existing in a region by analyzing and assessing the dynamic relationship between archaeological sites and their environments. As a foundation for archaeological research and cultural resource management, it is crucial for reconstructing ancient settlement spaces and discovering new sites, and their potential conservation, monitoring and management.
Archaeological site prediction was originally developed by Willey in the Viru Valley [1]. The practice evolved with the rise of GIS technology, and was later refined and extended by Brandt and Kvamme to North America, South America, the Netherlands and other regions [2]. Archaeological site prediction is influenced by the geographic environment and many other factors. Hannah Parow-Souchon identified areas with high potential for Paleolithic sites in the eastern Mediterranean and its arid margins by using multi-criteria decision analysis (MCDA) with parameters such as slope orientation (base), elevation, geomorphology, hydrogeology, drainage network, slope, and vegetation [3]. Sukumar Hazra investigated archaeological site potential in the middle and lower reaches of the Mayurakshi River basin using variables such as slope, slope direction, curvature, distance from water bodies, soil, distance from modern settlements, geology and elevation [4]. Studies within China remain limited. Zhang Hai, working on the Loess Plateau in Longdong, used logistic regression (LR) equations to construct three archaeological site prediction models: a DEM model, a multispectral remote sensing model, and a hybrid model combining DEM and multispectral remote sensing variables [5]. Yan Li Jie used 563 known settlement sites of the Yangshao period (5000–3000 B.C.) around Mount Song and six elements (elevation, slope, distance from river systems, geomorphology, soil and climate) to build a prediction model for settlement site selection with the integrated index method [6]. Most models select too few variables, and often select them subjectively. The relevant literature lacks sufficient description of how to select variables objectively and study them quantitatively [7].
The development of machine learning theories and methods in information science offers new ideas for archaeological research [8,9]. For example, machine learning has been trialed in archaeological faunal analysis [10], heritage landscape studies [8,11,12] and geographic space [13,14,15]. Machine learning uses algorithms to parse data, learn from them and make decisions or predictions about new data. LR is a machine learning method built on multivariate statistical techniques, and it helps circumvent the pitfalls that researchers encounter when attempting to match a statistical model to the available data [16]. The algorithm addresses the problem of model validation through active learning [17], providing a reliable basis for research.
Xiangyang City comprises mountains, plains, and hilly terrain. The distribution of its archaeological sites varies spatially under the influence of changes in water systems and environments. Flood disasters and changes in the geographical environment have led to the alteration and reconstruction of archaeological sites. This study used geographic surveys for factor evaluation, established a predictive evaluation model for archaeological sites with a logistic regression algorithm, and validated and assessed its accuracy. Using DEM and archaeological site data, this study extracted eight variables for the archaeological site prediction model and constructed a geospatial database. Negative samples were randomly selected with reference to the archaeological site points, logistic regression was used for archaeological site prediction, the importance of the variables was evaluated, and an archaeological site prediction model with high accuracy was established.

2. Overview of the Research Area

Xiangyang City is in the northwestern part of Hubei Province, in the hinterland of a plain in the middle reaches of the Hanjiang River, at 110°45′–113°06′ E and 31°13′–32°37′ N. Geologically, it spans two major tectonic systems, the Yangtze Paraplatform and the Qinling Geosyncline, and four mountain systems: Wudang Mountain, Jing Mountain, Tongbo Mountain and Dahong Mountain. There are 985 rivers in the city, belonging to two major water systems, the Hanjiang River and the Leizhang River. The city lies in the transition zone between the second and third steps of China's terrain, with the general topography sloping from northwest to southeast. The topography can be divided into three basic geographical units: the mountains in the west, the downland and plains in the center, and the low hills in the east (Figure 1). Xiangyang was inhabited during the Paleolithic period and has many Neolithic sites [18], such as the Sanbuliangdao Bridge Neolithic site [19], the Fenghuangzui Neolithic site [20], and the Chuwangcheng Neolithic site. These well-known Neolithic sites mostly represent the Yangshao, Qujialing and Shijiahe cultures [19,21]. The Sujiaying Ruins site (2792 aBP–2278 aBP) and the Xibandi Ruins site (3068 aBP–2793 aBP) date back to the Bronze Age [18]. Because of river erosion and diversion, Xiangyang shows typical geomorphic development and a wide variety of landforms, which makes it suitable for research on ancient human activities. Over time, distinctive topography formed in the research area and rivers were rechanneled, producing the diversified river systems that developed over the past thousand years.

3. Geospatial Database

3.1. Data Sources

In this paper, 260 sites (excluding tombs) in Xiangyang were selected (129 from the Neolithic Era and 131 from the Bronze Age) to form a database of archaeological sites, which records the sites' names and locations and is used to evaluate the influence of each parameter on site location. The spatial information for the sites was compiled from the Atlas of Chinese Cultural Relics-Hubei Province (I) [22], the Atlas of Chinese Cultural Relics-Hubei Province (II) [23], the Yearbook of Chinese Archaeology, 2007 [24], the Yearbook of Chinese Archaeology, 2012 [25], the Yearbook of Chinese Archaeology, 2013 [26], the Yearbook of Chinese Archaeology, 2017 [27], Excavation Reports of Archaeology [28,29,30,31,32], and theses and dissertations.
Data on the administrative division are from the Resource and Environment Science and Data Center, Chinese Academy of Sciences: https://www.resdc.cn/ (accessed on 15 January 2022); river data are from the Xiangyang Water Resource Bureau; Landsat 8 OLI satellite images were downloaded from the website of Geospatial Data Cloud: http://www.gscloud.cn (accessed on 15 January 2022) with a spatial resolution of 30 m. More information is provided in Table 1.

3.2. Characteristics of Prehistoric Archaeological Sites and Selection of the Indicator System

The present study covers the Neolithic Era and the Bronze Age, 5000 aBP–10,000 aBP and 2273–4022 aBP, respectively. Many relics dating back to these periods can be observed in the vast downland and hills of Xiangyang. The Fenghuangzui Ruins site, with a history of 4300 to 5000 years, is the largest and highest-level central settlement site discovered so far in the Nanyang Basin, northwest Hubei Province [20]. It is also a significant Neolithic town site. The Mulintou Ruins site in Baokang County, with a history of 4600 to 4800 years, is an alpine prehistoric site [28,33], showing that the Qujialing Culture extended to Jing Mountain. Both the residential choices of early Homo sapiens and ancient people and the history of modern excavation have influenced the known spatial distribution of the sites. The Yangshao, Qujialing and Shijiahe cultures were the major cultures of that era, widely distributed along the Hanjiang River and its tributaries in Xiangyang.
The clustering and dispersion of archaeological sites are constrained by geography, and changes in their size reflect the formation, growth and decline of household and individual wealth [34]. On this basis, the influencing variables selected in this paper are elevation, slope, aspect, profile curvature, plane curvature, slope position, micro-landform, and distance from water. Archaeological sites at different elevations were extracted by spatial queries and data statistics, and the eight influencing variables were tabulated at the known archaeological site locations. Raster cells with a resolution of 30 m × 30 m were the basic units for archaeological site prediction and were used to produce the visualized thematic layers of the geospatial database (Figure 2).

3.3. Data Processing

Neolithic archaeological sites at different elevations were extracted through spatial queries and data statistics. The eight influencing variables were sampled at the known archaeological sites, and raster cells were adopted as the basic units for archaeological site prediction to produce the visual thematic layers of the geospatial database.
The 260 archaeological sites excavated in Xiangyang were the positive samples for this research. Non-site points were generated with the random point generation tool of ArcMap 10.6 at a 1:1 ratio of site to non-site points. Individual non-site points were then adjusted so that their circular buffer zones, 100 m in diameter, did not overlap with any site, ensuring that no sites fell within the non-site areas. The site and non-site point layers were overlaid on the reclassified layers of elevation, distance from water, slope, aspect, profile curvature, micro-landform, slope position and plane curvature to obtain the values of each sample for the different geographical parameters and ranges.
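The non-site sampling step can be illustrated with a minimal sketch. It is not the ArcMap workflow used in the study: it assumes projected coordinates in metres, uses a rectangular bounding box as a stand-in for the study area boundary, and reads the 100 m diameter buffer as a 50 m radius around each candidate point. The function name, parameters and toy coordinates are hypothetical.

```python
import math
import random

def sample_non_sites(sites, bounds, n, buffer_radius=50.0, seed=0):
    """Draw n random points whose circular buffer contains no known site.

    sites: list of (x, y) site coordinates in metres (projected CRS assumed).
    bounds: (xmin, ymin, xmax, ymax) rectangle standing in for the study area.
    buffer_radius: 50 m, i.e. a buffer 100 m in diameter (assumed reading).
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    non_sites = []
    while len(non_sites) < n:
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Reject the candidate if any site falls inside its buffer.
        if all(math.dist(candidate, s) > buffer_radius for s in sites):
            non_sites.append(candidate)
    return non_sites

# Toy coordinates, not data from the study.
sites = [(100.0, 100.0), (400.0, 250.0), (800.0, 900.0)]
non_sites = sample_non_sites(sites, (0.0, 0.0, 1000.0, 1000.0), n=len(sites))
```

Setting `n=len(sites)` reproduces the 1:1 ratio of site to non-site points; a real application would test candidates against the city boundary polygon rather than a rectangle.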
A Digital Elevation Model (DEM) was processed in ArcGIS 10.6 to derive geographical variables such as slope, aspect, plane curvature, profile curvature, and slope position. Micro-landform is a small geomorphic unit with 10 types, such as canyon, incised meander, mid-slope ditch and ditch. Climate, moisture conditions and topography changed over the period covered by the study, during which rivers formed, disappeared, and shifted. The distribution of modern rivers may therefore not reflect that of earlier ones. However, former river courses still retain the characteristics of topographic depressions, which can be effectively identified in the DEM data. The hydrological toolset in ArcGIS 10.6 was used to extract potential rivers in Xiangyang. The result of the river extraction varied with the threshold applied in the raster calculator, and the extraction threshold was set to 40,000. The extracted rivers were integrated with the modern rivers to simulate all possible rivers. The result is presented in Figure 3.
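As an illustration of how a terrain variable is derived from a DEM, the sketch below computes slope in degrees with simple central differences on a 30 m grid. This is a simplified stand-in, not the algorithm implemented in ArcGIS, and the function name and toy DEM are hypothetical.

```python
import math

def slope_degrees(dem, cell_size=30.0):
    """Slope in degrees for the interior cells of a DEM given as a list of rows.

    Uses central differences in x and y; edge cells are left as None.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dz_dx = (dem[i][j + 1] - dem[i][j - 1]) / (2 * cell_size)
            dz_dy = (dem[i + 1][j] - dem[i - 1][j]) / (2 * cell_size)
            # Slope is the arctangent of the gradient magnitude.
            out[i][j] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out

# Toy DEM: elevation rises 30 m per 30 m cell towards the east (a 45 degree slope).
dem = [[30.0 * j for j in range(4)] for _ in range(4)]
slope = slope_degrees(dem)
```

Aspect and the two curvatures are obtained in a similar way from first and second derivatives of the elevation surface.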

4. Research Methods for Archaeological Site Prediction

4.1. Research Steps and Research Objectives

The authors aimed to: (1) develop an LR archaeological prediction model based on the known sites, the best ratio of positive to negative cells and 10-fold cross-validation; (2) evaluate the importance of geographical variables and learn about ancient people's residential preferences across different types of topography; and (3) explore the reasons why the archaeological predictive model is promoted and employed, and establish an accurate and generalized model for extensive use.
The evaluation includes four stages: (1) selection of predictive variables for archaeological sites; (2) construction of the LR model based on positive and negative cells; (3) application of 10-fold cross-validation in Xiangyang City; and (4) model validation. The main operating software and platforms were ArcGIS 10.6 and ENVI, and the programming language was Python (Figure 4).

4.2. Logistic Regression Model

The LR model is a generalized linear regression analysis model capable of multi-variable control [35]. An S-shaped (sigmoid) function (Equation (1)) constrains the output value of the LR model to (0, 1), as presented below:
$$f(z) = \frac{1}{1 + e^{-z}} \tag{1}$$
where $z = w_1 x_1 + w_2 x_2 + \cdots + w_M x_M + b$ is a weighted linear combination, $b$ is the intercept, $w_m$ ($m = 1, 2, \ldots, M$) are the regression coefficients, $x_m$ ($m = 1, 2, \ldots, M$; $M = 8$) are the eight influencing variables, and $f(z)$ is the archaeological site probability.
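The logistic function and the weighted linear combination can be sketched in a few lines. The weights, intercept and variable values below are purely illustrative assumptions, not the fitted coefficients of the study.

```python
import math

def sigmoid(z):
    """Logistic function f(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def site_probability(x, w, b):
    """Site probability f(z) with z = w_1*x_1 + ... + w_M*x_M + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical standardized values of the eight variables for one raster cell.
x = [-0.5, -0.2, 0.1, 0.0, 0.3, 0.1, -0.1, -1.2]
# Hypothetical coefficients and intercept (illustration only).
w = [-0.9, -0.4, 0.05, 0.1, 0.2, 0.3, 0.2, -1.1]
b = 0.2
p = site_probability(x, w, b)
```

Applying `site_probability` to every 30 m × 30 m cell yields a probability surface of the kind mapped in the results section.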

4.3. Geographical Detectors

Geographical detectors detect spatial heterogeneity and identify its driving forces by statistically testing the relationship between two variables. A factor detector measures the importance of a variable in the archaeological predictive model according to its q value; an interaction detector detects the interaction between variables in the archaeological predictive model [36]. The q value is given by Equation (2):
$$q = 1 - \frac{1}{N\sigma^2} \sum_{h=1}^{L} N_h \sigma_h^2 \tag{2}$$
where $q$ takes values in [0, 1], and the closer the value is to 1, the higher the variable's contribution; $h = 1, \ldots, L$ indexes the levels into which the variable is graded; $N_h$ and $N$ are the number of samples at level $h$ and in the whole region, respectively; and $\sigma_h^2$ and $\sigma^2$ are the variances at level $h$ and of the whole region, respectively.
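Equation (2) can be computed directly from a response variable and the level assigned to each sample. The sketch below is a minimal illustration that uses population variances; the function names and toy data are hypothetical.

```python
from collections import defaultdict

def pvariance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def geodetector_q(y, levels):
    """q = 1 - sum_h(N_h * var_h) / (N * var) for response y graded into levels."""
    groups = defaultdict(list)
    for yi, h in zip(y, levels):
        groups[h].append(yi)
    total = len(y) * pvariance(y)
    if total == 0:
        return 0.0  # no variance to explain
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / total

# Toy example: site (1) / non-site (0) labels against a graded variable.
y = [1, 1, 0, 0]
q_perfect = geodetector_q(y, ["near", "near", "far", "far"])  # levels explain y fully
q_none = geodetector_q(y, ["a", "b", "a", "b"])               # levels explain nothing
```

In this toy case the first grading separates sites from non-sites completely, while the second carries no information about them.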

4.4. Model Training

The total sample size is 520, with 260 positive samples and 260 negative samples (a 1:1 ratio); 129 of the site points are Neolithic and 131 are Bronze Age. Datasets of all archaeological sites in Xiangyang were constructed. Owing to the randomness of non-site point sampling, 10-fold cross-validation was adopted to select positive and negative samples and train the LR model. In this method, the full dataset (260 positive and 260 negative samples) was divided equally into 10 non-overlapping subsets. One subset at a time was used to validate the accuracy of the model, and the remaining subsets were used for model training [7]. The mean accuracy of the LR model on the test samples was 0.738 (Table 2).
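The fold construction described above can be sketched as follows. This is a plain, unstratified split written with the standard library; whether the study stratified the folds by class is not stated, so that choice is an assumption, and the function name is hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k non-overlapping folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k subsets of near-equal size
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# 520 samples (260 sites + 260 non-sites) give 10 folds of 52 samples each.
splits = list(k_fold_indices(520, k=10))
```

A model is then fitted on each `train` list and scored on the matching `test` list, and the ten scores are averaged to obtain a mean accuracy.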
Archaeological site prediction was treated as a binary classification problem, and a confusion matrix was adopted to assess the model's accuracy. Consistent with the cross-validation results in Table 2, the LR confusion matrix shows a high mean accuracy, indicating that the LR model is valid to some extent and is suitable for the probabilistic prediction of archaeological sites in Xiangyang (Table 3).
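For reference, a confusion matrix and the accuracy derived from it can be computed as in the sketch below. The 0.5 probability threshold and the toy labels and scores are assumptions made for illustration; they are not values from Table 3.

```python
def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels and predicted probabilities."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Share of samples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

# Toy labels (1 = site, 0 = non-site) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
tp, fp, tn, fn = confusion_matrix(y_true, y_prob)
acc = accuracy(tp, fp, tn, fn)
```

In the toy data, one site and one non-site are misclassified, so four of the six samples are correct.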

4.5. Model Evaluation

The ROC (Receiver Operating Characteristic) curve and the AUC (Area Under the Curve) can be adopted to evaluate and validate the LR model that predicts archaeological sites. The ROC curve plots sensitivity (the true positive rate) on the vertical axis against 1 − specificity (the false positive rate) on the horizontal axis. The AUC, the area beneath the curve, quantifies the prediction accuracy of the model. An AUC of 1 indicates an ideal model, whereas an AUC of less than 0.5 indicates that the model has no predictive value. The closer the AUC is to 1, the better the diagnostic properties and the higher the accuracy of the model. The ROC curves of the LR model are presented in Figure 5. The AUC of the training set is 0.789, which indicates that the model is stable to some extent; the AUC of the test set is 0.897, which implies that the model is capable of prediction to some degree; and the AUC of all samples is 0.797.
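The AUC can also be read as the probability that a randomly chosen site receives a higher predicted score than a randomly chosen non-site. The sketch below computes it on that basis with a pairwise comparison, which is practical for small samples; the function name and toy data are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (site, non-site) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy labels (1 = site, 0 = non-site) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
auc = roc_auc(y_true, scores)
```

Here eight of the nine site and non-site pairs are ordered correctly, so the AUC is 8/9.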

5. Archaeological Site Prediction Results

The established LR model was used to compute the site probability of each raster cell in the study area, yielding the archaeological site prediction map of the Neolithic Era (Figure 6). The probability map indicates that areas likely to contain sites are mainly along the periphery of the Hanjiang River basin, the Tangbai River basin and other tributaries, in the hinterlands of mountains and hills, and in the flat areas along the rivers, such as Yicheng City, Xiangcheng District, Fancheng District, Xiangzhou District, Zaoyang City, and other parts of the central downland and plains and the eastern low mountains and hills. Areas with a low probability of containing sites are mainly in Baokang, most of Gucheng, and the central and western parts of Nanzhang, which are core areas of mountain ranges and ecological nature reserves. The number of alpine sites gradually increased from the Neolithic Era to the Bronze Age. In recent years, the Mulintou Ruins site, a rare alpine site, was discovered in Baokang County. A high-level clan cemetery was uncovered there, with abundant burial objects and exquisite artifacts including jade battle-axes, jade guilloches and ivory pipes. This is the first time that such objects of the Qujialing Culture have been found. Most of Baokang is mountainous, but prehistoric sites may still be unearthed there. It is worth noting that most mountainous sites are near water. Given the small number of excavated sites and the restrictions of topography, mountainous areas are unlikely to yield many archaeological sites, whereas flat areas near water show dense sites. Only three sites are at an altitude of over 500 m, and no sites have been found at altitudes of 200 m to 300 m or 400 m to 500 m. In the Neolithic Era and the Bronze Age, ancient people preferred to live in plains, hills, downland and areas with sufficient resources.
The probabilities predicted by the LR model range from 0.0001 to 0.9832. The study shows that the LR model performed well in its overall prediction.
Site choices in the Neolithic Era and the Bronze Age responded to different environments, but the spatial distribution of the Neolithic sites is broadly consistent with that of the Bronze Age sites. Most of the sites from both periods are concentrated in areas at elevations of 0–150 m (Figure 7), with slopes of 0–10° (Figure 8), and within 1000 m of rivers. The Luojiaying Ruins site [37] and the Geleiju Ruins site [22,23] were occupied from the Neolithic Era until the Eastern Zhou Dynasty (10,000 aBP–2298 aBP), which indicates that the region was inhabited for a long time. Although floods were a hazard, they also fertilized the soil, which laid a solid foundation for agriculture. In the early days, Xiangyang had abundant water resources and fertile soil. Archaic Homo sapiens and ancient people were not able to greatly transform their environments, so the sites are mostly in plains, hills and downland.

Validation and Analysis

Based on field research, the Liaojiawan Ruins site, dating back to the Eastern Zhou Dynasty [38], and the Laoyacang Ruins site, dating back to the Neolithic Era and the Eastern Zhou Dynasty [39], were selected for validation. As presented in Figure 9, both are in areas with a high predicted probability of archaeological site occurrence. These sites are in the northern part of the Yicheng Plain, where sites have low density and small size [31] because of the geographical environment and water resources. The Laoyacang Ruins site is in Huwan Village, Xiaohe Town, Yicheng City, with an area of about 60,000 m². The Ziyang Terrace, where the Mulintou Ruins site in Baokang is located, is in the northern part of the Chongyang Basin in the Jing Mountain ranges, and its altitude gradually decreases from north to south. To the north lies a section extending from the major Luo Mountain range; to the south, the Ju River; to the west, the Chongxi River; and to the east, a stream flowing down from a mountain to the Ju River. The site is roughly rectangular, covering an area of about 80,000 m² [28]. Its elevation is 306 m, and the model assigns the region a low probability of containing archaeological sites. The excavation of this site indicates that prehistoric alpine sites exist in Xiangyang but are rare. Sites can therefore still be discovered in areas to which this paper assigns low probabilities, and a low predicted probability is consistent with such rare finds. The results are limited by the number of archaeological excavations, but they confirm that alpine sites in Xiangyang were rare at that time. Thus, the archaeological predictive model in this paper retains good predictive ability.
The model was also validated in the Gun River and Sha River basins in Zaoyang City, where the known archaeological sites are consistent with the predicted results. The validation results demonstrate that the areas in which the LR model predicts archaeological sites are in line with the actual distribution of archaeological sites.

6. Discussion and Conclusions

6.1. Discussion

6.1.1. Geographical Variables and the Importance Ranking of Variables

The eight variables were ranked in descending order of their influence on the probability of archaeological site occurrence as follows: distance from water, elevation, plane curvature, profile curvature, slope, micro-landform, slope position, and aspect (Figure 10). The archaeological sites in Xiangyang are mainly distributed in the central downland and plains and the eastern low mountains and hills, while they are rarely found in the western mountainous areas. This indicates that site selection by archaic Homo sapiens and other ancient people in the Neolithic Era and the Bronze Age was elevation-related. Water was also crucial to the choice of sites, because of access to food, the need for irrigation in agricultural production, and the tendency to seek benefit and avoid harm [40]. Sites were established at a safe distance from water systems. These conclusions coincide with those of other studies, which supports the validity of the model.
In addition, plane curvature and profile curvature are ranked highly, although these factors are scarcely discussed in other research. As typical geographic variables, the two are critical parameters for describing the morphological characteristics of landforms. Curvature is a quantitative measure of how strongly the land surface bends at a point. Plane curvature and profile curvature are the horizontal and the vertical components of surface curvature, respectively. Traditional topographic variables are no longer sufficient to capture all geomorphic features, but introducing plane curvature and profile curvature as influencing variables can clarify the influence of multiple geomorphic types on the occurrence probability of archaeological sites. The greater the profile curvature, the higher the probability of soil change and erosion; crop yields then decrease and human-land conflict becomes prominent. A decrease in arable land leads to scarcity of resources, so profile curvature became important in the site selection of ancient humans. Plane curvature not only describes the structure and shape of the terrain, but also affects the distribution of soil organic matter, which is essential for surface process simulation, hydrology, and soil science. Land in Xiangyang was brought under cultivation during the Neolithic Era. Crop farming developed easily in areas with low elevation, sufficient heat, gentle slopes, excellent soil moisture retention, and little loss of soil fertility. According to previous studies, remains of millet and rice had appeared in Xiangyang by the late Neolithic Era, which indicates that agriculture became a new pattern of livelihood at that time. Climate, irrigation and soil were significant conditions for agriculture and important reasons for the site selection of archaic Homo sapiens and ancient people.
Additionally, the continuity of archaeological cultures is mostly influenced by these two geographical variables. Slope has been identified as important in previous studies, and its q value in this paper is high, which is consistent with known research findings. However, the q values of micro-landform, slope position and aspect are low, which indicates that these variables have little influence on the prediction of archaeological sites. Slope position mainly influences mountain ecosystems, but the archaeological sites in this paper are rarely distributed in mountains, so its influence here is smaller. Micro-landform analysis helps explain the formation of macro-landforms and is often used to detect land disasters, so it also has little influence in this region.

6.1.2. Relationship between River Levels and Historical Sites

Xiangyang has many water systems and dense river valleys. Rivers shaped the land, forming a large number of downland areas and small plains. Most of the mountains are middle and low mountains or hills, while the tops of the downland are flat and extend in a north-south direction. The plains are mostly alluvial plains on both sides of the rivers. Xiangyang belongs to the warm-temperate zone, with a continental monsoon climate transitional between north and south. Annual precipitation varies greatly and is concentrated in short periods with uneven distribution, which can easily lead to flash floods. After the period of the Yangshao Culture, rice, corn and millet were farmed together, while rice farming gradually dominated agriculture after the period of the Qujialing Culture. The importance of rice farming surged, but there are still many remains of millet from the Zhou Dynasty [41]. Xiangyang was suitable for human habitation and agricultural production because of its warm climate and abundant precipitation during the Zhou Dynasty.
Statistics on the quantitative relationship between archaeological sites and distance from rivers (Figure 11) demonstrate that, from the Neolithic Era to the Bronze Age, the number of archaeological sites gradually decreased as the distance increased. The archaeological sites of the Neolithic Era and the Bronze Age are mostly 1500 m away from second-order streams. A total of fifty-one sites cluster mostly near lakes; only seven sites, those with the largest size and higher density, are distributed near fourth-order streams. The distance between a site and a river is mostly within 1000 m, as at the sites in Baokang and Xiangzhou. Most ancient sites, 202 in total, are near fifth-order streams. They are mainly distributed within 500 m of rivers and follow the course of the rivers like branches. In the northern plains, the distance to a river is mostly within 3000 m [42]; sites in mountainous areas, such as the Three Gorges Reservoir area, are mostly 400 m to 600 m away from rivers [43]. In Xiangyang, proximity to water is flexible: sites lie 0 m to 500 m, 500 m to 1000 m, 1000 m to 3000 m and even 7000 m away from rivers. Sites cluster in large numbers because of the dense networks of rivers and lakes, and this flexible proximity to water has become a characteristic of Xiangyang. The distance between an ancient site and water determines the site's size. In addition, the distances between an ancient site and other sites indicate its importance and centrality, which laid a solid foundation for the Chu culture.

6.1.3. Comparative Analysis of LR and Other Predictive Methods

The scientific validity of archaeological site prediction methods directly determines the credibility of evaluation results. Previous studies employed archaeological data and environmental variables, together with indicator fitting based on multivariate statistical techniques. However, the interrelationships among indicators were often neglected, so the constraints of restrictive variables were underestimated. In terms of research methods, for instance, Félix adopted satellite imagery and a digital elevation model to build an archaeological predictive model through various techniques (spatial analysis, statistical techniques and fuzzy logic) to identify areas that are very likely to have archaeological sites in the Awserd region of southern Morocco [44]. Approximately 80% of the model predictions were confirmed as correct by field observations. G.A. Diwan employed three inductive methods, namely frequency ratios (FR), statistical indices (Wi) and binary logistic regression (BLR), to develop predictive models for Iron Age sites in the Bekaa (Lebanon) [45]. The accuracy and predictive ability of these models were tested with Kvamme's gain values, and all model results proved reliable. The study demonstrated that the Wi method performed better at Iron Age I and II sites, with high probabilities of 74.53% and 74.97%, respectively; combined with the gain validation, the FR method performed better overall, with a probability of 0.81. Kvamme's gain remains one of the most widely used methods for testing the validity of archaeological predictive models, and independent random sampling is still one of the crucial means of validation. M. Noviello established an archaeological predictive model based on multi-parameter spatial analysis (MPSA) in geographic information systems (GIS) and the maximum entropy model (MaxEnt) [46].
The MaxEnt method is more effective than GIS_MPSA in improving the prediction of archaeological presence. With an AUC in the range of 80–85%, it offers higher overall performance and greater savings (in terms of fewer environmental variables) than GIS_MPSA. These results indicate that LR is the most effective approach, although archaeological researchers have started to explore more sophisticated models. Drawing on research by Märker [47], Peter M. Yaworsky [48], Roalkvam [48] and others, some studies have applied machine learning to archaeological site prediction and reported higher accuracy than with other models. Machine learning is therefore a feasible basis for archaeological site prediction.
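Kvamme's gain, mentioned above as a validation measure, is commonly defined as one minus the ratio of the share of the study area flagged as high potential to the share of known sites falling inside that area. The sketch below illustrates the arithmetic; the function name `kvamme_gain` and the example percentages are assumptions for illustration and do not come from the studies cited.

```python
def kvamme_gain(area_fraction, site_fraction):
    """Kvamme's gain: 1 - (share of area flagged / share of known sites captured).

    Values near 1 indicate a model that captures many sites in a small area;
    values near 0 indicate no improvement over chance.
    """
    if not 0.0 <= area_fraction <= 1.0:
        raise ValueError("area_fraction must lie in [0, 1]")
    if not 0.0 < site_fraction <= 1.0:
        raise ValueError("site_fraction must lie in (0, 1]")
    return 1.0 - area_fraction / site_fraction

# Hypothetical case: 20% of the study area contains 80% of the known sites.
print(round(kvamme_gain(0.20, 0.80), 2))
```

A model that flags half of the area and captures half of the sites has a gain of zero, which is why the measure is usually reported alongside the proportion of area flagged.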
Model optimization was not employed in the research above. However, the model has built-in parameters whose default values may not be optimal, so its accuracy can be further improved. Comparing an unoptimized model with others is not convincing, because such a comparison does not reveal the actual merits and demerits of each model in a particular study area. Suitable hyperparameters within a given range can be found by iteratively refitting the probability model. The LR model combined with geospatial techniques, as used above, has proven to be an effective archaeological method for reducing the blind zone in field exploration. The model avoids subjective human factors, is not computing-intensive and runs quickly, and it can handle complex multicollinearity problems and sample imbalance without variable analysis. It has a strong generalization ability and is easy to understand and implement. Because its evaluation is conducted variable by variable and dimension by dimension, it makes up for the shortcomings of existing methods and provides more reference for cultural heritage management decisions. However, as a black-box algorithm, it sometimes produces uninterpretable results.
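The idea of searching a hyperparameter range by repeatedly refitting the model can be sketched as a small grid search with cross-validation. This is an illustration under stated assumptions, not the procedure used in this study: the pure-Python gradient-descent logistic regression is a stand-in for a library implementation, the L2 penalty grid is hypothetical, and the data are synthetic.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_lr(X, y, l2, epochs=200, step=0.1):
    """Batch gradient descent for logistic regression with L2 penalty `l2`."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = predict_proba(w, b, x) - target
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - step * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b / n
    return w, b

def accuracy(w, b, X, y):
    hits = sum((predict_proba(w, b, x) >= 0.5) == (t == 1) for x, t in zip(X, y))
    return hits / len(y)

def cv_accuracy(X, y, l2, k=5):
    """Mean held-out accuracy over k interleaved folds."""
    scores = []
    for f in range(k):
        train = [i for i in range(len(y)) if i % k != f]
        test = [i for i in range(len(y)) if i % k == f]
        w, b = fit_lr([X[i] for i in train], [y[i] for i in train], l2)
        scores.append(accuracy(w, b, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / k

def grid_search(X, y, grid):
    """Return the penalty with the best cross-validated accuracy, plus all scores."""
    scores = {l2: cv_accuracy(X, y, l2) for l2 in grid}
    return max(scores, key=scores.get), scores

# Synthetic, hypothetical data: two standardized predictors, label driven by their sum.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(120)]
y = [1 if x[0] + x[1] + rng.gauss(0, 0.5) > 0 else 0 for x in X]

best_l2, scores = grid_search(X, y, [0.0, 0.01, 0.1, 1.0])
print(best_l2, scores)
```

Scoring every candidate on held-out folds rather than on the training data is what makes the comparison between settings, and between optimized models, meaningful.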

6.1.4. Changes in Preference for Residence Selection

Geography has been a crucial reference in the choice of human residence from ancient times to the present, although modern people have more options for residence. Over time, people have selected their residences based on factors such as hydrology, geography, topography, geology, biology and other human settlements. Archaic Homo sapiens and ancient people relied more heavily on geography. Favorable topographic conditions and water resources for irrigation became essential references for ancient people [49]. However, this study showed that plan curvature and profile curvature became the limiting conditions affecting early residences in Xiangyang, where water resources were abundant and sufficient land was available. These two geographic variables adequately reflect water and soil erosion and soil changes in the region. This implies that the prevention and control of waterlogging and soil erosion became important considerations in areas with abundant water resources, which coincides with the early formation of agriculture in the region. The recovery of prehistoric soils will offer a window into how ancient people worked and lived at that time. In addition, small watersheds were selected in Xiangyang for living and development, which is in line with present-day thinking about water use. However, the choices made in the past were constrained by floods and agricultural production, whereas those made now depend on the environment and people's aspirations for future life.

6.2. Conclusions of the Study

Based on the LR algorithm, a predictive model of Xiangyang's archaeological sites was established, revealing the spatial distribution probability of archaeological sites and enriching the predictive methods for such sites. The results of the study demonstrate that:
(1) The accuracy, precision and AUC of the experimental dataset based on the LR model were 0.789, 0.897 and 0.797, respectively. The archaeological predictive model of Xiangyang based on LR matched well with the spatial distribution pattern of the actual archaeological sites. The model exhibited excellent stability and predictive ability in archaeological site prediction.
(2) Based on the zone where historical sites are highly probable in Xiangyang, it is evident that ancient humans consciously chose to live in small watersheds at low altitudes and gradually developed a stable economy and agricultural civilization after settling down.
(3) The magnitude of the influencing variables determines the spatial distribution of archaeological sites. The sites are influenced by variables including distance from water, profile curvature and plan curvature. With these variables, a geospatial database can be built and an archaeological predictive model can then be constructed.
(4) The spatial distribution probability of historical sites in the eastern, central and western regions is closely related to geographical variables, and ancient humans would actively adapt to changes in the environment.
Collecting and excavating data on prehistoric settlement sites and determining prediction parameters, filtering factors and other elements can give a good picture of the spatial distribution of archaeological sites. O.L. Thabeng [50] showed that very-high-resolution satellite imagery combined with machine learning (e.g., support vector machines, random forests) improved the accuracy of predictive models of archaeological sites occupied by agricultural communities. In the future, Earth observation techniques such as optical and radar satellite remote sensing and geophysics, combined with other machine learning methods, deserve further exploration as a means of improving model accuracy [51,52].

Author Contributions

Data curation, L.L. and X.C.; Writing—original draft, L.L. and X.C.; Investigation and Resources, L.L.; Visualization, X.C.; Writing—review, Editing, Conceptualization and Methodology, Y.L.; Validation, D.S.; Funding acquisition, L.L. and D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Chongqing Graduate Research Innovation Project (Project No.: CYB22264), Chongqing Municipal Education Commission Science and Technology Research Project (Project No.: KJQN202000525), Chongqing Natural Science Foundation (Project No.: cstc2020jcyj-msxmX0841), and China National Natural Science Foundation (Project No.: 42071217).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The archaeological site data presented in this study are from the Atlas of Chinese Cultural Relics—Hubei Province, Yearbooks of Chinese Archaeology, theses and dissertations, and excavation reports. The DEM data presented in this study are openly available at http://www.gscloud.cn/ (accessed on 15 January 2022). The water bodies and administrative map data presented in this study are openly available at https://www.resdc.cn/ (accessed on 15 January 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Willey, G. Prehistoric settlement patterns in the Virú Valley, Peru. Bur. Am. Ethnol. Bull. 1953, 155, 1–453.
  2. Brandt, R.; Groenewoudt, B.; Kvamme, K. An experiment in archaeological site location: Modeling in the Netherlands using GIS techniques. World Archaeol. 1992, 24, 268–282.
  3. Parow-Souchon, H.; Zickel, M.; Manner, H. Upper Palaeolithic sites and where to find them: A predictive modelling approach to assess site expectancy in the Southern Levant. Quat. Int. 2021, 635, 53–72.
  4. Hazra, S. Prediction of Archaeological Potential Site in Middle and Lower Course of Mayurakshi River Basin, Eastern India Using Logistic Regression Model and GIS. J. Multidiscip. Stud. Archaeol. 2020, 8, 875–890.
  5. Zhang, H.; Xu, Y.; Zhou, J. Multispectral remote sensing and site prediction modeling of pre-Qin sites in Longdong. Natl. Remote Sens. Bull. 2021, 25, 2396–2408.
  6. Yan, L.; Lu, P.; Chen, P.; Danese, M.; Li, X.; Masini, N.; Wang, X.; Guo, L.; Zhao, D. Towards an Operative Predictive Model for the Songshan Area during the Yangshao Period. ISPRS Int. J. Geo-Inf. 2021, 10, 217.
  7. Sun, D.; Wen, H.; Wang, D.; Xu, J. A random forest model of landslide susceptibility mapping based on hyperparameter optimization using Bayes algorithm. Geomorphology 2020, 362, 107201.
  8. Davis, D.S. Defining what we study: The contribution of machine automation in archaeological research. Digit. Appl. Archaeol. Cult. Herit. 2020, 18, e00152.
  9. Bickler, S.H. Machine learning identification and classification of historic ceramics. Archaeology 2018, 20, 20–32.
  10. Chowdhury, M.P.; Choudhury, K.D.; Bouchard, G.P.; Riel-Salvatore, J.; Negrino, F.; Benazzi, S.; Slimak, L.; Frasier, B.; Szabo, V.; Harrison, R.; et al. Machine learning ATR-FTIR spectroscopy data for the screening of collagen for ZooMS analysis and mtDNA in archaeological bone. J. Archaeol. Sci. 2021, 126, 105311.
  11. Colmenero-Fernández, A.; Feito, F. Image processing for graphic normalisation of the ceramic profile in archaeological sketches making use of deep neuronal net (DNN). Digit. Appl. Archaeol. Cult. Herit. 2021, 22, e00196.
  12. Davis, D.; DiNapoli, R.; Douglass, K. Integrating Point Process Models, Evolutionary Ecology and Traditional Knowledge Improves Landscape Archaeology—A Case from Southwest Madagascar. Geosciences 2020, 10, 287.
  13. Lambers, K. Learning to look at LiDAR: The use of R-CNN in the automated detection of archaeological objects in LiDAR data from the Netherlands. J. Comput. Appl. Archaeol. 2019, 2, 31–40.
  14. Reese, K. Deep learning artificial neural networks for non-destructive archaeological site dating. J. Archaeol. Sci. 2021, 132, 105413.
  15. Zheng, M.; Tang, W.; Ogundiran, A.; Yang, J. Spatial Simulation Modeling of Settlement Distribution Driven by Random Forest: Consideration of Landscape Visibility. Sustainability 2020, 12, 4748.
  16. Rondeau, R.; Carleton, W.C.; Collard, M.; Driver, J. Does the Locally-Adaptive Model of Archaeological Potential (LAMAP) work for hunter-gatherer sites? A test using data from the Tanana Valley, Alaska. PLoS ONE 2022, 17, e0265597.
  17. Schein, A.; Ungar, L. Active learning for logistic regression: An evaluation. Mach. Learn. 2007, 68, 235–265.
  18. Ye, Z. Census of Cultural Relics and Historical Sites in Xiangfan City; China Today Magazine Publishers: Beijing, China, 1995.
  19. Zhu, J. Investigation of the site of the three-step and two-way bridge in Xiangyang. Jianghan Archaeol. 1984, 2, 17–19.
  20. Li, Z.; Wu, T.; Tian, H. Neolithic site of Fenghuangzui, Xiangyang, Hubei Province. Public Archeol. 2021, 1, 12–15.
  21. Shan, S. A Study of Qujialing Culture; Wuhan University: Wuhan, China, 2018.
  22. China Cultural Relics Bureau. Hubei Volume of China Cultural Relics Atlas Shaanxi (I); Xi’an Map Press: Xi’an, China, 2002.
  23. China Cultural Relics Bureau. Hubei Section of China Cultural Relics Atlas Shaanxi (II); Xi’an Map Press: Xi’an, China, 2002.
  24. Chinese Archaeological Society. Chinese Archaeological Yearbook 2007; China Social Sciences Press: Beijing, China, 2007.
  25. Chinese Archaeological Society. Chinese Archaeological Yearbook 2012; China Social Sciences Press: Beijing, China, 2012.
  26. Chinese Archaeological Society. Chinese Archaeological Yearbook 2013; China Social Sciences Press: Beijing, China, 2013.
  27. Chinese Archaeological Society. Chinese Archaeological Yearbook 2017; China Social Sciences Press: Beijing, China, 2018.
  28. Da, H.; Qu, L.; Wang, H.; Wei, W.; Xiong, C. Report of the 2017 Excavation at the Neolithic Mulintou Site in Baokang County of Hubei Province. Jianghan Archaeol. 2022, 2, 3–30+145.
  29. Tian, P. Research on Yangshao Culture in the Middle Reaches of Han River; Chongqing Normal University: Chongqing, China, 2013.
  30. Wu, Y. Archaeological Study of Zaoyang Carving Dragon Monument Settlement; Henan University: Kaifeng, China, 2020.
  31. Xu, S.; Tian, C.; Yin, H.; Zhan, S. Survey Report of Zhou Dynasty Sites in Yicheng Hubei: Part II. Jianghan Archaeol. 2008, 2, 35–42.
  32. Zhang, J. Research on Prehistoric Settlements in the Middle Reaches of Han River Supported by GIS; Zhengzhou University: Zhengzhou, China, 2014.
  33. Da, H.; Qu, L.; Wang, H. 2017 Excavation of Neolithic Remains at the Mulintou Site in Baokang, Hubei: Part I. Jianghan Archaeol. 2022, 2, 3–30+145.
  34. Chen, P. Pei Anping lectured on the origin of Chinese family, private ownership, civilization, state and city. Pop. Archaeol. 2019, 12, 85.
  35. Sun, D.; Xu, J.; Wen, H.; Wang, D. Assessment of landslide susceptibility mapping based on Bayesian hyperparameter optimization: A comparison between logistic regression and random forest. Eng. Geol. 2021, 281, 105972.
  36. Wang, J.; Xu, C. Geodetectors: Principles and Perspectives. Acta Geogr. Sin. 2017, 72, 116–134.
  37. Tang, L.; Tian, J.; Liu, J.; Da, H.; Qu, L. Study on the mode of mountain subsistence in Qujialing Culture period-Taking the Mulintou Site in Baokang, Hubei Province as an example. South. Cult. Relics 2019, 5, 189–199.
  38. Xu, S.; Yin, H.; Tian, C. Investigation Report of Zhou Dynasty Cultural Relics in Yicheng City (III). Jianghan Archaeol. 2008, 3, 3–10.
  39. Jia, H.; Wang, Y.; Xiong, Z. Trial Excavation Report of Laoguacang Site in Yicheng, Hubei. Jianghan Archaeol. 2003, 3, 16–39.
  40. Koohpayma, J.; Makki, M.; Lentschke, J.; AlaviPanah, S. Predicting potential locations of ancient settlements using GIS and Weights-Of-Evidence method (case study: North-East of Iran). J. Archaeol. Sci. Rep. 2021, 40, 103229.
  41. Cheng, G. The relationship between the distribution of Neolithic sites and the evolution of rivers and lakes in Jianghan-Dongting Lake area. J. Anhui Norm. Univ. 2005, 2, 218–221.
  42. Liu, S.; Zo, C.; Mao, L.; Jia, X.; Mo, D. Spatial-temporal distribution of Paleolithic-Shangzhou period ancient sites in Shandong Province and its relationship with hydrology and geomorphology. Quat. Study 2021, 41, 1394–1407.
  43. Zheng, C.; Zhu, C.; Zhong, Y.; Yin, P.; Bai, J.; Sun, Z. Relationship between spatial-temporal distribution of archaeological sites and natural environment from Paleolithic to Tang-Song period in Chongqing reservoir area. Sci. Bull. 2008, 53, 93–111.
  44. Nsanziyera, A.; Lechgar, H.; Fal, S.; Maanan, M.; Saddiqi, O.; Oujaa, A.; Rhinane, H. Remote-sensing data-based Archaeological Predictive Model (APM) for archaeological site mapping in desert area, South Morocco. C. R. Geosci. 2018, 350, 319–330.
  45. Diwan, G. GIS-based comparative archaeological predictive models: A first application to Iron Age sites in the Bekaa (Lebanon). Mediterr. Archaeol. Archaeom. 2020, 20, 143–158.
  46. Noviello, M.; Cafarelli, B.; Calculli, C.; Sarris, A.; Mairota, P. Investigating the distribution of archaeological sites: Multiparametric vs probability models and potentials for remote sensing data. Appl. Geogr. 2018, 95, 34–44.
  47. Märker, M.; Heydari-Guran, S. Application of datamining technologies to predict Paleolithic site locations in the Zagros Mountains of Iran. In Making History Interactive: Computer Applications and Quantitative Methods in Archaeology (Proceedings of CAA); Crawford, J., Koller, D., Eds.; Archaeopress: Oxford, UK, 2009; pp. 1–7.
  48. Roalkvam, I. Algorithmic classification and statistical modelling of coastal settlement patterns in mesolithic South-Eastern Norway. J. Comput. Appl. Archaeol. 2020, 3, 288–307.
  49. Tan, B.; Wang, H.; Wang, X.; Yi, S.; Zhou, J.; Ma, C.; Dai, X. The study of early human settlement preference and settlement prediction in Xinjiang, China. Sci. Rep. 2022, 12, 5072.
  50. Thabeng, O.L.M.; Adam, E. High-resolution remote sensing and advanced classification techniques for the prospection of archaeological sites’ markers: The case of dung deposits in the Shashi-Limpopo Confluence area (southern Africa). J. Archaeol. Sci. 2019, 102, 48–60.
  51. Masini, N.; Lasaponara, R. Sensing the past from space: Approaches to site detection. In Sensing the Past; Springer: Cham, Switzerland, 2017; pp. 23–60.
  52. Caspari, G.; Crespo, P. Convolutional neural networks for archaeological site detection–Finding “princely” tombs. J. Archaeol. Sci. 2019, 110, 104998.
Figure 1. Administrative divisions and distribution of archaeological sites in the research area.
Figure 2. Thematic layers of influencing variables.
Figure 3. The distribution of all possible rivers: (a) Extracted rivers. (b) Modern rivers. (c) All possible rivers.
Figure 4. Research flow chart.
Figure 5. ROC curve and AUC value.
Figure 6. Archaeological site prediction map.
Figure 7. Relationship between elevation and archeological site.
Figure 8. Relationship between slope and archeological sites.
Figure 9. Validation of archaeological site locations based on the LR model. (A) Laoyacang Ruins site. (B) Liaojiawan Ruins site.
Figure 10. Importance ranking of influencing variables.
Figure 11. The relationship between river levels and archaeological sites: (a) Relationship between Distance from Water and Archaeological Sites. (b) Relationship between River Classification and Archaeological Sites.
Table 1. Data types and data sources.
Data Name | Data Sources | Type | Scale
Archaeological site | Atlas of Chinese Cultural Relics-Hubei Province (I) & (II), Yearbooks of Chinese Archaeology, theses and dissertations, excavation reports, and other relevant research achievements | Raster | 1:30
DEM | Geospatial Data Cloud | Raster | 30 m
River | Xiangyang Water Resource Bureau | Vector | 1:100,000
Map of administrative division | Resource and Environmental Science and Data Center, Chinese Academy of Sciences | Vector | 1:100,000
Table 2. The accuracy of ten-fold cross-validation.
Serial Number | LR Training Accuracy | LR Testing Accuracy
1 | 0.745 | 0.692
2 | 0.737 | 0.750
3 | 0.744 | 0.731
4 | 0.692 | 0.754
5 | 0.735 | 0.769
6 | 0.733 | 0.788
7 | 0.732 | 0.788
8 | 0.739 | 0.750
9 | 0.741 | 0.712
10 | 0.752 | 0.731
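The fold-level accuracies in Table 2 can be summarized with a few lines of arithmetic. The sketch below transcribes the ten training and testing values from the table; the summary statistics it produces are derived here for illustration and are not figures reported by the authors, and the helper name `summarize` is hypothetical.

```python
from statistics import mean, pstdev

# Accuracies transcribed from Table 2 (folds 1 to 10).
training = [0.745, 0.737, 0.744, 0.692, 0.735, 0.733, 0.732, 0.739, 0.741, 0.752]
testing = [0.692, 0.750, 0.731, 0.754, 0.769, 0.788, 0.788, 0.750, 0.712, 0.731]

def summarize(scores):
    """Mean, population standard deviation and range of fold accuracies."""
    return {
        "mean": mean(scores),
        "std": pstdev(scores),
        "min": min(scores),
        "max": max(scores),
    }

for name, scores in (("training", training), ("testing", testing)):
    stats = summarize(scores)
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

The spread across folds gives a rough sense of how sensitive the model is to the particular split of site and non-site points.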
Table 3. Confusion matrix of the LR model.
LR Predicted Value | True Value: High | True Value: Low | Accuracy
High | 226 | 102 | 0.689
Low | 34 | 158 | 0.822
Recall rate | 0.869 | 0.608 | 0.738 (overall accuracy)
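The ratios in Table 3 follow directly from the four cell counts. The sketch below recomputes them under the assumption that the row-wise "Accuracy" entries are per-class precision (correct predictions divided by all predictions of that class); the function and key names are hypothetical.

```python
# Counts transcribed from Table 3 (rows: predicted class, columns: true class).
TP, FP = 226, 102  # predicted High: truly High / truly Low
FN, TN = 34, 158   # predicted Low:  truly High / truly Low

def confusion_metrics(tp, fp, fn, tn):
    """Per-class precision and recall plus overall accuracy from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "precision_high": tp / (tp + fp),
        "precision_low": tn / (tn + fn),
        "recall_high": tp / (tp + fn),
        "recall_low": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }

for name, value in confusion_metrics(TP, FP, FN, TN).items():
    print(f"{name}: {value:.4f}")
```

Recomputed this way, the values agree with the table to three decimals except for the Low row, where 158/192 rounds to 0.823 while the table lists 0.822, presumably because of truncation.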
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Li, L.; Li, Y.; Chen, X.; Sun, D. A Prediction Study on Archaeological Sites Based on Geographical Variables and Logistic Regression—A Case Study of the Neolithic Era and the Bronze Age of Xiangyang. Sustainability 2022, 14, 15675. https://0-doi-org.brum.beds.ac.uk/10.3390/su142315675
