To promote the coordinated development of the economy, society, and environment, leaders around the world adopted the 2030 Agenda for Sustainable Development at the United Nations (UN) Summit in September 2015 [1], which covers 17 Sustainable Development Goals (SDGs) with 169 targets and 244 indicators. Quantitative assessment and dynamic monitoring of SDGs are important measures in implementing the UN 2030 Agenda for Sustainable Development [2
]. The calculation of SDG indicators requires a large amount of social and economic statistical data, but most data are collected by administrative units (e.g., province boundaries and county boundaries). They can only represent the average status of statistical objects in spatial regions, and it is difficult to reflect the true distribution in space [3
]. The results of SDG assessments based on statistical information make it difficult to characterize the specific spatial location, so follow-up planning measures are not easy to implement or operate. Evidently, existing and available statistical data cannot meet the practical needs of quantitative assessments and continuous monitoring of SDGs. In the SDGs Global Indicator Framework (SGIF), the calculation of up to 98 indicators needs population data [5
]. Therefore, the geospatial disaggregation of population data at a fine scale is of great significance to support the measurement and monitoring of the SDGs.
To date, many studies and applied practices have focused on measuring and monitoring development goals or relevant topics (e.g., public health and climate change) in accordance with geospatial disaggregation of population data [6
]. For example, the WorldPop project, launched in October 2013, aims to provide an open spatial population dataset for Africa, Asia, Central America, and South America to support development and health applications [11
]. In March 2018, the Geo-Referenced Infrastructure and Demographic Data for Development (GRID3
) initiative, aiming to facilitate the production and use of high-resolution population and other reference data, was launched to support government decision-making and the assessment of SDGs [12
]. Tenerelli et al. [13
] applied population distribution data to disaster risk analysis to provide a scientific basis for the government to deal with climate change and natural disasters. Alegana et al. [14
] used a Bayesian hierarchical spatiotemporal model to estimate the proportion of the under-five population per 1 × 1-km grid cell in Nigeria in 2010. Golding et al. [15
] calculated under-five and neonatal mortality (SDG target 3.2) at a 5 × 5-km resolution in Africa for 2000, 2005, 2010, and 2015 based on the Bayesian geostatistical analytical method. These results showed that detailed population data were conducive to improving the accuracy of development and health metrics assessments and optimizing interventions. Based on OpenStreetMap, statistical data, and WorldPop datasets, Esquivel et al. [16
] mapped disparities in access to safe, timely, and essential surgical care in Zambia and found that 65.9% of the population could not reach a surgical facility that met the World Health Organization’s minimum surgical safety standards within two hours. Using WorldPop datasets, the census, and other data, Tatem et al. [17
] revealed the distribution of the number of pregnancies, women of childbearing age, and live births at a 100-m resolution in Bangladesh, Afghanistan, Tanzania, and Ethiopia, to provide denominators for the quantitative assessment of the subnational Millennium Development Goals. From the High Resolution Settlement Layer (HRSL), population data at a resolution of 1 arc-second have been generated for 33 countries for infrastructure development and disaster response [18
]. Linard et al. [19
] mapped the population distribution at a resolution of 100 m and analyzed population aggregation, settlement patterns, and spatial accessibility in Africa to make recommendations on healthcare, resource allocation, and economic development. Zagatti et al. [20
] used an unsupervised learning algorithm to identify individual locations based on Call Detail Record data and analyzed day and night population densities and commuting patterns. It was found that Haiti’s labor markets were fragmented. In summary, detailed population distribution data are of great significance for measuring development goals and finding existing problems and solutions. However, most existing studies have been based on population data at 100-m or 1-km resolutions to reflect actual problems at the national or subnational level and have lacked an exploration of county-level measurement and monitoring of SDGs based on fine-scale population data.
Spatial disaggregation is the process by which information on a coarse spatial scale is transformed into finer scales [3
], and it is widely used in population spatialization. Several mainstream methods for the geospatial disaggregation of population data have been developed, ranging from simple grid models (e.g., the areal interpolation method) to spatial models that take into account natural and economic factors. Early studies in urban geography assumed that population density decreases from the inner city to the outer suburbs and used a distance decay function to simulate the spatial distribution of population [21
]. However, the urban extent of modern cities tends to be irregular, which brings indeterminacy to the model establishment [22
]. In 1993, Goodchild et al. [23
] proposed the areal interpolation method to realize the spatialization of social and economic data. Areal interpolation methods can be divided into two categories according to whether auxiliary information is used [24
]. For areal interpolation methods without ancillary information, there are point-based methods and area-based methods. Some example studies include Fisher et al. [25
], Fan et al. [26
], and Martin [27
]. This method is simple and suitable for depicting the patterns of population distribution on a large scale but cannot meet the needs of high-resolution mapping.
With the development of earth observation technology, the data available for the geospatial disaggregation of population data are becoming more and more abundant and accurate [28
]. Therefore, more and more factors can be taken into account in the establishment of a population disaggregation model, the most common of which are economic and natural factors. The complex population disaggregation model (relative to the simple grid model) can be divided into three categories according to the main principles, namely dasymetric mapping, multifactor fusion, and intelligent modeling.
The principle of the dasymetric mapping method is to subdivide the population distribution space into small areas that can reflect the spatial variation with the aid of auxiliary information and apply the interpolation technique to generate fine-scale population distribution data. Some example studies include Dmowska et al. [29
], Gallego [30
], and Langford [31
]. Essentially, the dasymetric mapping method is an extension of the areal interpolation with ancillary information. At present, there are three region classification methods, namely the binary dasymetric method [32
], the three-class dasymetric method [33
], and the multilayer and multiclass dasymetric method [34
]. The advantages of dasymetric mapping are that it is simple to apply and preserves the total population of each source region, which makes it suitable for fine-scale population spatialization. However, as the number of classes increases, the method can become quite complex, which limits its use in some applications.
Another popular method is the multifactor fusion method [35
]. The main steps are (i) analyzing the relationship between impact factors and population data, (ii) selecting the main factors to establish a model through a weighted or regression method, and (iii) correcting the initial simulation results based on the statistical population of sub-administrative regions. The most frequently used factors (i.e., parameters) are roads, population density, land use and land cover, elevation, nighttime light, populated points, etc. This method takes into account the indicative effect of multifactors on the spatial distribution of a population. The results obtained by this method are convincing. However, the determination of the fusion weight is subjective, and many factors are involved in modeling, which leads to model complexity and information redundancy.
The intelligent modeling method, which is characterized by a high automation and flexible model structure, applies a decision tree [38
], genetic algorithm [39
], and random forest [40
] method to the disaggregation of population data. The disadvantages of this method are that the results are poorly controllable and the parameter settings are complex. Now, the intelligent modeling method is mainly combined with classical methods such as the dasymetric mapping method and the multifactor fusion method to improve the accuracy of the population disaggregation model.
There is a growing need for detailed population distribution data to measure and monitor progress toward SDGs, which aim to ensure that no one is left behind. In order to avoid concealing local heterogeneities, the perspective of SDG assessments is being turned from the national and subnational levels to regional and local levels, particularly fine-scale assessments in small regions [41
]. In this paper, we selected Deqing County, China, as the study area. In order to reduce or eliminate the error when assessing SDGs, it is necessary to ensure that the statistical population value is equal to the total population after disaggregation. After fully considering the characteristics of the study area, the data availability, and the advantages and disadvantages of the population disaggregation methods, the dasymetric mapping method, which can ensure invariance in the total population, was selected to realize the spatialization of the population data. Three-dimensional building information (i.e., the building area and the number of floors) and other auxiliary data were used to establish the population disaggregation model. Finally, we used the disaggregated population data with a resolution of 30 m to support the quantitative, qualitative, and positional assessment of Deqing’s progress toward achieving the SDGs.
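The weighting idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the model actually built in this paper: it assumes that each residential grid cell carries a building footprint area and a number of floors, that their product serves as the cell weight, and that each town's statistical population is distributed in proportion to those weights. The function name, the input layout, and the sample numbers are hypothetical.

```python
from collections import defaultdict


def disaggregate_by_buildings(cells, town_population):
    """Distribute each town's statistical population over its grid cells.

    cells: list of (town, footprint_area_m2, floors), one tuple per grid cell.
    town_population: dict mapping town -> statistical population.
    Returns a list with the estimated population of each cell.
    """
    # Assumed weight: footprint area x number of floors (approximate floor area).
    weights = [area * floors for _, area, floors in cells]

    # Sum of the weights within each town (the source zone).
    town_weight = defaultdict(float)
    for (town, _, _), weight in zip(cells, weights):
        town_weight[town] += weight

    # Each cell receives its share of the town total; nonresidential cells
    # (weight 0) receive 0, so every town total is preserved.
    return [
        town_population[town] * weight / town_weight[town]
        if town_weight[town] > 0 else 0.0
        for (town, _, _), weight in zip(cells, weights)
    ]


# Hypothetical example: three cells in town "A", one cell in town "B".
example_cells = [("A", 100, 1), ("A", 100, 3), ("A", 0, 0), ("B", 50, 2)]
example_result = disaggregate_by_buildings(example_cells, {"A": 400, "B": 90})
```

Because the weights are normalized within each town, the cell values of a town always sum to its statistical population, which is the invariance property that motivated the choice of dasymetric mapping.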
3. Results and Analyses
The population distribution at a 30-m resolution in Deqing County in 2016 is shown in Figure 3a. Overall, it shows characteristics of “more in the central and eastern regions, less in the western regions”. The maximal grid value (referring to the population number) was 79. The blank areas within the boundaries of Deqing County were nonresidential areas such as water bodies, cultivated land, mountains, and industrial regions, where the actual grid value was 0. The grids with 1–4 people accounted for 71.96% of all nonzero grids and were the most widely distributed. The grids with 5–6 people and 7–8 people accounted for 15.90% and 7.74%, respectively, and were located in the central and eastern parts of Deqing County; the latter were less dispersed than the former. The grids with 9–79 people accounted for 4.40% of all nonzero grids, showed an agglomerated distribution, and were mainly located in the central areas of Wukang Town, Qianyuan Town, and Xinshi Town.
As shown in Figure 3b, the population distribution map was overlaid with geographic elements such as a digital elevation model (DEM), roads, and water bodies. Three typical areas, namely the central urban area, the western mountainous area, and the eastern water villages (a region of rivers and lakes), were selected to analyze population distribution characteristics and details on a fine scale. The map on the left of Figure 3b is a partial enlargement of the population distribution in the central urban area of Deqing County. This area is the political, economic, and cultural center of Deqing County. The dense distribution of industrial and road infrastructure plays an important role in population aggregation. The grid value ranged from 1 to 79, and the population number gradually decreased from the urban center to the periphery. The map in the middle of Figure 3b is a partial enlargement of the population distribution in the west of Deqing County. This area is located on Mogan Mountain, and the population is mainly distributed in strips along the valley bottom and on both sides of the road. The grid value was mainly 1–6. The map on the right of Figure 3b is a partial enlargement of the population distribution in Xin’an Town, eastern Deqing County. The area is a typical water village plain south of the Yangtze River, with a developed water system and numerous lakes. The population is distributed in strips along the roads and on both sides of the rivers or in clusters on the plain. The grid value was mainly between 1 and 8. The results show that the geospatial disaggregation of population data based on three-dimensional building information can plausibly reflect the differences in population distribution within regions and effectively eliminate the impact of nonresidential areas such as mountains, water bodies, and vegetation on population spatialization.
In this study, town-level statistical population data were used to realize the population spatialization. A method of accuracy validation is to aggregate disaggregated population data at lower administrative levels (i.e., the village level) and compare them to the statistical population data of the corresponding administrative region. Deqing County has a total of 166 villages, of which Songcun Village, Wulong Village, Huibei Village, Yangbei Village, Qiubai Village, and Fengqiao Village are involved in land expropriation and do not participate in error analysis. Since the people in this area relocated and settled into their new community, the statistical population value was zero. We obtained accuracy independently for each village, and Figure 4
shows the absolute value of the relative error between the disaggregated population data and the statistical data in 160 villages. Furthermore, we considered the population sizes across villages and calculated the global weighted mean relative error, which was 12.92%, that is, the global average accuracy was 87.08%. The absolute relative error of 85 villages was less than 10%, the error of 46 villages was between 10% and 20%, the error of 16 villages was between 20% and 30%, and the error of 13 villages was more than 30%. In order to explain the reasons for the large errors in some villages, we carried out field investigations and found that the error mainly came from the following four aspects: (1) The urbanization process had accelerated in Deqing County [47
], and new residential land expanded rapidly. Some newly built houses had not yet been sold. For example, Xinfeng Village was affected by the vacancy of built residential land, resulting in an absolute relative error of 242.41%. (2) Due to the reformation of rural settlements within the central urban area, a number of villages in the city and natural villages were being withdrawn and clustered into new communities, and some old houses were not demolished but had no one living in them. For instance, the population estimates of Qiushan and Qianqiu villages were significantly higher than the statistical values, and the absolute relative errors were 114.89% and 508.28%, respectively. (3) In addition to residential functions, some buildings were used for commercial purposes at the same time, such as commercial–residential land in the central urban areas and guest houses around the Mogan Mountain scenic areas and the Xiazhuhu Wetlands. (4) The types of residential buildings distributed in the urban–rural junction were complex and diverse and gradually transitioned from high-density multifloor buildings to low-density low-floor buildings.
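The validation procedure described above (aggregating grid values to villages, computing the absolute relative error per village, and then weighting by population size) can be sketched as follows. This is a hedged illustration rather than the exact procedure used in this study: it assumes that the weights of the global weighted mean relative error are the statistical village populations, and the function names and sample values are hypothetical.

```python
from collections import defaultdict


def village_relative_errors(cell_population, cell_village, village_stats):
    """Absolute relative error of the aggregated estimate for each village.

    cell_population: estimated population per grid cell.
    cell_village: village identifier of each grid cell (same order).
    village_stats: dict mapping village -> statistical population.
    Villages with a statistical population of zero are skipped, mirroring
    the exclusion of the expropriated villages from the error analysis.
    """
    estimates = defaultdict(float)
    for population, village in zip(cell_population, cell_village):
        estimates[village] += population
    return {
        village: abs(estimates[village] - stat) / stat
        for village, stat in village_stats.items()
        if stat > 0
    }


def weighted_mean_relative_error(errors, village_stats):
    """Mean of the village errors, weighted by statistical population (assumed)."""
    total = sum(village_stats[village] for village in errors)
    return sum(errors[v] * village_stats[v] for v in errors) / total


# Hypothetical example with two valid villages and one excluded village.
stats = {"v1": 25, "v2": 70, "v3": 0}
errors = village_relative_errors([10, 20, 30, 40], ["v1", "v1", "v2", "v2"], stats)
overall_error = weighted_mean_relative_error(errors, stats)
overall_accuracy = 1.0 - overall_error
```

Under this weighting, a large error in a small village (such as the vacancy-driven cases above) contributes less to the global figure than the same relative error in a populous village.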
4. Disaggregated Population Data for Assessing SDGs: Examples
Based on an understanding of the UN 2030 Sustainable Development Agenda, China’s National Plan on Implementation of the 2030 Agenda for Sustainable Development (hereinafter referred to as the National Plan) [48
], and the regional characteristics of Deqing County, the 244 indicators of the SGIF were screened and adjusted. A set of SDG indicators suitable for Deqing County was proposed that contained 102 indicators [49
]. In accordance with the SDG Index and Dashboard [50
], the National Plan, and other references, these indicators were further quantified and assessed to represent the condition of sustainable development in Deqing County.
The indicators 3.8.1, 4.a.1, and 9.1.1, which could not be accurately quantified based on the population data (tabular form) and other metadata, were selected from the 102 indicators for a discussion of the application of the geospatial disaggregation of population data in the assessment of SDGs.
4.1. Example 1: SDG Indicator 3.8.1
Deqing rationally optimized the allocation and layout of medical resources, actively carried out disease prevention and control, and vigorously promoted comprehensive health management and all-around health services to effectively improve residents’ health and well-being. Deqing focused on family doctor contracting services, highlighting key populations such as maternal, elderly, and chronically ill patients, and strengthened the management of basic public health service projects. The average coverage of basic services is high. At present, Deqing is more concerned about the time spent by residents to reach the nearest medical institution. The original indicator, 3.8.1, is not suitable for the actual situation of Deqing County. Based on an understanding of the UN 2030 Sustainable Development Agenda, the National Plan, and the regional characteristics of Deqing County, indicator 3.8.1 was revised to “coverage of essential health services” after localization.
By the end of 2016, there were three general hospitals, 12 health centers (seven branches), and 133 health service stations in Deqing County. Taking these as targets, the accessibility analysis method was used to calculate the time required to reach the nearest medical facility in the county, and the time was classified at 5-min intervals. As shown in Figure 5
, the accessibility of medical and health facilities was characterized by an annular distribution centered on targets and spreading outward along roads. Here, the difference in accessibility was measured by the time taken to reach medical and health facilities. The areas with good accessibility (i.e., those that required less time to reach medical and health facilities) were concentrated in the central (urban) and eastern parts of Deqing County, and the accessibility was poor (i.e., it took more time to reach medical and health facilities) in the western mountainous areas.
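The text does not state which algorithm was used for the accessibility analysis. As one possible reading, the sketch below treats the county as a grid in which every cell has an assumed crossing time (short on roads, long elsewhere) and computes the travel time to the nearest facility with a multi-source Dijkstra search, before assigning each cell to a 5-min class. The cost model, the four-neighbor connectivity, and all names are assumptions made for illustration only.

```python
import heapq


def travel_time_minutes(cell_minutes, facilities):
    """Travel time from every cell to its nearest facility.

    cell_minutes: 2D list; cell_minutes[r][c] is the assumed time (in minutes)
        needed to cross cell (r, c).
    facilities: list of (row, col) positions of the target facilities.
    """
    rows, cols = len(cell_minutes), len(cell_minutes[0])
    best = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in facilities:
        best[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))

    while heap:
        t, r, c = heapq.heappop(heap)
        if t > best[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Moving between cell centers costs half of each cell's time.
                step = 0.5 * (cell_minutes[r][c] + cell_minutes[nr][nc])
                if t + step < best[nr][nc]:
                    best[nr][nc] = t + step
                    heapq.heappush(heap, (t + step, nr, nc))
    return best


def five_minute_class(minutes):
    """Class index of a travel time: 0 for [0, 5) min, 1 for [5, 10) min, ..."""
    return int(minutes // 5)
```

Starting the search from all facilities at once yields the time to the nearest one directly, which is why the resulting surface spreads outward from the targets and extends farthest along low-cost road cells.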
According to the traditional method, we only used census data in towns (i.e., assuming that the population was evenly distributed within each town) to carry out the accessibility calculations. The results are presented in Table 2
and show that within 10 min, 13.37% of residents could reach the nearest general hospital, 72.74% could reach the nearest health center, and 92.92% could reach the nearest health service station. In addition, it took more time to reach medical and health facilities in western mountainous areas, and there was an evident difference in medical services in urban and rural areas.
For comparative analysis, we used the disaggregated population data to do the accessibility calculations again. The results are provided in Table 3
and show that within 10 min, 26.66% of residents could reach the nearest general hospital, 90.26% could reach the nearest health center, and 99.84% could reach the nearest health service station. Clearly, the results calculated by using census data and the disaggregated population data were different. In fact, it is well-known that there are no residents in water bodies, on roads, on cultivated lands, or in most mountainous areas (as shown in the middle map of Figure 3b). Compared to traditional methods, the results of the SDGs assessment based on the 30-m disaggregation of population data were more accurate and effective.
In conclusion, more than 99% of residents in Deqing County could reach the nearest village health service station within 10 min, the nearest health center within 20 min, and the nearest general hospital within 40 min. Therefore, the accessibility of medical service facilities was good in Deqing County, the coverage of medical and health services was relatively balanced, and medical service facilities could meet the diversified and multilevel medical service needs of urban and rural residents.
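The difference between the two sets of results can be illustrated with a small sketch that computes the share of residents within a travel-time threshold, once for an evenly distributed population and once for a disaggregated one. The numbers below are invented for illustration and are not taken from Table 2 or Table 3; the function name is hypothetical.

```python
def share_within(population, minutes, threshold):
    """Share of the population whose travel time is at most `threshold`.

    population: population per grid cell.
    minutes: travel time to the nearest facility for the same cells.
    """
    total = sum(population)
    reached = sum(p for p, t in zip(population, minutes) if t <= threshold)
    return reached / total


# Four hypothetical cells; the last two are uninhabited mountain cells.
minutes = [5, 8, 15, 30]
even_population = [25, 25, 25, 25]        # traditional assumption
disaggregated_population = [60, 40, 0, 0]  # residents only where buildings are

share_even = share_within(even_population, minutes, 10)
share_disaggregated = share_within(disaggregated_population, minutes, 10)
```

In this toy case the evenly distributed population yields a share of 0.5, whereas the disaggregated population yields 1.0, because the remote cells that lower the first figure contain no residents.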
4.2. Example 2: SDG Indicator 4.a.1
Indicator 4.a.1 is the “proportion of schools with access to (a) electricity; (b) the Internet for pedagogical purposes; (c) computers for pedagogical purposes; (d) adapted infrastructure and materials for students with disabilities; (e) basic drinking water; (f) single-sex basic sanitation facilities; and (g) basic handwashing facilities (as per the WASH indicator definitions)” (WASH: water, sanitation, and hygiene). According to the statistical data provided by the Deqing Education Bureau, each proportion was 100%, which indicates that the schools in Deqing County were of the same good quality. To provide every child with an equal right to education, China implemented a nearby enrollment policy (i.e., adolescents receive access to education in the school where their permanent residence is registered). Deqing was therefore more concerned about the time residents need to reach the nearest education facilities. To further improve the quality of education and the level of service, this indicator requires a quantitative and positional assessment from the perspective of statistics and geographic information. By combining an accessibility analysis with the disaggregated population data, the results could be used to describe the level and quality of educational services in Deqing County and to accurately find the areas that need to be improved.
By the end of 2016, there were 31 primary schools, 21 junior high schools, and five senior high schools in Deqing County. Similarly, as in example 1, we used census data from towns to carry out the accessibility calculations. The results are shown in Table 4
, and the influence of roads, water bodies, and other factors could not be avoided. Then, the disaggregated population data were combined with the accessibility analysis to assist in the assessment of educational services in Deqing County. The results are shown in Figure 6
and Table 5
. Evidently, within 15 min, 97.23% and 96.59% of residents could reach the nearest primary school and junior high school, respectively, and within 30 min, 94.97% of residents could reach the nearest senior high school. We found that the accessibility of primary schools and junior high schools was good and that their spatial distribution was rational. However, the accessibility of senior high schools was relatively poor, and their distribution needs to be optimized.
4.3. Example 3: SDG Indicator 9.1.1
The indicator 9.1.1 is defined as “the proportion of the rural population who live within 2 km of an all-season road”. According to the tier classification for global SDG indicators [51
], indicator 9.1.1 belongs to Tier 3 (i.e., no internationally recognized methodology or metadata are yet available for the indicator).
Road buffers were created around a road feature at 500-m, 1000-m, 1500-m, and 2000-m distances from the feature. In 2016, the 500-m road buffer covered 99.53% of the county’s land, and the 1000-m, 1500-m, and 2000-m road buffers covered all of the areas of the county. Figure 7
shows that the 500-m road buffer was overlaid over the 30-m population data. It was found that there was no population in the area uncovered by the 500-m road buffer, that is, the proportion of the rural population who lived within 500 m of an all-season road was 100%. This example again shows that disaggregated population data can well support quantitative assessments of SDG indicators, even in the absence of recognized methodologies and metadata.
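The buffer overlay can be sketched as a simple distance test. The snippet below is an illustrative approximation, not the GIS workflow used in this study: it assumes that populated grid cells are given by their center coordinates in meters, that roads are represented by densely sampled points, and that a cell lies within the buffer when its nearest road point is no farther than the buffer distance. All names and sample values are hypothetical.

```python
import math


def share_within_road_buffer(cells, road_points, buffer_m):
    """Share of the population living within `buffer_m` meters of a road.

    cells: list of (x_m, y_m, population) for grid cell centers.
    road_points: list of (x_m, y_m) points sampled along all-season roads.
    """
    total = sum(population for _, _, population in cells)
    inside = 0.0
    for x, y, population in cells:
        # Brute-force nearest road point; adequate for a sketch, whereas a
        # spatial index or distance transform would suit a full raster.
        nearest = min(math.hypot(x - rx, y - ry) for rx, ry in road_points)
        if nearest <= buffer_m:
            inside += population
    return inside / total


# Hypothetical example: the unpopulated cell far from the road does not
# affect the share.
example_cells = [(0, 0, 10), (0, 400, 30), (0, 2500, 0)]
example_roads = [(0, 0), (100, 0)]
share_500 = share_within_road_buffer(example_cells, example_roads, 500)
share_300 = share_within_road_buffer(example_cells, example_roads, 300)
```

As in the result above, land outside the buffer only lowers the indicator if people actually live there, which a population grid can verify and an area-based coverage figure cannot.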
5. Discussion and Conclusions
Quantitative assessments of the SDG indicators based on fine-scale population data are necessary to support implementation of the “2030 Agenda”. However, most population data are collected by administrative units, and it is difficult to reflect true distribution or uniformity in space. In this paper, a geospatial disaggregation method of population data was developed based on geographic information. Based on the idea of dasymetric mapping, the study area was divided into residential areas and nonresidential areas by high-resolution images and other ancillary data. One contribution in this paper was using the building area and the number of floors as the weighting factors of a corresponding grid to establish a 30-m geospatial disaggregation model. After analyzing the statistical population of 160 villages and the disaggregation results comparatively, we found that the global average accuracy was 87.08%.
Another contribution was to apply these disaggregated population data to a quantitative assessment of SDG indicators (e.g., indicator 3.8.1, indicator 4.a.1, and indicator 9.1.1) in an accessibility and buffer analysis. Taking indicator 3.8.1 as an example, this paper illustrated in detail the differences between the results of an accessibility analysis with the traditional method and the results using the spatial disaggregation method. The results calculated by the traditional method demonstrate that residents took more time to reach medical and health facilities in the western mountainous areas, and there was a clear difference in the spatial distribution of medical services between urban and rural areas in Deqing County. However, combining the accessibility analysis with the disaggregated population data, it was found that the accessibility of medical and health facilities was good and that the spatial distribution of medical resources was relatively reasonable. Despite poor accessibility in the western mountainous areas, high-resolution images showed that there were almost no buildings in this area, and thus there were almost no residents. The traditional method ignores population heterogeneity within regions. In contrast, the disaggregation method could avoid this problem and show the population number and distribution on a fine scale, which could render the assessment results more accurate and reliable. Similarly, based on accessibility, a buffer analysis, and disaggregated population data, we assessed indicator 4.a.1 and indicator 9.1.1 and analyzed the state of sustainable development in education and traffic. In conclusion, the geospatial disaggregation of population data was of great significance for the quantitative assessment of the progress of SDGs.
Notably, many problems still exist in the current research on the geospatial disaggregation of population data. At present, the grid size used for the spatialization of population data varies widely at home and abroad. For the same research problem, choosing different scales of data products may lead to different conclusions [52
]. To date, few studies have been conducted on scale effects. Limited by factors such as the time mismatching of data, poor quality of basic geographic data, and inconsistency of statistical methods, spatialization results do present uncertainty [8
]. As the main input data of the dasymetric mapping method, statistical population data may suffer from inconsistent statistical methods and standards (statistical caliber), which reduces the quality of the output data and restricts the application of the results. Some population spatialization models that consider many factors could improve accuracy, but at the same time, they could bring about problems, such as difficulties in determining the weight of each factor and an unclear mechanism. Building on the methods described in this study, future work includes determining an optimal grid scale for data disaggregation in a research area with spatial and statistical data products of different scales and optimizing the weight coefficients of a multifactor disaggregation model by using geographic or digital data, such as night-light images, smartphone data, and hotspot data. In addition, it is necessary to establish a sound and comprehensive verification system for the results to further improve the accuracy and practicability of the geospatial disaggregation of population data.