In September 2015, the United Nations (UN) General Assembly adopted the Sustainable Development Goals (SDGs), a set of 17 shared objectives to eradicate poverty, protect the planet, and ensure global prosperity for all [1]. To monitor progress toward the SDGs, the Interagency and Expert Group on SDG Indicators (IAEG-SDGs) developed a global indicator framework consisting of 232 specific statistical measures that member states could adopt and extend [2]. While comprehensive, the SDGs have been criticized for their sprawling scope and the high expected cost of implementing and monitoring the various indicators [3]. As of June 2017, even wealthy countries belonging to the Organisation for Economic Co-operation and Development (OECD) had the capacity to evaluate only 57% of the SDG targets [5]. Without support, it will be particularly difficult for low- and middle-income countries (LMICs) to monitor these indicators and measure progress toward achieving the SDGs.
While governments, researchers, and NGOs are exploring ways that big data sources may reduce the burden of developing and monitoring SDG indicators, household surveys remain a critical source of information. Roughly one-third of the SDG indicators can currently be derived from existing household surveys, and up to two-thirds could be covered with further enhancements to these programs [6]. Furthermore, surveys are one of the only data sources capable of systematically collecting the desired level of information that can be “disaggregated by income, gender, age, race, ethnicity, migratory status, disability, geographic location and other characteristics relevant in national contexts (SDG Target 17.18)” [2]. Though bias may be introduced into survey data at many points throughout the design, data collection, and data processing phases of a study [7], survey researchers have designed methods to recognize and address risks of various error components [8].
While household surveys provide high-quality estimates to support the analysis of SDG indicators, the scope and frequency of these programs are limited by their cost [10]. This is, in part, due to the extensiveness of the operations required to select a probability-based sample of households. In probability-based sampling, a sampling frame must exist so that each member of the population has a known probability of being selected. The quality of this list is crucial, as it determines the degree to which the observed sample represents the intended population. Some countries’ statistical agencies are able to maintain comprehensive and up-to-date databases of persons or households for sampling, but in many cases, including in LMICs, it is necessary to create a complete listing at the time of the survey. For example, the Demographic and Health Survey (DHS) is a large-scale study that currently captures data to support up to 30 SDG indicators across nearly 90 developing countries [11]. While impressive in its scope, the standard approach of constructing a household listing for the DHS is cumbersome. Although existing census data for each country typically provide a list of logistically manageable geographic areas for a first stage of sampling (e.g., counties or districts), field staff are typically required to visit sampled areas on foot to roster households. The practice of enumerating households is not only expensive but also potentially dangerous; researchers have noted concerns of robbery or violence when sending field staff into high-risk areas [12], especially for listings that require field staff to spend the majority of their time surveying from the streets instead of inside respondents’ dwellings [14].
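To make the known-probability requirement concrete, the sketch below draws a simple random sample of households from a completed listing. The frame of 200 households is a hypothetical stand-in for what field staff compile by canvassing an area; the point is that the inclusion probability, and hence each household’s design weight, is computable only because the listing is complete.

```python
import random

# Hypothetical household frame for one enumeration area; in practice,
# this is the listing field staff compile by canvassing on foot.
frame = [f"household_{i}" for i in range(1, 201)]  # N = 200

n = 20
rng = random.Random(2021)
sample = rng.sample(frame, n)  # simple random sample without replacement

# The frame is complete, so each household's inclusion probability is
# known exactly, as is its design weight.
inclusion_prob = n / len(frame)      # 20 / 200 = 0.1
design_weight = 1 / inclusion_prob   # each sampled household represents 10

print(len(sample), inclusion_prob, design_weight)
```

If the frame omits some households, these probabilities are no longer known for the full population, which is the undercoverage problem discussed later.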
In this paper, we focus on a method to reduce data collection costs and timelines for household surveys in LMICs: constructing the household listings required for probability-based samples. We explore a streamlined approach to obtaining household listings using machine learning models to detect and enumerate settlement units directly from satellite imagery. Specifically, we use Kaduna state in Nigeria as a case study for applying an object detection model to identify and locate buildings in satellite images. These methods may reduce the level of effort required for probability-based household surveys, thus mitigating a barrier to more frequent measurement of SDG metrics.
In Section 1, we introduce the issues of developing and maintaining household listings in LMICs. We also review related work in the literature that complements this study. In Section 2, we describe the data used to train and evaluate our building detection model, as well as the model and its associated evaluation metrics. In Section 3, we summarize the results of the study, and we conclude with a discussion of the findings in Section 4.
Our findings suggest that, while not without flaws, deep learning models show promise in performing building detection in LMIC settings. The model results suggest that, for the class of models examined, building detection is more difficult to perform consistently in areas with high building density. These results agree with similar findings in the Object-Based Image Analysis (OBIA) literature, in which building extraction tasks are reported as being more difficult to perform in residential areas, which exhibit higher spectral complexity and building displacement, than in central business districts [49]. Additionally, other studies report difficulty identifying attached building types that are more likely to be seen in dense urban areas (i.e., apartments) [50]. Detecting and counting objects in highly dense scenes is an active area of research in the computer vision literature, spanning from crowd counting and density estimation [51] to cell counting in microscopy [53]. Future work that explores models designed specifically for these settings could enhance building detection in urban areas and other settlement-dense regions.
Furthermore, our findings suggest that predicted building counts are highly correlated with the number of households within a region, which is encouraging for moving toward an automatic listing of households. Interestingly, the relationship between household counts and predicted building counts appears to be heteroskedastic in our study area, showing larger variance in the differences when higher values of either variable are observed. While grids with notably more households than predicted buildings may be partially explained by the findings in Section 3.1, in which our model detects a lower proportion of buildings in areas of high building density, the reason for having areas with significantly more predicted buildings than households is less clear. To demonstrate, Figure 6 juxtaposes two images from our test data representing urban areas of Kaduna; both images have a similar number of predicted buildings but vastly different numbers of reported households. Figure 6a has 40 predicted buildings with only 6 households, whereas Figure 6b has 48 predicted buildings with 91 households.
There are many potential reasons why this might occur. One possible explanation is that the dense areas in our study have greater diversity in the types of building structures present (e.g., some images may represent a higher concentration of commercial buildings). Though Kaduna has a history of weak land-use planning [55], which could allow for large variations in mixed-use development in its urban corridors, more research is needed to confidently test this hypothesis. Another hypothesis for the observed variation in household counts is that interviewers did not consistently adhere to the survey’s household enumeration protocols within sampled SGCs. In urban neighborhoods where buildings are dense, recording all households is logistically difficult (e.g., entrances to some residences may be blocked or not visible from the road). In these cases, household counts may be underestimated. Intentional data falsification by field interviewers is a well-documented phenomenon in survey data collection [56] and may also contribute to some instances of low household counts in areas with many predicted buildings. We are hopeful that advances in technology will offer improvements to the “on-the-ground” metadata typically collected during surveys. Enhancements to these data would provide more robust information for validating building detection tasks and strengthen researchers’ understanding of the relationship between buildings and households.
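One simple way to probe the heteroskedasticity described above is to stratify grid cells by predicted building count and compare the spread of the household-minus-predicted differences across strata. The sketch below uses illustrative counts, not the study’s data.

```python
from statistics import pvariance

# Hypothetical (households, predicted buildings) pairs per grid cell --
# illustrative values only, not the study's data.
grids = [
    (3, 4), (5, 5), (2, 3), (6, 4), (4, 6),           # sparse cells
    (40, 48), (6, 40), (91, 48), (55, 30), (20, 45),  # dense cells
]

def diff_variance(cells):
    """Population variance of (households - predicted buildings)."""
    return pvariance([h - p for h, p in cells])

low = [c for c in grids if c[1] < 10]    # low predicted counts
high = [c for c in grids if c[1] >= 10]  # high predicted counts

# Heteroskedasticity shows up as a much larger spread of differences
# among the dense (high predicted count) cells.
print(diff_variance(low), diff_variance(high))
```

With real grid data, the same comparison over finer count bins would indicate where predicted counts remain a reliable proxy for households.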
The adoption of building detection methods as a streamlined approach for constructing household sampling frames ultimately depends on the minimization of prediction error. Prediction error introduces two types of challenges for a survey: (1) overcoverage, where the constructed listing includes buildings that are not of interest for the study, and (2) undercoverage, where the listing excludes buildings that belong to the intended target population. To illustrate this point, imagine a scenario in which a list of predicted buildings is generated for an area and used exclusively as the sampling frame for a household survey measuring SDG indicators. In cases where field staff visit buildings that do not contain households (overcoverage), unnecessary costs may be incurred from wasted time and travel resources; however, no bias should be introduced into the survey estimates, assuming that nonresidential or vacant buildings can be identified during data collection and appropriate adjustments can be made during analysis. Alternatively, if the model fails to detect all buildings that contain households and thus excludes them from the sampling frame (undercoverage), bias becomes a substantial concern. Survey estimates derived from a sampling frame suffering from undercoverage may include error because the excluded households may be systematically different from those represented in the sample. While methods are available to compensate for housing unit undercoverage [58], they will add to the cost of the study.
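The asymmetry between overcoverage and undercoverage can be shown with a small numeric sketch (hypothetical outcome values): dropping identifiable ineligible units leaves the estimate unchanged, while systematically missing households shifts it.

```python
# Hypothetical outcomes (e.g., an SDG indicator value) for ten
# households; the last five are systematically higher, mimicking a
# harder-to-detect dense urban neighborhood.
population = {f"hh{i}": y for i, y in enumerate(
    [2, 3, 2, 4, 3, 9, 8, 10, 9, 8])}
true_mean = sum(population.values()) / len(population)

# Overcoverage: the frame also lists two vacant buildings. If they are
# identified in the field and dropped, no bias results.
overcovered_frame = list(population) + ["vacant1", "vacant2"]
eligible = [u for u in overcovered_frame if u in population]
over_mean = sum(population[u] for u in eligible) / len(eligible)

# Undercoverage: the model misses the five dense-area households, so the
# frame mean is biased downward relative to the population mean.
undercovered_frame = list(population)[:5]
under_mean = (sum(population[u] for u in undercovered_frame)
              / len(undercovered_frame))

print(true_mean, over_mean, under_mean)
```

The overcovered frame recovers the population mean exactly once ineligible units are screened out, while the undercovered frame understates it, which is why undercoverage is the more serious concern for survey estimates.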
Understanding these implications of prediction error helps clarify the conditions under which the current model should be used in the field. Given the favorable household coverage enabled by the building detection model in rural areas within our study, there is support for it being an effective option for developing household lists in areas of low building density. This is especially true when considering the lack of high-quality existing frames in many LMICs. However, due to concerns of potential bias introduced by undercoverage, the presented building detection model may be better utilized as a supporting approach in urban areas, where there is greater variability in the differences between household counts and predicted building counts. As modeling approaches advance and high-quality annotated datasets become more widely available, we expect these methods to become increasingly useful.
In addition to developing household lists, building detection models can provide research teams with other valuable options for conducting high-quality household surveys. Though it is not a focus of this study, predicted building counts could also be used as a measure of size for probability proportional to size (PPS) sampling [59]. PPS is a sampling technique that selects units in one sampling stage with probabilities proportional to a measure of size, followed by the sampling of a fixed number of units at the next stage. The larger a unit’s size, the greater its chance of being included in the sample. The advantage of PPS is that it leads to approximately equal overall selection probabilities while maintaining a uniform workload within each first-stage unit. Additionally, for sampling designs in which a full enumeration of households is conducted at the final stage of sampling, calculating predicted building counts prior to data collection could provide a valuable quality check: when actual household counts deviate greatly from predicted building counts, survey managers can solicit context from field teams to better understand why the differences occur.
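As a sketch of how predicted building counts could serve as a measure of size, the following illustrates systematic PPS selection of enumeration areas. The EA labels and counts are hypothetical, not the survey’s design.

```python
import random

# Hypothetical enumeration areas (EAs) with predicted building counts
# used as the measure of size (MOS); labels and counts are illustrative.
areas = {"EA1": 120, "EA2": 45, "EA3": 300, "EA4": 80, "EA5": 155}

def systematic_pps(mos, n, rng):
    """Select n units by systematic PPS on a measure of size."""
    units = list(mos)
    interval = sum(mos.values()) / n
    start = rng.uniform(0, interval)
    picks, cumulative, i = [], 0.0, 0
    for k in range(n):
        target = start + k * interval
        # Advance until the target falls inside the current unit's range.
        while cumulative + mos[units[i]] <= target:
            cumulative += mos[units[i]]
            i += 1
        picks.append(units[i])
    return picks

# EA3 carries 300 of the 700 total predicted buildings, so it is the
# most likely EA to enter the sample.
sample = systematic_pps(areas, n=2, rng=random.Random(1))
print(sample)
```

Pairing this first-stage selection with a fixed number of households sampled per selected EA yields the approximately equal overall probabilities described above.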
While this study only assesses object detection models trained on imagery from one state in Nigeria, there is potential for the same or similar methods to be scaled to larger areas to develop regional or national household SDG indicators. One potential bottleneck to implementing these models in new areas is the large amount of labeled training data required to train convolutional neural networks from scratch. To address this in our study, we use transfer learning to initialize our building detection model weights prior to model training. Tiecke et al. [30] take a different approach, reducing the labeling problem to a binary classification task (labeling imagery of 30 × 30 m grid cells as “containing buildings” or “not containing buildings”), for which labels are easier to obtain. They then use these labeled data to train a weakly supervised semantic segmentation model to predict pixel-level building labels.
Besides training data, a lack of computational resources in LMICs may also hamper the use of these models in practice. While cloud computing options, even for specialized Graphics Processing Unit (GPU) servers, are becoming increasingly accessible and affordable, reducing the area requiring household listing can also lessen the computational load of creating large-scale building predictions. Using traditional image processing methods that do not require training data, such as conventional edge detectors, can help reduce the number of regions that need to be screened and modeled [30]. Additionally, if incorporated into a clustered sampling design, only selected enumeration areas would require household enumeration rather than a comprehensive national household listing.
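A minimal sketch of such a training-free pre-screen: compute a cheap gradient-based edge score per image tile and pass only high-scoring tiles to the full detection model. The tiles, score, and threshold below are illustrative assumptions, not the method of [30].

```python
# Hypothetical 32 x 32 grayscale tiles: flat ground vs. a bright
# rectangular rooftop. Values and threshold are illustrative only.
flat = [[90.0] * 32 for _ in range(32)]
roof = [[220.0 if 8 <= r < 24 and 8 <= c < 24 else 90.0
         for c in range(32)] for r in range(32)]

def edge_density(tile):
    """Mean absolute horizontal + vertical gradient per pixel -- a
    cheap, training-free proxy for built-up structure."""
    h, w = len(tile), len(tile[0])
    gx = sum(abs(tile[r][c + 1] - tile[r][c])
             for r in range(h) for c in range(w - 1))
    gy = sum(abs(tile[r + 1][c] - tile[r][c])
             for r in range(h - 1) for c in range(w))
    return (gx + gy) / (h * w)

THRESHOLD = 5.0  # would be tuned against real imagery in practice

# Only tiles that pass the screen are sent to the expensive detector.
candidates = [name for name, tile in [("flat", flat), ("roof", roof)]
              if edge_density(tile) > THRESHOLD]
print(candidates)
```

Because the score requires no labels or GPU, it can cheaply discard large uninhabited areas before the convolutional model is run.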
This study has several limitations. First, we do not have records that directly link buildings and households and thus were unable to build models that detect residential structures specifically. Given the existing limitations on the availability of household data in many LMICs, this information would likely need to come from existing household surveys that use mobile devices, tablets, etc., to record the location of household units during interviews. Second, there may be error in the manual labeling of building outlines for the training and test sets, as well as error introduced during data collection with respect to household coverage. Labeling different types of residential structures (apartments, single-family homes, etc.) should also help better characterize the heterogeneity in the number of households per building. This may require recruiting labelers with in-country knowledge of various building types, perhaps also using higher resolution imagery than was assessed in this study. Additionally, this methodology assumes that available satellite data provide an up-to-date portrait of the buildings and households that will be present during the survey data collection period. While it is not uncommon in the literature to have a modest temporal gap between satellite imagery dates and the survey data collection period when assessing these classes of models [30], recognizing the potential for errors can help statistical agencies be proactive in identifying emerging issues. To a certain extent, unmanned aerial vehicles (UAVs) could provide more detailed and timely imagery to help mitigate this concern. While our findings suggest that building density was correlated with model accuracy, future work may benefit from a more exhaustive exploration of which conditions and settings are challenging for current building detection models. In particular, an understanding of how building size, geometry, and type affect model performance would help survey researchers and statistical agencies better assess where opportunities for improvement remain. Lastly, while we only provide evidence for a single state in Nigeria, we hope this case study encourages further research and resources to examine a larger-scale implementation of these methods for household enumeration in LMICs.