As reported by the United Nations, urban areas currently contain more than 50% of the world’s population. According to the latest estimates, this proportion will reach 60% by 2030 [1]. In developing countries, high urbanization rates and uncontrolled urban sprawl often lead to challenges such as inefficiency of transport systems, degradation of the environment, growth of informal settlements, and a proportion of the population living in deprived conditions. Availability of accurate and up-to-date information about the current situation of a city could help in defining and setting up adapted urban policies.
Among the set of potential geospatial information related to urban areas, population density and land use are probably the most important to an urban planner [2]. Unfortunately, they are limited or not available at all in developing countries, as these lag behind the most developed countries in the adoption and use of geographic information systems (GIS) [3]. This is especially the case for Africa, which faces a critical need for geographic information [5]. For instance, a study showed that several important geographic datasets were still either unavailable or difficult to access in Africa [7]. Notwithstanding recent initiatives to alleviate this issue [8] and a stronger interest in alternative data, such as volunteered geographic information (VGI) [9], more progress needs to be made.
In urban areas, land-use information can be mapped at different scales that range from cadastral plots to large neighborhoods. In this study, we chose to work at the street block level, as was the case in previous studies [2]. The street block, sometimes referred to as a “city block” or “land parcel”, provides sufficient spatial detail to urban planners and has been described as the most fundamental and appropriate unit in which to map the urban structure [13]. Unfortunately, reference street block datasets were not accessible for our case studies, from either the local authorities and national mapping agencies or any other reliable source. We overcame this challenge by developing a semiautomated processing chain for the creation of street block geometries using OpenStreetMap (OSM) data [16]. OSM is open data, meaning it can be accessed and used at no cost by anyone and for any purpose, which makes it an alternative source of data when the availability of and access to geoinformation are limited. Although OSM data were disparaged during the early stages of the project, their quality has been improving rapidly, in terms of both completeness and thematic accuracy. For that reason, OSM could become a key player in the coming decade for the production of and access to high-quality geoinformation in developing countries. As an example, a recent study demonstrated the potential of OSM data for increasing the thematic level of land-use/land-cover maps where there is a lack of official data [17].
To the best of our knowledge, few works [18] have proposed a methodology for the creation of street block geometries using OSM data. Long and Liu [18] proposed a method to automatically identify “land parcels” from OSM roads. They operated in the Chinese geographic context and developed a framework to address outdated, nonexistent, or unavailable reference data. Their approach consists of using geometric operations to clean up the road network. Subsequently, land parcels are automatically created and defined as the remaining space when buffered roads are removed. Their approach proved to be a good approximation of the results obtained from conventional methods but suffered from the incompleteness of the OSM road network, leading to the creation of large parcels in smaller cities. Their framework was used recently in other studies [20]. However, Long et al. [18] and Fan et al. [19] provided a theoretical framework without ready-to-use computer code, which limits the easy reproduction of their methods.
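The idea of defining parcels as the space remaining once buffered roads are removed can be illustrated on a toy raster. The sketch below is not the implementation of Long and Liu [18], nor the processing chain developed in this study: it assumes the buffered roads have already been rasterized into a small binary grid (the `road_mask` values are made up) and labels each connected region of free cells as one block. The function name `label_blocks` is hypothetical.

```python
from collections import deque


def label_blocks(road_mask):
    """Label 4-connected regions of non-road cells.

    road_mask: list of rows, 1 = cell covered by a buffered road, 0 = free.
    Returns (labels, n_blocks); road cells keep the label 0.
    """
    rows, cols = len(road_mask), len(road_mask[0])
    labels = [[0] * cols for _ in range(rows)]
    n_blocks = 0
    for r in range(rows):
        for c in range(cols):
            if road_mask[r][c] or labels[r][c]:
                continue  # road cell, or cell already assigned to a block
            n_blocks += 1
            labels[r][c] = n_blocks
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the current block
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not road_mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = n_blocks
                        queue.append((ny, nx))
    return labels, n_blocks


# Toy grid (assumed data): two crossing roads delimit four blocks.
road_mask = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
labels, n_blocks = label_blocks(road_mask)
```

With this grid, the two crossing roads delimit four blocks. Deleting part of a road from the mask merges the neighboring blocks into one larger region, which mirrors the effect of an incomplete road network described above.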
Studies aiming at mapping urban land use often make use of land-cover and/or ancillary reference geographic datasets, e.g., detailed cadastral datasets, socioeconomic datasets, or datasets that contain the location of urban facilities (schools, hospitals, shops, etc.) [11]. Despite their great potential for mapping land use at a fine scale, such exhaustive and detailed datasets are rarely available, especially in developing countries. Furthermore, the initial production and the process of keeping them updated are both costly and labor-intensive. Remote sensing solutions can be used as an alternative for creating and updating reliable land-use information on urban areas. The land use can be mapped directly from satellite imagery and/or from land-cover maps.
The latter approach usually relies on the computation of spatial metrics, also named “landscape metrics” [23]. These metrics have been widely used for the classification and characterization of urban or rural areas. They were first mainly used in the field of landscape ecology [24] for their ability to characterize landscapes as ecosystems according to the composition and spatial organization of the land cover classes they contain. Their use in urban areas dates back to the 2000s [26] for studying urban sprawl [27], urbanization gradients [28], or land-use changes [29].
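As a minimal illustration of what a composition-based spatial metric looks like, the sketch below computes the proportion of each land-cover class and the Shannon diversity index from pixel counts inside one landscape unit. It covers composition only, not spatial configuration (patch shape, adjacency, etc.), and it is not the metric set used in this study. The class names and counts are invented, and `composition_metrics` is a hypothetical helper.

```python
import math


def composition_metrics(class_counts):
    """Return (proportions, shannon_diversity) for one landscape unit.

    class_counts: dict mapping a land-cover class name to its pixel count.
    """
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("the unit contains no pixels")
    proportions = {name: n / total for name, n in class_counts.items()}
    # Shannon diversity: -sum(p * ln p) over the classes that are present.
    shannon = -sum(p * math.log(p) for p in proportions.values() if p > 0)
    return proportions, shannon


# Assumed pixel counts for two hypothetical street blocks.
mixed_block = {"building": 50, "vegetation": 50}
uniform_block = {"building": 120, "vegetation": 0}

props_mixed, shannon_mixed = composition_metrics(mixed_block)
props_uniform, shannon_uniform = composition_metrics(uniform_block)
```

A block evenly split between two classes reaches the maximum diversity for two classes (ln 2), whereas a block covered by a single class has a diversity of zero.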
More broadly, this study is part of two research projects, namely, MAUPP (maupp.ulb.ac.be) and REACT (react.ulb.be), aiming at improving urban population distribution models and urban malaria risk models, respectively. In these projects, the land-use and land-cover information will be used for disaggregating population counts available for administrative units, using dasymetric modeling [30]. Consequently, emphasis is placed on having sufficient thematic details for residential use to allow for adequate reallocation of population counts and modeling of population density at the intraurban level. These projects focus on sub-Saharan African cities, which implies the development of solutions that consider the scarcity of ancillary reference data.
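The principle of dasymetric reallocation can be sketched in a few lines: the population count of an administrative unit is distributed among its street blocks in proportion to block area multiplied by a weight attached to the land-use class. The example below is a simplified illustration under assumed values, not the population model of the MAUPP or REACT projects; the class names, the weights, and the function `dasymetric_reallocate` are all hypothetical.

```python
def dasymetric_reallocate(unit_population, block_areas, block_classes, class_weights):
    """Split one administrative unit's population among its street blocks.

    Each block receives a share proportional to area * weight of its class.
    Classes missing from class_weights get a weight of 0 (no population).
    """
    weighted = {
        block: block_areas[block] * class_weights.get(block_classes[block], 0.0)
        for block in block_areas
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no block has a positive weight in this unit")
    return {block: unit_population * w / total for block, w in weighted.items()}


# Assumed inputs: areas in hectares, weights chosen only for illustration.
block_areas = {"A": 2.0, "B": 4.0, "C": 10.0}
block_classes = {"A": "dense_residential", "B": "sparse_residential", "C": "non_residential"}
class_weights = {"dense_residential": 3.0, "sparse_residential": 1.0, "non_residential": 0.0}

population = dasymetric_reallocate(1000, block_areas, block_classes, class_weights)
```

The reallocated counts always sum to the original unit total, which is why distinguishing several residential classes matters: the weights determine how the same total is spread between dense and sparse residential blocks.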
The present research proposes a complete, mostly automated framework for mapping land use at the street block level, using only very-high-resolution (VHR) land-cover maps and remote-sensing-derived data. It includes the extraction of the street blocks from OSM and their subsequent characterization using spatial, spectral, and morphological metrics, a feature selection step for discarding highly correlated and redundant information, and supervised classification using random forest.
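To make the idea of discarding highly correlated features concrete, the sketch below applies a generic greedy filter: a feature is kept only if its absolute Pearson correlation with every feature already kept stays below a threshold. This is a common simplification and is not claimed to be the feature selection procedure used in this study; the feature names, values, and the 0.9 threshold are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant feature is treated as uncorrelated
    return cov / (sd_x * sd_y)


def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with all kept features is < threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Assumed feature table: one value per street block (four blocks).
features = {
    "built_share": [0.1, 0.4, 0.6, 0.9],
    "built_share_pct": [10, 40, 60, 90],  # redundant: same information in percent
    "mean_ndvi": [0.5, 0.1, 0.6, 0.2],
}
selected = drop_correlated(features)
```

Here the percentage version of the built-up share is dropped because it duplicates the first feature, while the vegetation index is retained. The outcome of a greedy filter depends on the order in which features are examined, so a production workflow would usually rank them first (for example, by importance).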
This research places strong emphasis on reproducibility and open access to data and products. Consequently, the implemented computer code and the resulting datasets are made available at no cost to any interested users (see Appendix B).
The solution proposed in this paper proved to be operational for processing very large areas, as our case-study datasets cover more than 1000 km² in total, with a spatial resolution of 0.5 m. However, some limitations can be highlighted.
The first limitation relates to the completeness of OSM data. A quantitative evaluation of the geometric and semantic quality of the street blocks is out of the scope of this article, but some aspects can be discussed. A qualitative visual assessment shows that the consistency is more evident in the core urban areas, where the street network is denser and OSM data generally more complete. From several tests that were carried out, we concluded that the resulting street blocks may not be as detailed as expected, e.g., presence of polygons that are too large and encompass multiple distinct land uses. This is mostly related to the fact that the OSM database is not complete enough for certain locations, especially in peri-urban areas. To solve this issue, time was dedicated to the digitization of additional map features in OSM (e.g., roads, tracks, natural elements, etc.) at the periphery of our AOIs (peri-urban areas) to meet our requirements. This also contributed to the completion of the OSM database, which is a positive outcome. Since the OSM data completeness is increasing, it is likely that such issues will become less prevalent in the future. However, the performance of the proposed framework is likely to decrease as the landscape becomes more rural. Further research could look for other strategies for the automated extraction of meaningful landscape units for mapping the land use in rural and peri-urban areas.
The second limitation is linked to the spatial metrics. The selection of relevant spatial metrics for the phenomenon under investigation and the interpretation of their behaviors can be a challenging task in itself [40]. Moreover, it is likely that some metrics that perform well in one case study are less discriminant in another. This was the case in our results, which could be explained by differences between the urban landscapes of the case studies. As a solution, computing many metrics and feeding them into a feature selection procedure allows for the unsupervised selection of a parsimonious set of features.
Thirdly, the labeling procedure for creating the training and validation sets may become a bottleneck if automation is mandatory. Further research could explore the possibility of taking advantage of the OSM database for the automatic selection and labeling of these samples, as OSM contains some information on land use and points of interest (POIs).
Next, future studies aiming at implementing the same kind of workflow that we present here should consider the possibility of improving efficiency by computing the metrics for the street blocks belonging to the training samples only. Since they are sufficient for performing the feature selection step, this would save processing time and storage space [61]. Only the most discriminant features could then be computed for the whole AOI. This approach would allow for computing a very large number of features without creating computational and storage issues.
Finally, as previously mentioned (see Section 2.4.1), the “patch mosaic” paradigm hides some aspects of the urban structures, which is likely to limit the ability of spatial metrics to adequately characterize urban land use. Possible future work should investigate a broader workflow that would include explicit information derived from the OBIA segmentation process. For example, information on individual segments could be computed, e.g., area, compactness, and fractal dimension, and then summarized either at the class or at the landscape level.
Prediction errors and the corollary uncertainty of the produced maps are important points that any classification framework should consider. In this study, we used the class-probability output from the RF model to identify street blocks for which the prediction was affected by an important level of uncertainty. In addition to the land-use maps where labels correspond to the most probable class, we also provide the class-probability values for each street block. This information is useful especially when classification products are used as input data to other classification or modeling tasks since it is well known that errors propagate to the derived products. In the future, we plan to carry out sensitivity analysis to assess how errors and uncertainty of land-cover maps affect the derived land use and the models of spatial distribution of population densities.
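The use of class probabilities to flag uncertain predictions can be sketched as follows. The example assumes that the classifier has already produced, for each street block, a probability per land-use class (as a random forest does through the share of trees voting for each class). The probability values, the block identifiers, the 0.6 confidence threshold, and the function `flag_uncertain` are illustrative assumptions, not values from this study.

```python
def flag_uncertain(class_probs, min_confidence=0.6):
    """Return {block_id: (label, probability, is_uncertain)}.

    class_probs: dict mapping a block id to a dict {class name: probability}.
    A block is flagged when the probability of its most probable class
    falls below min_confidence.
    """
    results = {}
    for block_id, probs in class_probs.items():
        label = max(probs, key=probs.get)  # most probable class
        confidence = probs[label]
        results[block_id] = (label, confidence, confidence < min_confidence)
    return results


# Assumed class probabilities for two hypothetical street blocks.
class_probs = {
    "block_1": {"residential": 0.85, "industrial": 0.10, "vegetation": 0.05},
    "block_2": {"residential": 0.40, "industrial": 0.35, "vegetation": 0.25},
}
flags = flag_uncertain(class_probs)
```

Both blocks receive the label "residential", but only the second is flagged, since its leading class wins by a narrow margin. Keeping the full probability vector alongside the label lets downstream models weight or exclude such blocks.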