Urban form organizes people, space and flows. As such, urban areas are simultaneously shaped by economic and demographic processes; social relations; legal and political systems; and historical, cultural and climate contexts; etc. [1
]. The urbanization process affects dwellers in many dimensions. For example, air pollution and its impact on health, inequality and environmental degradation are increasing threats in rapidly growing cities [3
]. The development of urban areas is not only conditioned by manifold local and regional factors but also by global trends that contain drivers and consequences. Earth Observation (EO) provides the tools to remotely capture the resulting urban expansion and allows urban environments to be characterized spatially across time at different scales. It allows urban form and dynamics to be measured consistently, from coarse to fine patterns [4].
Identifying social, economic and environmental underlying processes of urbanization and land-use/land-cover (LULC) changes improves our understanding of cause–effect relationships and helps in the development of strategies for sustainable development [5
]. Socio-economic factors and land-use planning play an important role in determining human behavior (e.g., mobility and leisure), resilience, and the risk of diseases, among other factors, which have a great impact on human well-being. For example, the prevalence of non-communicable diseases, such as those related to physical health, dietary habits or alcohol consumption, has been related to the socio-economic status of the population [6
]; in addition, the availability of accessible green spaces has been associated with a reduction of the risk of cardiovascular and respiratory diseases [8
]; meanwhile, habitat loss and fragmented landscapes increase the probability of the emergence of infectious diseases in humans [9].
In recent years, the number of studies quantifying the relationships between EO-derived data and socio-economic variables has risen. Consequently, various elements of the built and natural environment, as well as atmospheric parameters derived from EO, have been related to different socio-economic indicators. For example, image-derived metrics and features have been used to model poverty levels: severe poverty was associated with the travel time to major market towns and with the percentage of woodland and winter crop cover [12
]. Duque et al. [13
] developed a composite poverty index based upon a wide set of variables related to land cover composition and urban spatial patterns. Poverty was found to be higher in areas with less impervious surfaces with the absence of clay roofs, a higher complexity of the urban fabric, and a lower diversity of landscapes [13
]. Similarly, deprived living conditions in major UK cities were related to population density, vast portions of unbuilt land, regular street patterns and cul-de-sacs [14
]. Meanwhile, a local study in Liverpool, UK found that the percentage of vegetation and water, and the variability and homogeneity of the image intensity values were the best predictors of deprivation [15
]. GDP exhibits a high correlation not only with built-up density in a set of Canadian cities [16
] but also with the intensity and density of night-time lights in a city of China [17
]. On the other hand, urban green spaces have been related to health and well-being. In general, the percentage and proximity of greenness in the living environment have a positive relationship with physical and mental health, and with a decrease in surface temperatures [18
]. Regarding air quality, it has been related to both the built and natural environments. Continuous urban development was associated with better air quality in urban areas of the USA, while the presence of proximate forest was significantly related to an improvement in air quality when demographic factors and the degree of urbanization were controlled for [19
]. Generally, a low centrality of the urban fabric, a low density, worse transport services and limited land diversity are correlated with higher pollutant concentrations [20].
A general finding from these studies is that the built-up structure, night-light emissions, transport network, population distribution and LULC configuration and diversity are related to socio-economic-ecological factors in urban areas. Such relationships have been mainly analyzed using correlations, multiple regression and random forest methods, which have proved suitable for modeling statistical variables from EO-derived data. However, the majority of studies are intra-urban analyses conducted at the city level, with only a few at the regional or national levels. A minority are based on global inter-urban analyses, which provide a more comprehensive, but less detailed, picture of development patterns. Examples of inter-urban studies demonstrated that, in European cities, an equal distribution of LULC is associated with lower inequality in life satisfaction [21
] and that quality-of-life-related indicators can be modeled by means of LULC spatial metrics [22
]. In urban areas of the USA, similarities in the structures of urban landscapes were linked to transport behaviors [23].
On balance, relationships between the built and natural environments and socio-economic-ecological factors have been proven, but large area and multi-temporal analyses remain rare. These analyses bring the opportunity to create, based on predetermined relationships, spatial indicators of social, economic and environmental parameters among and across countries. In this direction, geospatial data have been used as proxies of income inequality [24
], unsustainable urban growth [23
], economic disparities [26
] and GDP, especially useful in countries with low-quality statistical systems [27
]. Hence, unraveling the links between urban form and LULC and statistical variables, both at a particular moment in time and in terms of their evolution over time, aids in mapping and assessing the temporal evolution of socio-economic and ecological processes. Some examples in this regard are foreseeing the loss of farmland and food security issues [28
], predicting the risk of and exposure to diseases [10
] or comparing the evolution of socio-economic factors, such as employment and poverty, in response to specific policies [29].
There has been a recent call regarding the need for cross-comparative empirical analyses across different regions that reveal the consistency of these relationships and that allow the drawing of reliable conclusions on the sustainability of urban development [2
]. However, these analyses are usually limited by the scarce or inconsistent availability of data at a global scale. Regarding the socio-economic datasets needed, globally available and comparable data at intra-urban resolutions remain limited. On the one hand, some institutions are delivering socio-economic and environmental statistics for cities and functional urban areas. Two examples are the City Statistics from Eurostat [33
] and the Organization for Economic Co-operation and Development (OECD) [34
]. They provide comparable statistics associated with territorial units with large-scale coverage for multiple time periods. On the other hand, there has been a growing interest in integrating statistical and spatial information to produce spatially explicit socio-economic data, swapping from irregularly shaped boundaries to a regular surface, easing comparisons within and across regions at lower levels. Two of these initiatives are GEOSTAT [35
] and the Socioeconomic Data and Applications Center (SEDAC) [36
]. Although the variables and the time coverage are still limited, they are promising data sources that are under development. For the needed spatial datasets, concurrently, recent EO-based efforts have been made in the global mapping and characterization of human settlements and land covers over time. Some examples are the Global Urban Footprint (GUF), which is a worldwide map of urban settlements with an unprecedented spatial resolution of 12 m for the years 2010–2013 [37
]; the Global Human Settlement Layer (GHSL), which represents human presence in the past (1975, 1990, 2000 and 2014) with a spatial resolution of 30 m [38
]; the Atlas of urban expansion, which collects data on urban expansion from a global sample of 200 metropolitan areas [39
]; and the GlobeLand30 [40
] and the Climate Change Initiative (CCI) [41
], which provide global land cover data at spatial and temporal resolutions of 30 m (2000 and 2010) and 300 m (from 1992 to 2018), respectively. Furthermore, the development of methods and algorithms to automatically classify urban environments across the globe is progressing rapidly e.g., [42
]. The global coverage and high spatial and temporal resolutions of EO-derived products, combined with the high capacity to automate processes, allow the frequent updating of geospatial datasets. Frequent updating, however, is still an issue for socio-economic databases, since they depend on surveys and censuses with low temporal frequency, and they are limited or even nonexistent in some geographic areas.
Accordingly, our aim is to use spatial patterns and their development over time as proxies of socio-economic parameters at the global level. We aim to demonstrate the feasibility of this approach using easily quantifiable spatial metrics extracted from openly available EO-derived and ancillary data. The growing availability of spatial and socio-economic datasets offers an opportunity to fine-tune empirical methods that could be applied globally in the near future, when higher-resolution data with a global reach become available. In this context, a semi-global analysis makes it possible to draw first fundamental conclusions and to anticipate subsequent analyses once data of higher spatial, temporal and thematic resolution and better quality become available. Therefore, the purpose of this study is to quantify the relationships between socio-economic and environmental variables, such as income, inequality, GDP, air quality and employment, and spatio-temporal metrics derived from geospatial databases, both on a specific date and in terms of their variation over time. Subsequently, the purpose is to identify the spatio-temporal metrics that are most related to socio-economic and environmental variables and can be extracted on a massive scale from current global geospatial databases.
The combination of multi-source and multi-temporal datasets for almost six hundred functional urban areas across 32 countries allowed us to extract insights into the relationship between urban spatial patterns and socio-economic and environmental variables at a semi-global scale. By means of a machine learning algorithm, random forest regression, we were able to partially model some socio-economic variables and their change using spatio-temporal metrics extracted from geospatial databases. We explained between 44% and 68% of the variability of the income, Gini, GDP per capita and air quality variables with the sole use of spatial information. This central result shows that the spatial appearance of urban areas and their change are related to the socio-economic and environmental indicators for these areas.
We are aware that we have considered neither macro-economic or other overarching global developments nor intra-urban variabilities; still, we can conclude that these relationships exist. With regard to the variation of the variables over time, we analyzed the relationships with the metrics for only two of them (i.e., air quality and employment rate), since many variables were not available for two dates (such as income or Gini) or their change over time is not a good indicator of development, as is the case for GDP [46
]. Nevertheless, we explained 41% and 32% of the variation in the air quality and employment rate, respectively, which suggests that the spatial component may relate partially to how these indicators change. Overall, however, we found that there are fundamental correlations between the spatial urban structure and socio-economic-ecological variables. Multi-temporal changes, however, cannot be estimated one-to-one from this correlation, since, for example, spatial urban structures are subject to certain inertia in contrast to economic developments.
The use of random forest regression has strengths and weaknesses. It is less interpretable than parametric regression, since the fitted function is unknown. However, with the variable importance measure, it is possible to identify those independent variables that have a strong influence on the model [15
], the ones with partial influence, and the ones adding noise or uncertainty.
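The role of the variable importance measure can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are available and uses purely synthetic data; the metric names, the toy relationship and the hyperparameters are hypothetical and are not taken from this study. It also uses the impurity-based importance that scikit-learn reports by default, which may differ from the importance measure used in the analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 300 hypothetical FUAs described by three
# hypothetical spatial metrics (names are illustrative only).
rng = np.random.default_rng(42)
n_fuas = 300
metric_names = ["compactness", "built_up_density", "noise_metric"]
X = rng.normal(size=(n_fuas, len(metric_names)))

# Assumed toy relationship: the response falls with compactness, rises
# with density, and does not depend on the third metric at all.
y = -2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_fuas)

# Fit the forest; oob_score=True yields an out-of-bag R^2, i.e. the share
# of variability explained on samples each tree did not see in training.
forest = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)
oob_r2 = forest.oob_score_

# Impurity-based importances are normalized to sum to 1.
importances = dict(zip(metric_names, forest.feature_importances_))
ranking = sorted(importances, key=importances.get, reverse=True)

print(f"Out-of-bag R^2: {oob_r2:.2f}")
for name in ranking:
    print(f"{name}: {importances[name]:.3f}")
```

In this toy setting the ranking separates the strongly influential metric, the partially influential one and the one that only adds noise, which is the kind of distinction described above. The importances say nothing about the direction or shape of each relationship.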
We investigated the relationship between socio-economic, environmental and spatial variables and found evidence of their links. The compactness degree of built-up areas and their cores is highly associated with the average income in FUAs. In particular, higher compactness values are found in lower-income FUAs, while there are higher incomes in less compact, and thus more scattered, urban configurations (Figure 6
c). This finding might be influenced by independent differences in compactness and income across countries. However, we found a similar negative correlation between income and compactness in the FUAs from the USA (Pearson’s r = −0.42), for instance, which suggests that this trend is not only determined by geographic or cultural aspects. Salvati and Carlucci [63
] found that discontinuous settlements in Northern Italy (low compactness) had higher disposable incomes, and related the phenomenon to suburbanization processes typical in the developed and economically active regions of Europe. We also observed nonlinearities: a higher loss of agricultural land between 2000 and 2014, higher fragmentation of built-up areas and sprawl (more urban expansion than population growth) occurred in middle-income FUAs, while low- and high-income FUAs had built-up areas that were more spatially centralized and populations that outpaced built-up growth (Figure 6
c). Cities in countries with higher incomes have been previously related to higher levels of land consumption and urban fragmentation [64
]; however, this study disregarded income variation within cities from the same country. Income inequality, here measured with the Gini, was lower in the FUAs with compact urban cores that at the same time presented dispersed and more spatially homogeneous built-up areas (Figure 6
b). These FUAs experienced higher densification and accessibility with urban growth between 2000 and 2014, which means more infilling and expansive urban growth closer to the road network. Although their density of agricultural land was higher, they also lost higher proportions of it than FUAs with more unequal incomes. In this sense, Boulant et al. [65
] claimed that the Gini was higher in larger cities, which usually provide more opportunities to dwellers but, in return, widen income inequalities. Meanwhile, Angel et al. [64
] related cities in countries with higher income inequalities to urban sprawl, in terms of lower population densities. Nevertheless, we did not find a significant relation between the Gini and PUGI index (which also accounts for sprawl). The GDP per capita was higher in less compact built-up shapes that experienced an increase in urban density between 2000 and 2014 (Figure 6
a). This trend was also found by Weilenmann et al. [66
], where wealth was positively related to higher urban densities and higher degrees of dispersion. We identified lower GDPs in compact FUAs that experienced dispersed growth with more population growth than built-up expansion between 2000 and 2014 (Figure 6
a). However, within Mexican FUAs, we found a positive correlation between GDP and the degree of urban centrality that was not observed at the global level. Huang et al. [67
] also found a negative relationship between GDP per capita and compactness, stating that wealth brings more private motor vehicles and highways, which, in developed countries, contributes to the facilitation of life in outlying suburban areas; meanwhile, the lower motorization in developing countries results in more compact urban forms, as dwellers live close to their working places, usually in the inner city.
In the environmental dimension, air quality was better in FUAs with lower densities of agricultural land but higher densities of low semi-natural/natural vegetation land and water bodies (Figure 6
d). We also found a relationship between the pollution in the FUAs and compact shapes, both from the urban footprint and the urban core. The analysis of the compact shape of urban footprints has been proposed as a valuable indicator—besides population density, land-use mix, connectivity and accessibility—to be monitored in order to mitigate climate change. Angel et al. [68
] claimed that, other factors being equal, compact shapes reduce energy use and gas emissions. On the contrary, Bechle et al. [69
] did not find a significant correlation between compactness and NO₂ concentration, but they did find one with leapfrog development and higher population densities. Regarding the change in air quality, more compact FUAs improved their air quality between 2000 and 2014, together with an increase in accessibility and a higher consumption of agricultural land as a consequence of urban growth (Figure 6
e). Last, concerning the employment rate change, positive rates were found in FUAs with compact urban cores, a denser urban growth (i.e., infilling and expansive growth types) and an improvement in accessibility (Figure 6
f). This seems to contradict the negative relationship of income and GDP with built-up and urban core compactness, and it may have a two-fold explanation: first, the subset of FUAs in the employment model does not represent the same geographic regions as in the GDP or income models (Table S1
); second, the OECD defines the employment rate as the ratio of the employed population over the working age population [49
]; therefore, an increase in employment accompanied by a larger increase in the working-age population will result in a negative change. The employment rate model associated a higher drop in the employment rate with a higher density of low vegetation land together with greater consumption of low vegetation land due to urbanization between 2000 and 2014. Changes in employment have been previously related to LULC change in Portugal, where changes in land uses had a direct impact on labor [70
]. In summary, we determined that built-up and urban core compactness are the most influential metrics for all the socio-economic variables analyzed, which has also been noted previously by other authors [68].
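The arithmetic behind the second explanation for the employment rate result can be made concrete with a short worked example. It uses the definition given above (employed population over working-age population); the population figures are hypothetical and chosen only for illustration.

```python
def employment_rate(employed: float, working_age_population: float) -> float:
    """Employment rate as the ratio of employed to working-age population."""
    if working_age_population <= 0:
        raise ValueError("working-age population must be positive")
    return employed / working_age_population


# Hypothetical FUA at two dates (all figures are made up).
employed_t1, working_age_t1 = 600_000, 1_000_000
employed_t2, working_age_t2 = 650_000, 1_150_000

rate_t1 = employment_rate(employed_t1, working_age_t1)
rate_t2 = employment_rate(employed_t2, working_age_t2)

# Relative growth of the numerator and the denominator.
employment_growth = employed_t2 / employed_t1 - 1
working_age_growth = working_age_t2 / working_age_t1 - 1

# Change in the rate, in percentage points.
rate_change_pp = (rate_t2 - rate_t1) * 100

print(f"Employment growth: {employment_growth:.1%}")
print(f"Working-age population growth: {working_age_growth:.1%}")
print(f"Employment rate change: {rate_change_pp:.1f} percentage points")
```

Here the number of employed people grows, but the working-age population grows faster, so the employment rate falls even though absolute employment rises.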
This analysis does not account for causality and should be interpreted cautiously; nonetheless, it helped to disentangle some relationships between the spatial patterns of functional urban areas and socio-economic indicators. Besides, the findings presented cannot be generalized to regions not covered in the analysis. The majority of the FUAs analyzed were chosen due to data availability in developed or high-income countries. Thus, we cannot assume the same relationships in developing or low-income countries until new models with more datasets are tested. In this sense, this study is a first step in exploring these global relationships and sub-models in certain regions.
In addition, some limitations should be considered when working with multi-temporal and global datasets. For example, the historical and cultural path dependencies of urban areas influence particular urban structures and land cover compositions. These influences should be considered when interpreting results at the global level. For instance, what might be considered a compact pattern in the USA versus Europe, and in high-income versus low-income countries or across continents, can be fundamentally different. Spatio-temporal metrics may have reflected those differences indirectly by means of the measured spatial patterns. Therefore, in future research, the inclusion of a categorical variable that groups FUAs with similar path dependencies or geographic-cultural contexts would be worth exploring.
On the other hand, the quality of the data is a crucial matter in this type of analysis. For instance, the GHSL used to describe the built-up areas had a balanced accuracy of 86% [72
], which probably had an influence on the relationships found that remains unknown; however, with the interpretation of spatio-temporal metrics, we identified outliers that led to the detection of FUAs with classification errors, which were removed from the analysis, reducing the inclusion of potential errors in the models (Table S1
). In this direction, spatio-temporal metrics linked to a boundary could be used to identify areas with anomalies and, therefore, potential errors in the GHSL database. For the OECD metropolitan areas dataset, modeling the variation over time remains a challenge, since the availability of multi-temporal data is much lower and, when such data are available, the change values accumulate the possible errors of the variables at the two individual dates. Since different methods are applied to gather socio-economic data at the FUA level, such as aggregation or disaggregation from lower and higher levels, the reliability largely depends on the accuracy of these methods; thus, socio-economic variables are prone to uncertainties that we cannot quantify. It should be noted that the statistical data used in this study refer to data available in February 2020. After this date, OECD data are expected to be regularly updated and new cities are expected to be added to the database. However, this does not affect the proposed analysis, and the method remains valid. Both statistical and geospatial open databases are dynamic, constantly being developed and improved; therefore, continuous changes over time are expected. Besides, statistics sometimes include estimates and assumptions; thus, data produced by different organizations for the same area are not hard facts and might differ, so they should be used with caution. However, since we compare data from the same database, we may assume that the data are consistent and the comparisons are sound. The analysis was restricted by the availability of statistical variables and geospatial data, but the inclusion of additional environmental variables, more suitable economic and social variables (e.g., employment and GDP) at the metropolitan level, and additional geoinformation would be interesting to explore.
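As a reminder of what the balanced accuracy quoted above for the GHSL measures, the following sketch computes it under the conventional definition, the mean of sensitivity and specificity, for a binary built-up versus non-built-up classification. The confusion-matrix counts are hypothetical and are not taken from the GHSL validation; whether that validation used exactly this formulation is an assumption.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (built-up recall) and specificity (non-built-up recall)."""
    sensitivity = tp / (tp + fn)  # share of true built-up pixels detected
    specificity = tn / (tn + fp)  # share of true non-built-up pixels detected
    return (sensitivity + specificity) / 2


def overall_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Share of all pixels classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical validation sample in which built-up pixels are the minority class.
counts = {"tp": 80, "fn": 20, "tn": 920, "fp": 80}

ba = balanced_accuracy(**counts)
oa = overall_accuracy(**counts)

print(f"Balanced accuracy: {ba:.2f}")
print(f"Overall accuracy:  {oa:.2f}")
```

Because built-up land is usually a small share of the mapped area, overall accuracy is dominated by the non-built-up class, whereas balanced accuracy weights the errors in both classes equally.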
Finally, the spatial boundaries used for extracting the urban spatial patterns of the EU-OECD FUAs rely on a consistent method for delineation; we recognized that due to various reasons such as the differing quality in datasets, the geometrical definition of the boundaries in some countries is not as fine as in others. For instance, Mexico, Chile and Japan showed coarser geometries than the USA or Europe, which might influence the spatio-temporal metrics, as the built-up areas were clipped using these boundaries.
The identification of socio-economic phenomena and their cross-comparison among regions, countries and continents by means of metrics derived from available geospatial databases for urban environments is increasingly feasible. These databases are continuously improving; their updates are becoming more and more frequent since the processes are being automated and an increasing number of satellites are providing freely available images with global coverage (e.g., the Landsat and Sentinel missions). In the foreseeable future, more comparable data with higher spatial and temporal resolutions will become available. Hence, the use of spatio-temporal metrics (describing urban spatial patterns and growth) linked to socio-economic and environmental indicators, and their change over time, will help to improve the understanding of the drivers of development in urban areas and their consequences at the global scale, which has been limited to date. Therefore, the proposed methodology, tested here with current semi-global data, could be extrapolated to a global scale as soon as more data become available. Furthermore, new spatial and socio-economic datasets at different scales should be explored soon, increasing the possibilities of new findings and analyses. Our preliminary outcomes show that there are common drivers and consequences of urban development within and across regions (e.g., the compactness of the built-up footprint influences or is related to household income, income inequality or GDP per capita in functional urban areas), indicating global trends. However, intra-urban variations should not be disregarded, since the high heterogeneity of urban patterns and socio-economic factors within urban areas needs to be considered [2
]. A future study should not only increase the geographical extent of the analysis but also include intra-urban variations as well as sensitivity analyses with varying spatial units.