On September 19, 2017, a magnitude (M) 7.1 earthquake occurred in Puebla, Mexico [1]. Because the region is densely populated, Mexico City and the states of Puebla and Morelos sustained significant damage to infrastructure and heavy human losses [1]. The United States Agency for International Development (USAID) and the Pan American Health Organization (PAHO) estimated that at least 43,000 buildings were destroyed or sustained significant damage, 6100 people were injured, 366 people were killed, and hundreds were missing [2]. The most impacted areas were in the rural parts of the region, making it difficult for response agencies to identify and respond to communities with the most need. The destruction was severe and geographically dispersed, with thousands of residents left homeless and others sleeping on the streets in fear of aftershocks [2].
Following large-scale humanitarian disasters, challenges emerge regarding prioritizing emergency response and aid provisions. With natural disasters, there is generally a large displaced population seeking shelter and in need of assistance. It is often instinctive for people to flee the affected area after some time; however, this can complicate relief efforts and increase the risk of mortality [3]. Traditional methods such as witness interviews or satellite imagery are commonly used by relief organizations to assess post-disaster destruction and estimate the number of deaths or missing people [3]. However, they are often slow, potentially biased, and unreliable [3].
With the occurrence of natural disasters increasing over time, new spatial and geostatistical perspectives have been adopted. Earthquake risk and damage modeling in particular has grown substantially. In 2010, Sahar, Muthukumar, and French used geographic information systems (GIS) and algorithms to extract two-dimensional building shapes from aerial imagery for better earthquake risk assessment modeling [4]. Feng et al. showed the combined use of remote sensing with GIS building data to detect the three-dimensional destruction of buildings and estimate the number of potential casualties [5]. Newer efforts have shown the ability to preemptively assess expected damage in urban areas using earthquake building codes, which can lead to more efficiently placed assistance camps or evacuation routes [6]. Recently, there has been a shift towards more geo-computational approaches. Hossain et al. leveraged smart watch data and GIS technologies to algorithmically capture the heart rate of earthquake victims in order to identify critical areas for search and rescue [7]. Machine learning and neural network approaches are being tested to improve previous earthquake prediction models [8]. Other approaches have incorporated crowd-sourced data, such as emergency volunteer mapping and social media geotags, to detect where the largest damage has occurred in humanitarian emergencies, or constellations of small satellites to detect significantly damaged villages on a daily basis [9]. The National Geospatial-Intelligence Agency is attempting to automate the extraction of areas in need of humanitarian assistance by leveraging artificial intelligence on high-resolution imagery [12]. All of these methods use overarching rather than personal location data to provide insight into humanitarian disasters.
The literature shows that call detail records (CDRs) have been useful in measuring broad spatial population characteristics and migration in the private and public sectors. Using triangulation techniques, CDR data can produce a geolocation at the time a call or text is made and can be used to evaluate a population’s mobility and social networks, sometimes down to an individual scale [13]. Telecommunication companies are constantly analyzing CDRs to monitor their market penetration and economic success. Researchers continue to explore how governments and private organizations can benefit from more timely population and migration estimates by using CDRs, especially in developing nations where such knowledge can inform policy but the costs of collecting data may be significant. For example, Salat, Smoreda, and Schläpfer developed methods for extrapolating population densities from CDRs by tracing weekly, monthly, and yearly patterns of mobile phone use in Senegal [14]. Zufiria et al. also utilized mobile phone data from Senegal and found that aggregated mobility profiles based on likely livelihoods can shed light on economic activity, agricultural cycles, and precipitation, and thus, seasonal migration [15]. Lai et al. further directly applied CDR data and found that it can complement national statistics, especially in countries with high rates of internal migration, to ensure that public services are appropriately deployed [16]. Cell phone data is effective in detecting and depicting established patterns of life within a country.
Call detail records have also been extensively employed to complement various organizations’ understandings of population movements when these patterns of life are disturbed in the wake of specific environmental and epidemiological crises. In a particularly relevant example, Bengtsson et al. (2011) analyzed locational data from CDRs to find that 630,000 people left Port-au-Prince within a 19-day period after the 2010 Haiti earthquake [3]. Similarly, Wilson et al. (2016) utilized deidentified mobile CDRs and algorithmic transition matrices to identify population flows in and out of Kathmandu Valley in the first few weeks after the 2015 Gorkha earthquake in Nepal [17]. Pastor-Escuredo et al. showed the viability of using CDRs to characterize impacts of flooding in Tabasco, Mexico in 2009 [18]. Andrade et al. demonstrated in their analysis of a 2016 earthquake centered in Manabi, Ecuador that using aggregated activity through call towers can protect individual users’ privacy while assessing the extent of urban infrastructure damage and providing insight into patterns of mobility depending on the user’s proximity to the epicenter of the earthquake [19]. Horanont et al. (2013) used 9.2 billion location records, derived from an auto-global navigation satellite system (GNSS) service from a telecommunications company in Japan, to analyze human mobility patterns after the 2011 Great East Japan Earthquake [20]. CDR data has also been utilized in applications of epidemics and disease control. Peak et al. (2018) used CDRs to algorithmically investigate the decrease of travel during the Ebola epidemic intervention in Sierra Leone [21]. These studies were able to detect the mobility and behavior patterns of populations after large-scale disasters, showing a promising method to prepare for damage assessments and post-disaster responses.
Another type of telecommunications data that can be used in similar applications to CDRs is personal electronic device (PED) data. With the expansion of mobile phone accessibility in the global South, PED data has been leveraged to address the challenges faced in large-scale humanitarian emergencies. The use of PED data has developed in parallel with CDR data; PED data is gathered instead from global navigation satellite systems (GNSSs) in smart devices (e.g., cell phones, smart phones, smart watches, and smart tablets) [22]. PED data differs from the previously used CDR data in that it collects location information without depending on the transmission of communications (calls and/or texts) and, therefore, yields more precise locations. Yabe et al. (2019) utilized PED data of one million users following the Kumamoto earthquake to estimate the evacuation rates relative to seismic intensities [23]. Chen et al. (2020) utilized PED data from Baidu Map to track urban flow changes in Shenzhen during Typhoon Mangkhut [24]. More recently, with the COVID-19 pandemic, PED data is being used to track mobility, transmission, and the success of social-distancing guidelines. Liautaud, Huybers, and Santillana (2020) leveraged PED data to analyze the decrease in mobility together with fever incidence from thermometers connected to smartphones [25]. Their analysis confirmed that social distancing has reduced transmission of the virus and could help identify potential outbreaks in the future.
PED data provides extremely rich spatiotemporal data on human mobility and can be used in many multidisciplinary applications, such as natural disasters, public health, credit fraud, and human rights violations [26]. Companies like LocationSmart (www.locationsmart.com), Foursquare (www.foursquare.com), or Cuebiq (www.cuebiq.com) sell offline location analytics to businesses for consumer insights and marketing. Organizations like UNICEF and the World Bank also leverage this locational data for real-time humanitarian responses [27] (Figure 1).
This study details an approach using pseudonymized PED data to detect locality depopulation following the 2017 large-scale earthquake in Puebla, Mexico. The data was preprocessed using Python algorithms and loaded into a PostgreSQL database. The approach then used a system of algorithms to detect when the residents left the locality. The algorithms first identified the residents of the localities, compared the number of residents each day, and then analyzed how the population average changed over time to detect communities that were depopulating. By using communities close to the earthquake’s epicenter, as well as similarly sized communities away from the epicenter, this approach accurately showed that communities near the earthquake rapidly depopulated as a result of the earthquake. Currently, there is limited research on the use of PED data for tracking mobility during a natural disaster. This study seeks to address that gap and describes an approach that gives humanitarian response organizations an affordable, accurate, and automated way to detect which communities are impacted the hardest by a large-scale humanitarian disaster. Such an approach would likely be more valuable in areas that lack other means of reporting (lack of capacity) or where the local government/authorities do not want to share information with the international community (lack of transparency).
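The resident-identification and comparison steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the input layout (`(device_id, day)` pairs for pings already matched to one locality), the `min_days` residency rule, and all function names are hypothetical, and a simple before/after comparison of mean daily resident counts stands in for the study's slope-based comparison. Only the −24% alert level and the earthquake date are taken from the paper.

```python
from collections import defaultdict
from datetime import date, timedelta


def identify_residents(pings, event_day, min_days=5):
    """Assumed rule: a device is a resident if it was seen in the
    locality on at least `min_days` distinct days before the event."""
    days_seen = defaultdict(set)
    for device_id, day in pings:
        if day < event_day:
            days_seen[device_id].add(day)
    return {dev for dev, days in days_seen.items() if len(days) >= min_days}


def daily_resident_counts(pings, residents, days):
    """Count distinct residents seen on each requested day (0 if none)."""
    seen = {day: set() for day in days}
    for device_id, day in pings:
        if device_id in residents and day in seen:
            seen[day].add(device_id)
    return {day: len(ids) for day, ids in seen.items()}


def relative_change(counts, event_day):
    """Relative change in mean daily residents, after versus before."""
    before = [n for day, n in counts.items() if day < event_day]
    after = [n for day, n in counts.items() if day > event_day]
    if not before or not after or sum(before) == 0:
        return None  # not enough data to compare
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return (mean_after - mean_before) / mean_before


def is_depopulating(change, threshold=-0.24):
    """Alert when the decrease reaches the threshold (-24% in the paper)."""
    return change is not None and change <= threshold


# Synthetic example: 10 devices seen daily before the event, 4 afterwards.
event = date(2017, 9, 19)
pre_days = [event - timedelta(days=k) for k in range(6, 0, -1)]
post_days = [event + timedelta(days=k) for k in range(1, 7)]
pings = [(f"dev{i}", d) for i in range(10) for d in pre_days]
pings += [(f"dev{i}", d) for i in range(4) for d in post_days]

residents = identify_residents(pings, event)
counts = daily_resident_counts(pings, residents, pre_days + post_days)
change = relative_change(counts, event)
print(f"residents={len(residents)} change={change:.0%} "
      f"alert={is_depopulating(change)}")
```

Passing the list of days explicitly means that days on which no resident was observed count as zero rather than being silently dropped, which matters most in exactly the localities that empty out.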
This approach resulted in an overall accuracy of 73% in detecting the depopulation of localities following the earthquake. Out of the 15 control localities, 12 were detected as not depopulated or as having an insignificant decrease in residents after the earthquake, yielding a commission error of 20% (Table A3). Of the 15 experimental localities, 10 were detected with a decrease of 24% or more of the population after the earthquake, yielding an omission error of 33% (Table A3).
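The reported rates follow directly from the locality counts given above. The short sketch below reproduces the arithmetic; the function and variable names are illustrative, and commission and omission error are defined here in the way the reported figures imply (the share of controls that falsely alerted and the share of experimental localities that failed to alert).

```python
def error_rates(controls_total, controls_quiet,
                experimental_total, experimental_alerted):
    """Return (overall accuracy, commission error, omission error)."""
    false_positives = controls_total - controls_quiet
    false_negatives = experimental_total - experimental_alerted
    correct = controls_quiet + experimental_alerted
    accuracy = correct / (controls_total + experimental_total)
    commission = false_positives / controls_total
    omission = false_negatives / experimental_total
    return accuracy, commission, omission


# Counts from the text: 12 of 15 controls quiet, 10 of 15 experimental alerted.
accuracy, commission, omission = error_rates(15, 12, 15, 10)
print(f"accuracy={accuracy:.0%} commission={commission:.0%} "
      f"omission={omission:.0%}")
```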
All locality candidates for the controls were randomly selected from 21 of the 32 states in Mexico; at least two localities with population sizes between 1000 and 6000 people were chosen from each state. All 15 experimental localities were in the state of Puebla, geographically proximate to the epicenter of the earthquake (Figure 9). There appeared to be no spatial correlation among the false negatives, that is, the experimental localities that failed to alert. The five experimental localities that did not alert were in the same geographic proximity as the 10 experimental localities that correctly alerted. Six of the 15 control localities were from the state of Mexico, two from Hidalgo, two from Yucatan, two from Guerrero, one from Michoacán, one from Chiapas, and one from Colima. Additionally, while there appeared to be no definitive spatial correlation of the controls, most false positives were in states near the central region of the country, west of the earthquake.
The overall population of a locality did not correlate with the accuracy for the control or experimental localities. For example, control locality San Miguel Ixtapan had a population of 1251 people with a slight decrease of 1.36% of residents, whereas control locality Huamuxtitlán had a population of 6063 but an increase of 3.15% (Table A2). Similarly, experimental locality Domingas Arenas had a population of 5864 people and a decrease of 47.86% of residents after the earthquake, but experimental locality San Felix Hidalgo had a population of 1628 people and a decrease of 53.73% (Table A3).
Lastly, of the experimental localities, there was not a strong correlation between earthquake zone severity and the accuracy of the algorithm. Of the seven localities located in Zone VI, two did not alert, while of the eight localities in Zone VII, three did not alert (Figure 4; Tables A4 and A5).
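As a rough check on the zone comparison above, the per-zone alert rates can be computed from the counts in the text. The study does not report a significance test; Fisher's exact test is swapped in here purely as an illustration for a small 2x2 table, and the function name and table layout are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric weights of every table with the same margins
    that is no more likely than the observed one. Integer weights are
    compared directly, so there are no floating-point ties.
    """
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        return comb(row1, k) * comb(row2, col1 - k)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(lowest, highest + 1)
                  if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)


# Counts from the text: Zone VI, 5 of 7 alerted; Zone VII, 5 of 8 alerted.
zone_vi_alerted, zone_vi_missed = 5, 2
zone_vii_alerted, zone_vii_missed = 5, 3

rate_vi = zone_vi_alerted / (zone_vi_alerted + zone_vi_missed)
rate_vii = zone_vii_alerted / (zone_vii_alerted + zone_vii_missed)
p_value = fisher_exact_two_sided(zone_vi_alerted, zone_vi_missed,
                                 zone_vii_alerted, zone_vii_missed)
print(f"Zone VI alert rate={rate_vi:.1%}, Zone VII alert rate={rate_vii:.1%}, "
      f"p={p_value:.2f}")
```

With only 15 localities, the two alert rates (about 71% and 63%) are statistically indistinguishable under this test, which is consistent with the statement that zone severity did not strongly affect accuracy.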
This approach leveraged Cuebiq pseudonymized PED data to identify and analyze patterns of movement after a natural disaster. The methods used in this study provide a workflow that can potentially be used as a framework for other natural disasters or similar incidents (such as violence) that prompt migration. The spatial distribution and market penetration of the 2017 data posed a challenge when working with the Cuebiq PED data; it was difficult to strike a balance between examining localities with a tighter, more reliable geofence and ones that had enough residents to analyze longitudinally. The 15 control and 15 experimental localities could only be selected after examining 82 control and 38 experimental candidates and checking their market penetrations to see if their impacts could be measured and analyzed. Working with smaller, more rural localities (generally with a population less than 2000) meant using smaller buffers with greater specificity in identifying residents; false positives from individuals moving through the area while they commuted to work, for example, were less likely. However, there were fewer residents in those localities overall, because there was low cell phone penetration in rural areas. This will likely not be as pressing an issue when replicating the experiment on more recent data, since PED penetration has been shown to increase consistently over time; by 2019, Mexico’s smartphone penetration had increased 10 percentage points in three years to reach 49.5% of the population [32].
There are also data collection issues presented in this approach. If a disaster is severe, it will likely destroy the telecom and power networks. Since current PEDs rely on the terrestrial telecom network, they will have no way to upload their information. With the advent of low-Earth constellations of communication satellites, like SpaceX’s Starlink (starlink.com), this could be alleviated, but the devices will still lose their charge without a power network within a few days. This could be addressed with dispersed power sources such as diesel generators, solar panels, or small-scale wind generators, but these are not widely available in some less-developed areas.
Another drawback of this study is the temporal distribution of the data. The data spanned a little more than a month (15 days before the earthquake and 21 days after it), so using a dataset with a larger temporal range will likely improve the results of the pattern of life analysis used to correctly identify the residents, including those who may be traveling, working temporarily elsewhere, or only using their smartphones intermittently. Using the approach described here, an average of only 2% of a locality’s 2010 population was tracked as residents [31].
Future elements of this research could include, but are not limited to, examining the geological and geographic factors that could have contributed to the lack of market penetration, analyzing the usage of different types of smart device applications that are partnered with Cuebiq and are commonly used in Mexico, understanding the aggregate demographics of the region in relation to smartphone usage, and identifying other situational circumstances that affected the population’s mobility but were not previously identified. These factors were the leading hypotheses for why the algorithm performed better on some localities than others, and further investigation could give insight into the lack of market penetration and support more accurate adjustments to the algorithm.
The approach presented here could be extended to the present day, monitoring currently intact localities in areas at risk of humanitarian disasters. Downloading this month of data for our study region, loading it into the database, and preprocessing it took approximately three days on a Windows 10 server. Automating the process and removing the inefficiencies incurred by having the workflow output parallel datasets that differed in experimental parameters could make this process feasible on a weekly basis. During the aftermath of a humanitarian disaster, the data could pinpoint the localities incurring the greatest damage. Additionally, this approach could be modified to operate over a period of a few hours as opposed to a 24-h period to serve as an early warning mechanism for a humanitarian disaster and the mass migration that may result. While there are hurdles, this approach is scalable. A cloud-based effort could easily monitor all localities in an area, such as this one in Mexico, alerting users any time a spatial grouping of localities experiences rapid depopulation.
Organizations interested in this approach could modify the algorithm’s sensitivity to suit their objectives. For example, when monitoring a specific region at risk of disaster, the −24% slope difference could be modified to −15%, reducing the chance that a locality’s depopulation would go undetected. This lowers the omission rate but increases the commission rate. Conversely, if an organization is concerned with the total number of people fleeing, the alert level could be changed to −30% to reduce commission errors.
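The trade-off between omission and commission errors can be illustrated with a small sweep over alert thresholds. The relative-change values below are hypothetical and are not the study's data; only the three thresholds (−15%, −24%, and −30%) come from the text, and the function name is illustrative.

```python
def rates_at_threshold(control_changes, experimental_changes, threshold):
    """Return (commission rate, omission rate) for one alert threshold.

    A locality alerts when its relative change is at or below `threshold`.
    """
    false_positives = sum(1 for c in control_changes if c <= threshold)
    false_negatives = sum(1 for c in experimental_changes if c > threshold)
    return (false_positives / len(control_changes),
            false_negatives / len(experimental_changes))


# Hypothetical relative changes in resident counts (not the study's data).
control_changes = [0.03, -0.01, -0.05, -0.12, -0.18, -0.26]
experimental_changes = [-0.55, -0.48, -0.35, -0.27, -0.20, -0.10]

results = {}
for threshold in (-0.15, -0.24, -0.30):
    results[threshold] = rates_at_threshold(
        control_changes, experimental_changes, threshold)
    commission, omission = results[threshold]
    print(f"threshold={threshold:+.0%} commission={commission:.0%} "
          f"omission={omission:.0%}")
```

On this toy data, moving from −24% to −15% lowers the omission rate and raises the commission rate, while moving to −30% does the opposite, mirroring the trade-off described above.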
In future studies, building a better baseline of locality patterns of life could account for weekly, monthly, and annual changes and significantly improve accuracy. This is possible with the significant increase in the market penetration of PED location data and with scalable computing. Additionally, with a longer period of data, a custom buffer could be created for each ID based on its usual movements. When the ID passes out of that buffer, it would be recorded as an abnormal movement. Enough IDs with abnormal movements on a specific day could indicate an anomalous environmental or political event. An advantage of this algorithm is that it could be used in larger cities, where one neighborhood might be destroyed and its residents shift to a neighboring one. A disadvantage is that a false positive could occur for other reasons, such as large-scale sporting events.
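The per-ID buffer idea can be sketched as below. This is a hypothetical illustration of the proposal, not an implemented part of the study: the circular buffer shape, the `margin` and `min_radius_km` parameters, the `min_share` cutoff for declaring an anomalous day, and the coordinates are all assumptions. Averaging latitude and longitude for the buffer centre is a simplification that is only reasonable for small areas away from the poles and the antimeridian.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def build_buffer(history, margin=1.5, min_radius_km=1.0):
    """Circular buffer around the mean of an ID's historical points.

    Radius is the largest historical distance from the centre, scaled by
    `margin`, and never smaller than `min_radius_km`.
    """
    lat0 = sum(lat for lat, _ in history) / len(history)
    lon0 = sum(lon for _, lon in history) / len(history)
    reach = max(haversine_km(lat0, lon0, lat, lon) for lat, lon in history)
    return lat0, lon0, max(min_radius_km, margin * reach)


def is_abnormal(buffer, lat, lon):
    """True when a new point falls outside the ID's usual buffer."""
    lat0, lon0, radius_km = buffer
    return haversine_km(lat0, lon0, lat, lon) > radius_km


def anomalous_day(buffers, todays_points, min_share=0.3):
    """Flag a day when at least `min_share` of tracked IDs moved abnormally."""
    tracked = [dev for dev in todays_points if dev in buffers]
    if not tracked:
        return False
    flagged = sum(1 for dev in tracked
                  if is_abnormal(buffers[dev], *todays_points[dev]))
    return flagged / len(tracked) >= min_share


# Hypothetical history for two IDs that normally stay within a small area.
usual = [(19.04, -98.20), (19.05, -98.21), (19.03, -98.19)]
buffers = {"id_a": build_buffer(usual), "id_b": build_buffer(usual)}

# Today one ID stays home and the other appears far outside its buffer.
today = {"id_a": (19.04, -98.20), "id_b": (19.43, -99.13)}
print(anomalous_day(buffers, today))
```

A production version would more likely use the spatial functions of the database already holding the data rather than pure Python, and would need a calendar of known gatherings to suppress false positives of the kind mentioned above.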
As more years of data become available, a greater number of incidents like humanitarian emergencies will provide “labels” for training artificial intelligence or machine-learning algorithms. These algorithms will more accurately detect anomalous situations of interest, such as the localities most severely affected by a humanitarian disaster. The operationalization of this approach will likely be possible in the near future with increased data collection and availability, improved computer processing, and near-real-time data access. Operationalization will, however, require that the location data from PEDs can be transferred, which depends on a functioning telecom network or a space-based communication network such as Starlink. Operationalization also requires that PEDs can continue to be powered through either a functioning power grid or, if that is destroyed in a severe disaster, dispersed power sources such as diesel generators, solar panels, or small-scale wind generators.
With increases in computing capacity and the growing database of PED data on a global scale, approaches like this have the possibility of providing researchers and practitioners a way to monitor large areas at risk of humanitarian disaster. It is understood that this approach will never replace on-the-ground witnesses, but it can serve as a low-cost alert system capable of providing additional information in areas lacking connectivity. Such an approach will be even more valuable in areas that lack other means of reporting (lack of capacity) or where the local government/authorities do not want to share information with the international community (lack of transparency). Our hope is that this research helps provide organizations committed to emergency humanitarian response a way to act decisively through informed, data-driven insights and, ultimately, to reduce the suffering of those affected by a disaster.