Next Article in Journal
Seeing Climate Change: A Framework for Understanding Visualizations for Climate Adaptation
Next Article in Special Issue
Assessing Earthquake Impacts and Monitoring Resilience of Historic Areas: Methods for GIS Tools
Previous Article in Journal
Urban Water Demand Simulation in Residential and Non-Residential Buildings Based on a CityGML Data Model
Previous Article in Special Issue
Post-Earthquake Recovery Phase Monitoring and Mapping Based on UAS Data
Article

Detecting Destroyed Communities in Remote Areas with Personal Electronic Device Data: A Case Study of the 2017 Puebla Earthquake

Spatial Sciences Institute, University of Southern California, Los Angeles, CA 90089, USA
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2020, 9(11), 643; https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi9110643
Received: 14 September 2020 / Revised: 13 October 2020 / Accepted: 22 October 2020 / Published: 28 October 2020
(This article belongs to the Special Issue Geomatics and Geo-Information in Earthquake Studies)

Abstract

Large-scale humanitarian disasters often disproportionately damage poor communities. This effect is compounded when communities are remote with limited connectivity and response is slow. While humanitarian response organizations are increasingly using a wide range of satellites to detect damaged areas, these images can be delayed days or weeks and may not tell the story of how many or where people are affected. In order to address the need of identifying severely damaged communities due to humanitarian disasters, we present an algorithmic approach to leverage pseudonymization locational data collected from personal cell phones to detect the depopulation of localities severely affected by the 2017 Puebla earthquake in Mexico. This algorithm capitalizes on building a pattern of life for these localities, first establishing which pseudonymous IDs are a resident of the locality and then establishing what percent of those residents leave those localities after the earthquake. Using a study of 15 localities severely damaged and 15 control localities unaffected by the earthquake, this approach successfully identified 73% of severely damaged localities. This individual-focused system provides a promising approach for organizations to understand the size and severity of a humanitarian disaster, detect which localities are most severely damaged, and aid them in prioritizing response and reconstruction efforts.
Keywords: location analytics; humanitarian; earthquake; Mexico location analytics; humanitarian; earthquake; Mexico

1. Introduction

On September 19, 2017, a 7.1 magnitude (M) earthquake occurred in Puebla, Mexico [1]. Mexico City and the states of Puebla and Morelos sustained significant damage to the infrastructure and population due to the densely populated region [1]. The United States Agency for International Development (USAID) and the Pan American Health Organization (PAHO) estimated that at least 43,000 buildings were destroyed or sustained significant damage, 6100 people were injured, 366 people were killed, and hundreds missing [2]. The most impacted areas were in the rural parts of the region, making it difficult for response agencies to identify and respond to communities with the most need. The scale of the destruction was significant and distributed geographically, with thousands of residents left homeless and others sleeping on the streets in fear of aftershocks [2].
Following large-scale humanitarian disasters, challenges emerge regarding prioritizing emergency response and aid provisions. With natural disasters, there is generally a large displaced population seeking shelter and in need of assistance. It is often instinctive for people to flee the affected area after some time; however, this can complicate relief efforts and increase the chance of mortalities [3]. Traditional methodologies like witnesses’ interviews or satellite imagery are commonly used among relief organizations to determine the post-destruction and estimate the number of mortalities or missing people [3]. However, they are often slow, potentially biased, and unreliable [3].
With the occurrence of natural disasters increasing over time, new spatial and geostatistical perspectives have been adopted. Earthquake risk and damage modeling have specifically seen a large growth. In 2010, Sahar, Muthukumar, and French used geographic information systems (GIS) and algorithms to extract two-dimensional building shapes from aerial imagery for better earthquake risk assessment modeling [4]. Feng et al. showed the combined use of remote sensing with GIS building data to detect the three-dimensional destruction of buildings and estimate the number of potential casualties [5]. Newer efforts have shown the ability to preemptively assess expected damage in urban areas using earthquake building codes, which can lead to more efficiently placed assistance camps or evacuation routes [6]. Recently, there has been a shift towards more geo-computational approaches. Hossain et al. leveraged smart watch data and GIS technologies to algorithmically capture the heart rate of the earthquake victims in order to identify critical areas for search and rescue [7]. Machine learning and neural network approaches are being tested to improve previous earthquake prediction models [8]. Other approaches have incorporated crowd-sourced data, such as emergency volunteer mapping and social media geotags to detect where the largest damage has occurred in humanitarian emergencies or constellations of small satellites to detect significantly damaged villages on a daily basis [9,10,11]. The National Geospatial Agency is attempting to automate the extraction of areas in need of humanitarian assistance by leveraging artificial intelligence on high-resolution imagery [12]. All of these methods use overarching rather than personal location data to provide insight into humanitarian disasters.
The literature shows that call detail records (CDRs) have been useful in measuring broad spatial population characteristics and migration in the private and public sectors. Using triangulation techniques, CDR data can produce a geolocation at the time a call or text is made and evaluate a population mobility and social network, sometimes down to an individual scale [13]. Telecommunication companies are constantly analyzing CDRs to monitor their market penetration and economic success. Researchers continue to explore how governments and private organizations can benefit from more timely population and migration estimates by using CDRs, especially in developing nations where such knowledge can inform policy, but the costs of collecting data may be significant. For example, Salat, Smoreda, and Schläpfer developed methods for extrapolating population densities from CDRs from tracing weekly, monthly, and yearly patterns of mobile phone use in Senegal [14]. Zufiria et al. also utilized mobile phone data from Senegal and found that aggregated mobility profiles based on likely livelihoods can shed light on economic activity, agricultural cycles, and precipitation, and thus, seasonal migration [15]. Lai et al. further directly applied CDR data and found that it can complement national statistics, especially in countries with high rates of internal migration, to ensure that public services are appropriately deployed [16]. Cell phone data is effective in detecting and depicting established patterns of life within a country.
Call detail records have also been extensively employed to complement various organizations’ understandings of population movements when these patterns of life are disturbed in the wake of specific environmental and epidemiological crises. In a particularly relevant example, Bengtsson et al. (2011) analyzed locational data from CDRs to find that 630,000 people left Port-au-Prince within a 19-day period after the 2010 Haiti earthquake [3]. Similarly, Wilson et al. (2016) utilized deidentified mobile CDRs and algorithmic transition matrices to identify population flows in and out of Kathmandu Valley in the first few weeks after the 2015 Gorkha earthquake in Nepal [17]. Pastor-Escuredo et al. showed the viability in using CDRs to characterize impacts of flooding in Tabasco, Mexico in 2009 [18]. Andrade et al. demonstrated in their analysis of a 2016 earthquake centered in Manabi, Ecuador that using aggregated activity through call towers can protect individual users from privacy concerns while assessing the extent of urban infrastructure damage and providing insight into patterns of mobility depending on the user’s proximity to the epicenter of the earthquake [19]. Horanont et al. (2013) used 9.2 billion location records, derived from an auto-global navigation satellite system (GNSS) service from a telecommunications company in Japan, to analyze human mobility patterns after the 2011 Great Japan Earthquake [20]. CDR data has also been utilized in applications of epidemics and disease control. Peak et al. (2018) used CDRs to algorithmically investigate the decrease of travel during the Ebola epidemic intervention in Sierra Leone [21]. These studies were able to detect the mobility and behavior patterns of the population after a large-scale natural disaster, showing a promising method to prepare for damage assessments and post-disaster responses.
Another type of telecommunications data that can be used in similar applications to CDRs is personal electronic device (PED) data. With the expansion of mobile phone accessibility in the global South, PED data has been leveraged to assist in the challenges faced in large-scale humanitarian emergencies. The use of PED data has developed in parallel with CDR data and is gathered instead from global navigation satellite systems (GNSSs) in smart devices (i.e., cell phones, smart phones, smart watches, smart tablets, etc.) [22]. PED data differs from the previously used CDR data, as it collects information about a location without depending on the transmission of communications (calls and/or texts) and, therefore, yields a higher level of accuracy of location precision. Yabe et al. (2019) utilized PED data of one million users following the Kumamoto earthquake to estimate the evacuation rates relative to seismic intensities [23]. Chen et al. (2020) utilized PED data from Baidu Map to track urban flow changes in Shenzhen during Typhoon Mangkhut [24]. More recently, with the COVID-19 pandemic, PED data is being used to track the mobility, transmission, and success of social-distancing guidelines. Liautaud, Huybers, and Santillana (2020) leveraged PED data to analyze the decrease in mobility with fever incidences from thermometers connected to smartphones [25]. It confirmed that social distancing has reduced transmission of the virus and could help identify potential outbreaks in the future.
PED data provides extremely rich spatiotemporal data on human mobility and can be used in many multidisciplinary applications, such as natural disasters, public health, credit fraud, human rights violations, etc. [26]. Companies like LocationSmart (www.locationsmart.com), Foursquare (www.foursquare.com), or Cuebiq (www.cuebiq.com) sell offline location analytics for businesses to provide consumer insights and marketing. Organizations like UNICEF and the World Bank also leverage this locational data for real-time humanitarian responses [27] (Figure 1).
This study details an approach using pseudonymization PED data to detect locality depopulation from the 2017 large-scale earthquake in Puebla, Mexico. The data was preprocessed using Python algorithms and loaded into a PostgreSQL database. The approach then used a system of algorithms to detect when the residents left the locality. The algorithms first identified the residents of the localities, compared the number of residents each day, and then analyzed how the population average changed over time to detect communities that were depopulating. By using communities close to the earthquake’s epicenter, as well as similarly sized communities away from the epicenter, this approach accurately showed that communities near the earthquake rapidly depopulated as a result of the earthquake. Currently, there is limited research on the use of PED data tracking mobility during a natural disaster. However, this study seeks to address the gap that exists and describes an approach that provides humanitarian response organizations an affordable, accurate, and automated approach to detect which communities are impacted the hardest by a large-scale humanitarian disaster. Such an approach could find would likely be more valuable in areas that lack other means of reporting (lack of capacity) or where the local government/authorities do not want to share information with the international community (lack of transparency).

2. Materials and Methods

2.1. Data

Personal Electronic Device (PED) Data

The location analytics company Cuebiq Inc. provided access to a sample of Central Mexico’s pseudonymization and privacy-enhanced PED locational data between 4 September to 10 October, 2017. Data was collected in both online and offline modes, so if the connection was lost with the proximate cellular towers, locations would still be recorded and included later within the dataset. Individual devices, based on their international mobile equipment identity (IMEI), were pseudonymized, and their locations can be plotted for one day, indicating patterns like traveling over time (Figure 2). The positioning data was collected by the individual app’s location services using a variety of methods to collect the IMEI’s location. The data was housed in a physically secured computing research facility behind firewalls. The analysis was conducted on this remote server by logging in with personal laptops. The first-party data collected by Cuebiq contained 8 different columns: ID, device type, noise type, latitude, longitude, distance from previous data point, timestamp, and accuracy. With potential privacy information in the PED data, Cuebiq applies procedures to ensure privacy and different layers of protection for all users. For this dataset, data was deidentified by hashing and encrypting the ID, and a noise of 600m was added for home locations (within a geohash grid) and between 20–100 m for all other locations. This noise was added to the dataset to further anonymize specific users. For each type of location, a different privacy methodology was applied. Home and work locations were randomized within census blocks, allowing for the estimation of demographics without actually revealing the locations of the users; sensitive Points of Interest (POIs), such as primary schools, sexual/reproductive health clinics, places of worship, etc., were removed from the dataset completely; whitelisted POIs (commercial and public points of interest) were unchanged; and no-match POIs (all other data points) had a noise of 20–100 m based on the density of data points within the area.

2.2. Earthquake Layer

Following the earthquake, the European Commission’s Emergency Response Coordination Centre (ERCC) produced a detailed map depicting the relative intensity of the earthquake across and beyond the Puebla state [29]. Using a modified Mercalli scale, the ERCC highlighted areas that experienced “Very Strong” (VII), “Strong” (VI), “Moderate” (V), and “Light” (IV) shaking. From this information, we selected the 15 experimental or damaged localities that fell within the “Very Strong” (VII) or “Strong” (VI) areas. For the control localities, we selected 15 localities that were entirely outside of even the most expansive “Light” (IV) areas (Figure 3). This intensity map served as our ground reference data, delineating villages severely affected and those unaffected.

2.3. Study Area

Puebla State is approximately 13,000 square miles, surrounded by the states of Morelos, Veracruz, Guerrero, Oaxaca, Hidalgo, and Tlaxcala. The state is known for its mountainous geography and has benefited from rich volcanic soil for agriculture. While the capital of Puebla, Puebla City (Ciudad de Puebla), is more developed, wealth disparities exist in rural localities within the state. This analytic approach relies on both control and experimental localities having a significant number of residents consistently using apps where their locational data is shared and accessible by Cuebiq. Mexico’s active smartphone penetration, limited to individuals who use their phone at least once a month, extends to just 40.7 percent of the population [30]. Since the usage of location-sharing apps was likely less prevalent in 2017, especially in rural areas, 38 experimental locality candidates were selected from the study area to be possibly used in the study. In the selection of the experimental localities, candidates were chosen based on a population size between 2000 to 6000 from the 2010 Census and their proximity to the epicenter of the earthquake [29,31].
Eighty-two control candidate localities were also based on the same population criteria as the experimental dataset but chosen from other Mexican states that were not within the range of the earthquake (Puebla, Oaxaca, and Guerrero). In order to establish the residents of a particular locality, as well as determine the percentage of residents who left that locality on a given day, a geometric buffer was manually created around each locality. These “geofences” were created visually from satellite imagery on Google Earth to capture nearly all of the inhabited area while excluding as much uninhabited hinterland as possible. The size of the village was not considered, only the visually developed area. Given the heterogeneity of the population size and locality shape, each locality was assigned a unique buffer, ranging from approximately 500 to 4000 in diameter (Figure 3). Of these candidates, 15 control and 15 experimental localities were used in the final analysis (Figure 4) once they were assessed for market penetration.

2.4. Methodology

The analytic workflow processed each candidate locality through a series of scripts. If that locality met the requirements detailed below, it was used in the final dataset (Figure 5).

2.4.1. Database Preparation

For 14.4 GB of locational data, PostgreSQL’s suitability for more structured, robust, and tabular data made for our database of choice. The preparation of the data consisted of:
(1) Converting the provided data into a readable csv file by running a Python conversion script. This code unzipped the daily csv files, inserted the corresponding column names, and then saved the data into a new csv file. Each day of data for the 37 days consisted of more than a hundred csv files, depending on the amount of data for that day.
(2) Loading the data into the database using a custom-scripted Python algorithm. The script connected to the PostgreSQL database, iterated through the csv files, and then imported the data into the table, utilizing the pyscopg2 Anaconda package. It took approximately 6.5 h to load an initial 225,962,016 rows that represented 37 days’ worth of data.
(3) Creating a Python script that connected the database to a Jupyter Notebook in order to conduct large-scale operations on the data.

2.4.2. Preprocessing

In order to improve the code runtime and discard noise in the dataset, a temporal preprocessing (TP) algorithm was developed and implemented. The initial dataset contained 774,343 unique IDs. Preprocessing was established to only analyze those IDs where a pattern-of-life analysis could be conducted.
This algorithm was designed to remove IDs and all their associated “hits” or records that did not have a significant presence throughout the dataset (Figure 6). First, the algorithm converted each record’s epoch timestamp into a YYYY-MM-DD HH:MM format (ex: 2018-10-12 11:33), which was then truncated to display just the day of the month (ex: 12). The algorithm then checked for how many distinct days each ID was present in the dataset. If an ID was detected for at least 4 distinct days of the 37 complete days (from 4 September to 11 October) present in the dataset) then the records associated with the ID were kept. If not, the records associated with that ID were removed from the dataset. Four was chosen as the minimum number of days to establish that an ID lived in a specific village but kept as many unique IDs as possible. This reduced the dataset by 8,271,655 rows to a total of 217,690,361 rows—a total reduction of 3.6%. Running this preprocessing algorithm took 5 h on a PowerEdge R530 Server with two Intel Xeon E5-2695 2.1GHz processors.
Removing IDs without a sustained presence in the dataset ensured that our study could derive meaningful trends from a reasonable number of data points. Before preprocessing, the IDs had an average of 7.86 days with a record. Preprocessing eliminated many IDs with 1 or 2 total records, bringing up the average days that an ID was present to 13.48 (Table A1 (Figure 7).

2.4.3. Resident Determination Algorithm

The resident determination algorithm established an indicator for residency in a locality. A resident of a particular locality was defined as a unique ID that was present within that locality’s buffer at least 3 different pre-earthquake days (4 September to 18 September). This processing script first augmented the previously temporally preprocessed data with 3 additional columns. The first created geography type points corresponding to the latitude and longitude originally provided with each record. The second and third, respectively, extracted the day of the year and the hour of the day associated with each record. The subset of the data recorded before the earthquake was then copied into a second table.
For each control and experimental locality, new copies of the pre-earthquake and the temporally preprocessed data were created. All points within both subsets that did not intersect with a spatial buffer manually calibrated for the physical size of the associated town were discarded. This geofence was adjusted for the scrambled nature of the coordinates provided by Cuebiq, adding a 610-m margin to account for points thrown outside of the buffer that would have been detected inside the buffer otherwise.
The algorithm attempted to balance suppressing false positives and extracting as many residents from the data as possible. With the temporal restraints of the data preventing the establishment of a baseline residency, the algorithm aimed to capture as many individuals with high and consistent levels of activity throughout the dataset as possible by defining an ID as a resident if they were recorded within the geofence for at least 20 percent of the unique pre-earthquake days (3 days). Since residents may leave town for short periods of time, the algorithm’s 20 percent threshold ensures the detection of those with strong geospatial connections to the locality in question. The IDs defined as residents of a locality were then stored in a separate table.

2.4.4. Market Penetration Criteria

Once each locality’s residents were established, the workflow next ensured that each experimental and control locality met a market penetration standard for further analysis. This threshold was set to filter out localities with less than 3 identified residents.
In order to obtain 15 control and 15 experimental localities, this study examined 82 control candidates and 38 experimental candidate localities. Since smartphone penetration in Mexico is relatively lower in rural areas, we expect that a similar analysis run on more current data will yield a larger number of qualifying localities.

2.4.5. Resident Slope Analysis

For the 15 experimental localities and 15 control localities, our approach sought to identify localities that experienced a decrease of residents after the earthquake. In response, a slope analysis of the residents before and after the earthquake was conducted. A slope analysis showed the decrease of residents in each locality through the difference between slopes and was chosen because the method provided a better means of accounting for a more gradual migration out of a given locality. This slope difference for each locality was calculated using the LINEST function in Microsoft Excel. Localities that experienced a decrease in population of greater than 24% from before the earthquake to after the earthquake were “alerted” as being rapidly depopulated (Table A3). This threshold of 24% was set to maximize the overall accuracy for the 30 villages.

2.5. Validation

From the 15 experimental localities, any locality with a decrease in population of greater than 24% was “alerted” as being rapidly depopulated due to the earthquake. Any locality that did not fit this criterion was considered an omission error. Likewise, 15 localities outside of the earthquake zone and between the same sizes as the experimental dataset were chosen at random to assess the commission errors. If any of the control localities also recorded a decrease in population of greater than 24% (−0.24), they were considered “alerted” or a commission error (Figure 8).

3. Results

This approach resulted in an overall accuracy of 73% in detecting the depopulation of localities following the earthquake. Out of the 15 control localities, 12 were detected as not depopulated or an insignificant decrease in residents after the earthquake, yielding a commission error of 20% (Table A3). Of the 15 experimental localities, 10 were detected with a decrease greater than 24% or more of the population after the earthquake, yielding an omission error of 33% (Table A3).
All locality candidates for the controls were randomly selected from 21 out of the 32 states in Mexico; at least two localities from each state in between the population sizes of 1000 - 6000 people were chosen. All 15 experimental localities were in the state of Puebla, geographically proximate to the epicenter of the earthquake (Figure 9). There appeared to be no spatial correlation between the false negatives or experimental localities that failed to alert. The five experimental localities that did not alert were in the same geographic proximity to the 10 experimental localities that correctly alerted. Six out of the 15 of the control localities were from the state of Mexico, two from Hidalgo, two from Yucatan, two from Guerrero, one from Michoacán, one from Chiapias, and one from Colima. Additionally, while there appeared to be no definitive spatial correlation of the controls, most false positives were in states near the central region of the country, west of the earthquake.
The overall population of a locality did not correlate with the accuracy for the control or experimental localities. For example, control locality San Miguel Ixtapan had a population of 1251 people with a slight decrease of 1.36% of residents, whereas control locality Huamuxtitlán had a population of 6063 but an increase of 3.15% (Table A2). Similarly, experimental locality Domingas Arenas had a population of 5864 people and a decrease of 47.86% of residents after the earthquake, but experimental locality San Felix Hidalgo had a population of 1628 people and a decrease of 53.73% (Table A3).
Lastly, of the experimental localities, there was not a strong correlation between earthquake zone severity and the accuracy of the algorithm. Of the seven localities located in Zone VI, two did not alert, while of the eight localities in Zone VII, three did not alert (Figure 4) (Table A4) (Table A5).

4. Discussion

This approach leveraged Cuebiq pseudonymized PED data to identify and analyze patterns of movement after a natural disaster. The methods used in this study provided a workflow that can be potentially used as a framework for other natural disasters or similar incidences (such as violence) that prompt migration. The spatial distribution and market penetration of the 2017 data posed a challenge when working with the Cuebiq PED data; it was difficult to strike a balance between examining localities with a tighter, more reliable geofence and ones that had enough residents to analyze longitudinally. The 15 control and 15 experimental localities could only be selected after examining 82 control and 38 experimental candidates and checking their market penetrations to see if their impacts could be measured and analyzed. Working with smaller, more rural localities (generally with a population less than 2000) meant using smaller buffers with greater specificity in identifying residents—false positives were less likely to be seen from individuals moving through the area while they commuted to work, etc. However, there were fewer residents in those localities overall, because there was low cell phone penetration in rural areas. This will likely not be as pressing of an issue when replicating the experiment on more recent data, since PED penetration has shown to consistently increase over time; in 2019, Mexico’s smartphone penetration increased 10 percentage points in three years to capture 49.5% of the population [32].
There are also data collection issues presented in this approach. If a disaster is severe, it will likely destroy the telecom and power networks. Since current PEDs rely on the terrestrial telecom network, they will have no way to upload their information. With the advent of low-Earth constellations of communication satellites, like SpaceX’s Starlink (starlink.com), this could be alleviated, but the devices will still lose their charge without a power network within a few days. This could be addressed with dispersed power sources such as diesel generators, solar panels, or small-scale wind generators, but these are not widely available in some less-developed areas.
Another drawback of this study is the temporal distribution of the data. Since the data spanned a little more than a month—15 days before the earthquake and 21 days after the earthquake—using a dataset with larger temporal range will likely improve the results of the pattern of life analysis used to correctly identify the residents, including those who may be traveling, working temporarily elsewhere, or only using their smartphones intermittently. Using the approach described here, an average of only 2% of a locality’s 2010 population were tracked as residents [31].
After some consideration, future elements of this research could include, but are not limited to, examining the geological and geographic factors that could have contributed to the lack of market penetration, analyzing the usage of different types of smart device applications that are partnered with Cuebiq and are commonly used in Mexico, understanding the aggregate demographics of the region in correlation to smartphone usage, and other situational circumstances that affected the population’s mobility but were not previously identified. These reasons were the top hypotheses as for why the algorithm performed better on some localities than others, and further investigation could give potential insight on the lack of market penetration and more accurate adjustments for the algorithm.
The approach presented here could be extended to the present day, monitoring currently intact localities in areas at risk of humanitarian disasters. While downloading, loading into the database and preprocessing this month of data in our study region took approximately three days on a Windows 10 server; automating the process to remove the inefficiencies incurred by optimizing the workflow to output parallel datasets differing in experimental parameters could make this process feasible on a weekly basis. During the aftermath of a humanitarian disaster, data could pinpoint localities incurring the greatest damage. Additionally, this approach could be modified to function from a period of a few hours as opposed to a 24-h period to serve as an early warning mechanism of a humanitarian disaster and the mass migration that may result. While there are hurdles, this approach is scalable. A cloud-based effort could easily monitor all localities in an area, such as this one in Mexico, alerting users any time a spatial grouping of localities experiences rapid depopulation.
Organizations interested in this approach could modify the algorithm’s sensitivity to suit their objectives. For example, in situations such as monitoring a specific region at risk of disaster, the −24% slope difference could be modified to −15% percent, reducing the chance that a locality’s depopulation would miss detection. This lowers the omission rate but increases the commission rate. Conversely if an organization is concerned with the total number of people fleeing, the alert level could be changed to 30% to reduce commission errors.

5. Conclusions

In future studies, building a better baseline of locality patterns of life could account for weekly, monthly, and annual changes and significantly improve the accuracy. This is possible with the significant increase in market penetration of location PED and scalable computing. Additionally, with a longer period of data, a custom buffer could be created for each ID based on their usual movements. When the ID passes out of that buffer, it would be recorded as an abnormal movement. Enough IDs with abnormal movements on a specific day could indicate an anomalous environmental or political event. The advantage that this algorithm has is that it could be used in larger cities for when one neighborhood might be destroyed and the residents shift to a neighboring one. A disadvantage is that a false positive could occur for other reasons, such as large-scale sporting events.
As more years of data become available, a greater number of incidents like humanitarian emergencies will provide “labels” for training artificial intelligence or machine-learning algorithms. These algorithms will more accurately detect anomalous situations of interest, such as the localities most severely affected by a humanitarian disaster. The operationalization of this approach will likely be possible in the near future with increased data collection and availability, improved computer processing, and near-real time. Operationalization will, however, require that the location data from PEDs are transferred, requiring a functioning telecom network or a space-based communication network such as Starlink. Operationalization also requires that PEDs can continue to be powered through either a functioning power grid or, if that is destroyed in a severe disaster, dispersed power sources such as diesel generators, solar panels, or small-scale wind generators.
With increases in computing capacity and the growing database of PED on a global scale, approaches like this have the possibility of providing researchers and practitioners a way to monitor large areas at-risk of humanitarian disaster. It is understood that this approach will never replace on-the-ground witnesses but serve as a low-cost alert system capable of providing additional information in areas lacking connectivity. Such an approach will be even more valuable in areas that lack other means of reporting (lack of capacity) or the local government/authorities do not want to share information with the international community (lack of transparency). Our hope is that this research helps provide organizations committed to providing emergency humanitarian responses a way to act in a decisive manner through informed, data-driven insights and, ultimately, reduce the suffering of those affected by a disaster.

Author Contributions

Conceptualization, Andrew Marx; data curation, Mia Poynor and Young-Kyung Kim; formal analysis, Andrew Marx; methodology, Andrew Marx; project administration, Andrew Marx; supervision, Andrew Marx; validation, Mia Poynor and Lauren Oberreiter; writing—original draft, Andrew Marx, Mia Poynor, Young-Kyung Kim, and Lauren Oberreiter; and writing—review and editing, Andrew Marx, Mia Poynor, Young-Kyung Kim, and Lauren Oberreiter All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University of Southern California student scholarships, including the Provost Undergraduate Research Fellowship and the Dornsife Student Opportunities for Academic Research.

Acknowledgments

The authors would like to thank the individuals at Cuebiq for their data. The authors would also like to thank Garrett Nance at the University of Southern California for his contributions in developing the code for database extraction and loading for the study and Kurtis Weatherford for his contributions in creating the spatial buffers and maps used in this study.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Before preprocessing, the IDs had an average of 7.86 days with a record. Preprocessing eliminated many IDs with one or two records total, bringing the average days the IDs had a hit up to 13.48.
Table A1. Before preprocessing, the IDs had an average of 7.86 days with a record. Preprocessing eliminated many IDs with one or two records total, bringing the average days the IDs had a hit up to 13.48.
Number of Unique Days that an ID had a recordBefore PreprocessingAfter Preprocessing
Mean7.8613.48
Median410
Mode14
Maximum3737
Minimum14
25th percentile16
75th percentile1019
Table A2. Fifteen control localities were used in the final analysis. In processing, an additional 610m was added to this buffer radius to account for privacy data screening by the data provider.
Table A2. Fifteen control localities were used in the final analysis. In processing, an additional 610m was added to this buffer radius to account for privacy data screening by the data provider.
Locality NameStateLatitude, LongitudePopulationResidents DetectedBuffer Size (m)Earthquake Class
San Gabriel IxtlaMexico19.257527, 100.1240991624139995None
TuzantlaMichoacán19.207914, 100.573904279810731None
El ArenalHidalgo20.226695, 98.90593729339794None
San Antonio del RosarioMexico18.398290, 100.30932513569965None
La CompaniaMexico19.163415, 100.082659109410530None
Almoloya de AlquisirasMexico18.864324, 99.8899823153131175None
OlinaláGuerrero17.777701, 98.739085579232803None
Santiago TezontlaleHidalgo20.161152, 99.097445422619928None
El TerreroMexico18.790601, 99.64658812737752None
San Miguel IxtapanMexico18.808699, 100.15192812519900None
HuamuxtitlánGuerrero17.804181, 98.56262360639887None
ChicomuseloChiapas15.743312, 92.28328593881420None
MinatitlánColima19.387388, 104.049396458814851None
SotutaYucatan20.596148, 89.0085225548311371None
TetizYucatan20.961913, 89.933243939421206None
Table A3. Fifteen experimental localities were used in the final analysis all from earthquake class areas of VI/VII or higher. In processing, an additional 610m was added to this buffer radius to account for privacy data screening by the data provider.
Table A3. Fifteen experimental localities were used in the final analysis all from earthquake class areas of VI/VII or higher. In processing, an additional 610m was added to this buffer radius to account for privacy data screening by the data provider.
Locality NameStateLatitude, LongitudePopulationResidents DetectedBuffer Size (m)Earthquake Class
San Felix HidalgoPuebla18.899207, 98.396168162834590VII
TepeojumaPuebla18.727938, 98.4482944788351200VII
Tepexi de RodriguezPuebla18.580354, 97.929589493322990VI/VII
San Felix RijoPuebla18.718743, 98.548084111817818VII
HuaquechulaPuebla18.771417, 98.543500300520916VII
San Juan RabosoPuebla18.571540, 98.441297363717546VII
TlapanalaPuebla18.694467, 98.534198272713957VII
San Lucas ColucánPuebla18.508552, 98.481103257711739VII
MatzacoPuebla18.557439, 98.489604258034784VII
Domingo ArenasPuebla19.139648, 98.4570575864261004VI/VII
Santa Clara OcoyucanPuebla18.974270, 98.30066848718491107VII
Santa Maria AcuexcomacPuebla19.036634, 98.385763443219978VII
San Gabriel TetzoyocanPuebla18.758063, 97.7043896060183776VI/VII
Santiago ColtzingoPuebla19.382436, 98.534143315512947VI/VII
San Mateo CapultitánPuebla19.194610, 98.417855232810720VI/VII
Table A4. The resident counts for the 30 localities for each of the 37 days were used to calculate their slope before and after the earthquake (Table A5).
Table A4. The resident counts for the 30 localities for each of the 37 days were used to calculate their slope before and after the earthquake (Table A5).
Village Name9/49/59/69/79/89/99/109/119/129/139/149/159/169/179/189/199/209/219/229/239/249/259/269/279/289/299/3010/110/210/310/410/510/610/710/810/910/10
San Gabriel Ixtla41516658576262565050537366714839363637343129232326293335252120212628192718
Tuzantla4457548687678676776445345323444332322
Olinalá3553455467667443344555433222333322232
Almoloya de Alquisiras1345445533432545436555434323433222431
La Compania2344645554467522223412321122111111210
San Antonio del Rosario4668376747666657453453254433354433332
El Arenal981012118181712108126101097987597810777776855755
Santiago Tezontlale766710811689669876546765545748443378634
El Terrero2443233445434224424233455232323343231
San Miguel Ixtapan6755447664542653342114554323311111220
Huamuxtitlán4664353446664334353323422123433333332
Chicomuselo2322213344436432332322164334455356565
Minatitlán6874366775677876776667688674454565442
Sotuta10121417131315108911141011886596788131814101311111011121212107
Tetiz1918241618232216182120171721161917181615141212101098108126774542
San Felix Hidalgo9141615116111312171710111314912131410101011101098108126774542
Tepeojuma1216171311121516111016109141314897891010117688957544251
Tepexi de Rodriguez88985571085611611910564478765444543313430
San Felix Rijo67678101310774810987887867674735656653452
Huaquechula68781011149785911986776767475434637553441
San Juan Raboso55891188878711978898101075455588665456355
Tlapanala77668810865257666667746454333555542341
San Lucas Colucán2473664553554232221113212123320000000
Matzaco111417121415171817131915171214131112111099755988946679764
Domingo Arenas791211910109101110141214129989111078107696767878754
Santa Clara Ocoyucan282373374400406366368386393410403375341395360316310318305246264302298289297274257258259262249257257205216220139
Santa Maria Acuexcomac411656477811847566766352454423655344333
San Mateo Capultitlán678580851008692837887817885787879777669727864616157565463636060635646514522
San Gabriel Tetzoyocan4534587675985865767865665346555553443
Concepción Cuautla5563536546645533355453434342213222121
Table A5. A locality was determined as depopulating if the pre-earthquake slope of residents was 24% lower than the post-earthquake slope of residents.
Table A5. A locality was determined as depopulating if the pre-earthquake slope of residents was 24% lower than the post-earthquake slope of residents.
Village NamePre-Quake SlopePost-Quake Slope(Post-Slope) − (Pre-Slope)
San Gabriel Ixtla64%−69%−133%
Tuzantla20%−19%−39%
Olinalá12%−11%−24%
Almoloya de Alquisiras4%−14%−17%
La Compania10%−9%−20%
San Antonio del Rosario2%−6%−8%
El Arenal−4%−12%−8%
Santiago Tezontlale4%−3%−7%
El Terrero0%−5%−5%
San Miguel Ixtapan−11%−12%−1%
Huamuxtitlán−5%−2%3%
Chicomuselo18%−10%−28%
Minatitlán10%−9%−19%
Sotuta−26%−20%6%
Tetiz−13%−35%−22%
San Felix Hidalgo9%−45%−54%
Tepeojuma−18%−33%−15%
Tepexi de Rodriguez9%−22%−31%
San Felix Rijo9%−21%−30%
Huaquechula6%−19%−25%
San Juan Raboso12%−20%−32%
Tlapanala−14%−17%−3%
San Lucas Colucán−7%−10%−3%
Matzaco11%−25%−36%
Domingo Arenas29%−19%−48%
Santa Clara Ocoyucan185%−549%−734%
Santa Maria Acuexcomac0%−10%−10%
San Mateo Capultitlán−13%−166%−153%
San Gabriel Tetzoyocan23%−16%−39%

References

  1. Alberto, Y.; Otsubo, M.; Kyokawa, H.; Kiyota, T.; Towhata, I. Reconnaissance of the 2017 Puebla, Mexico earthquake. Soils Found 2018, 58, 1073–1092. [Google Scholar] [CrossRef]
  2. Powerful Quake Devastates Central Mexico, Leaving More Than 200 Dead. Available online: https://www.npr.org/sections/thetwo-way/2017/09/19/552141609/at-least-42-people-killed-as-powerful-earthquake-convulses-central-mexico (accessed on 5 December 2019).
  3. Bengtsson, L.; Lu, X.; Thorson, A.; Garfield, R.; Von Schreeb, J. Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: A post-earthquake geospatial study in Haiti. PLoS Med. 2011, 8, e1001083. [Google Scholar] [CrossRef] [PubMed]
  4. Sahar, L.; Muthukumar, S.; French, S.P. Using Aerial Imagery and GIS in Automated Building Footprint Extraction and Shape Recognition for Earthquake Risk Assessment of Urban Inventories. IEEE Trans. Geosci. Remote Sens. 2010, 48, 3511–3520. [Google Scholar] [CrossRef]
  5. Feng, T.; Hong, Z.; Wu, H.; Fu, Q.; Wang, C.; Jiang, C.; Tong, X. Estimation of earthquake casualties using high-resolution remote sensing: A case study of Dujiangyan city in the May 2008 Wenchuan earthquake. Nat. Hazards 2013, 69, 1577–1595. [Google Scholar] [CrossRef]
  6. Ródenas, J.L.; García-Ayllón, S.; Tomás, A. Estimation of the Buildings Seismic Vulnerability: A Methodological Proposal for Planning Ante-Earthquake Scenarios in Urban Areas. Appl. Sci. 2018, 8, 1208. [Google Scholar] [CrossRef]
  7. Hossain, M.S.; Gadagamma, C.K.; Bhattacharya, Y.; Numada, M.; Morimura, N.; Meguro, K. Integration of smart watch and Geographic Information System (GIS) to identify post-earthquake critical rescue area part. I. Development of the System. Prog. Disaster Sci. 2020, 7, 100116. [Google Scholar] [CrossRef]
  8. Jena, R.; Pradhan, B.; Al-Amri, A.; Lee, C.W.; Park, H. Earthquake Probability Assessment for the Indian Subcontinent Using Deep Learning. Multidiscip. Digit. Publ. Inst. Sens. 2020, 20, 4369. [Google Scholar] [CrossRef]
  9. Mulder, F.; Ferguson, J.; Groenewegen, P.; Boersma, K.; Wolbers, J. Questioning Big Data: Crowdsourcing crisis data towards an inclusive humanitarian response. Big Data Soc. 2016. [Google Scholar] [CrossRef]
  10. Palen, L.; Hughes, A.L. Social Media in Disaster Communication. In Handbook of Disaster Research Handbooks of Sociology and Social Research; Springer International Publishing AG: Cham, Switzerland, 2017; pp. 497–518. [Google Scholar] [CrossRef]
  11. Marx, A.; Windisch, R.; Kim, J.S. Detecting village burnings with high-cadence smallsats: A case-study in the Rakhine State of Myanmar. Remote Sens. Appl. Soc. Environ. 2019, 14, 119–125. [Google Scholar] [CrossRef]
  12. Automating Disaster Relief. Available online: https://trajectorymagazine.com/automating-disaster-relief/ (accessed on 2 December 2019).
  13. Yu, M.; Yang, C.; Li, Y. Big Data in Natural Disaster Management: A Review. Geosciences 2018, 8, 165. [Google Scholar] [CrossRef]
  14. Salat, H.; Smoreda, Z.; Schläpfer, M. A method to estimate population densities and electricity consumption from mobile and phone data in developing countries. PLoS ONE 2020, 15, e0235224. [Google Scholar] [CrossRef] [PubMed]
  15. Zufiria, P.J.; Pastor-Escuredo, D.; Úbeda-Medina, L.; Hernandez-Medina, M.; Barriales-Valbuena, I.; Morales, A.J.; Jacques, D.C.; Nkwambi, W.; Diop, M.B.; Quinn, J.; et al. Identifying seasonal mobility profiles from anonymized and aggregated mobile phone data. Application in food security. PLoS ONE 2018, 13, e0195714. [Google Scholar] [CrossRef]
  16. Lai, S.; zu Erbach-Schoenberg, E.; Pezzulo, C.; Ruktanonchai, N.; Sorichetta, A.; Steele, J.; Li, T.; Dooley, C.; Tatem, A. Exploring the use of mobile phone data for national migration statistics. Palgrave Commun. 2019, 5, 1–10. [Google Scholar] [CrossRef]
  17. Wilson, R.; zu Erbach-Schoenberg, E.; Albert, M.; Power, D.; Tudge, S.; Gonzalez, M.; Guthrie, S.; Chamberlain, H.; Brooks, C.; Hughes, C.; et al. Rapid and near real-time assessments of population displacement using mobile phone data following disasters: The 2015 Nepal Earthquake. PLoS Curr. 2016, 8. [Google Scholar] [CrossRef] [PubMed]
  18. Pastor-Escuredo, D.; Morales-Guzman, A.; Torres-Fernandez, Y.; Bauer, J.-M.; Wadhwa, A.; Castro-Correa, C.; Romanoff, L.; Lee, J.G.; Rutherford, A.; Frias-Martinez, V.; et al. Flooding through the lens of mobile phone activity. In Proceedings of the IEEE Global Humanitarian Technology Conference (GHTC 2014), San Jose, CA, USA, 10–13 October 2014; pp. 279–286. [Google Scholar] [CrossRef]
  19. Andrade, X.; Layedra, F.; Vaca, C.; Cruz, E. RiSC: Quantifying change after natural disasters to estimate infrastructure damage with mobile phone data. In Proceedings of the 2018 IEEE International Conference on Big Data, Seattle, WA, USA, 10–13 December 2018; pp. 3383–3391. [Google Scholar]
  20. Horanont, T.; Witayangkurn, A.; Sekimoto, Y.; Shibasaki, R. Large-Scale Auto-GPS Analysis for Discerning Behavior Change during Crisis. IEEE Intell. Syst. 2013, 28, 26–34. [Google Scholar] [CrossRef]
  21. Peak, C.M.; Wesolowski, A.; zu Erbach-Schoenberg, E.; Tatem, A.J.; Wetter, E.; Lu, X.; Power, D.; Weidman-Grunewald, E.; Ramos, S.; Moritz, S.; et al. Population mobility reductions associated with travel restrictions during the Ebola epidemic in Sierra Leone: Use of mobile phone data. Int. J. Epidemiol. 2018, 47, 1562–1570. [Google Scholar] [CrossRef] [PubMed]
  22. Human Rights Watch. Mobile Location Data and Covid-19: Q&A. Available online: https://www.hrw.org/news/2020/05/13/mobile-location-data-and-covid-19-qa# (accessed on 2 December 2019).
  23. Yabe, T.; Sekimoto, Y.; Tsubouchi, K.; Ikemoto, S. Cross-comparative analysis of evacuation behavior after earthquakes using mobile phone data. PLoS ONE 2019, 14, e0211375. [Google Scholar] [CrossRef]
  24. Chen, Z.; Gong, Z.; Yang, S.; Ma, Q.; Kan, C. Impact of extreme weather events on urban human flow: A perspective from location-based service data. Comput. Environ. Urban. Syst. 2020, 83, 101520. [Google Scholar] [CrossRef] [PubMed]
  25. Liautaud, P.; Huybers, P.; Santillana, M. Fever and mobility data indicate social distancing has reduced incidence of communicable disease in the United States. arXiv 2020, arXiv:2004.09911. Available online: https://arxiv.org/pdf/2004.09911.pdf (accessed on 25 May 2020).
  26. Thompson, S.A.; Warzel, C. Twelve Million Phones, One Dataset, Zero Privacy. 2019. Available online: https://www.nytimes.com/interactive/2019/12/19/opinion/location-tracking-cell-phone.html (accessed on 2 December 2019).
  27. Cuebiq’s Data for Good Program Provides UNICEF with High-Precision Human Mobility Data for Real-Time Response to Humanitarian. Available online: https://www.bloomberg.com/press-releases/2019-09-10/cuebiq-s-data-for-good-program-provides-unicef-with-high-precision-human-mobility-data-for-real-time-response-to-humanitarian (accessed on 2 December 2019).
  28. Earthquake Leaves Hundreds Dead, Crews Combing through Rubble in Mexico. Available online: https://www.citynews1130.com/2017/09/20/earthquake-leaves-hundreds-dead-crews-combing-rubble-mexico/ (accessed on 5 October 2020).
  29. Emergency Response Coordination Centre (ERCC)—DG ECHO Daily map. Available online: https://erccportal.jrc.ec.europa.eu/ercmaps/ECDM_20170920_Mexico_EQ.pdf (accessed on 25 May 2020).
  30. Newzoo Global Mobile Market Report Light Version. 2017. Available online: https://resources.newzoo.com/hubfs/Reports/Newzoo_2017_Global_Mobile_Market_Report_Free.pdf (accessed on 2 December 2019).
  31. Censos y Conteeos de Poblaación y Vivienda. Available online: https://www.inegi.org.mx/programas/ccpv/2010/default.html#Microdatos (accessed on 2 December 2019).
  32. Newzoo Global Mobile Market Report 2019: Light Version. Available online: https://resources.newzoo.com/hubfs/Reports/2019_Free_Global_Mobile_Market_Report.pdf?utm_campaign=Mobile%20Report%20Launch%202019&utm_medium=email&_hsmi=76926953&_hsenc=p2ANqtz-_O72fQKM2ds9C0e-CuL4yhUoFzrQCAirctfPQYeB6ab0u_Qx998l6SSv0rf4SlGCKGX3DCWTTE-lCCUesAC9RsQqXGJg&utm_content=76926953&utm_source=hs_automation (accessed on 25 May 2020).
Figure 1. An image of a destroyed building in Jojutla, Morelos, a rural town close to the epicenter of the 2017 Puebla earthquake [28].
Figure 1. An image of a destroyed building in Jojutla, Morelos, a rural town close to the epicenter of the 2017 Puebla earthquake [28].
Ijgi 09 00643 g001
Figure 2. The path of one unique ID in the state of Guerrero, Mexico on 22 September 2017. Lighter colors indicate locational points from earlier in the day. This number of pings in one day is unusual but illustrative of the spatiotemporal richness of the data.
Figure 2. The path of one unique ID in the state of Guerrero, Mexico on 22 September 2017. Lighter colors indicate locational points from earlier in the day. This number of pings in one day is unusual but illustrative of the spatiotemporal richness of the data.
Ijgi 09 00643 g002
Figure 3. Unique geofences were manually constructed around the control and experimental communities, such as the 867-m buffer around Huamuxtitlán, Guerrero of Huamuxtitlán.
Figure 3. Unique geofences were manually constructed around the control and experimental communities, such as the 867-m buffer around Huamuxtitlán, Guerrero of Huamuxtitlán.
Ijgi 09 00643 g003
Figure 4. Map of the final 15 experimental localities, within the two strongest earthquake zones, that were used in the final analysis detecting localities that depopulated as a result of the 2017 Puebla earthquake [29].
Figure 4. Map of the final 15 experimental localities, within the two strongest earthquake zones, that were used in the final analysis detecting localities that depopulated as a result of the 2017 Puebla earthquake [29].
Ijgi 09 00643 g004
Figure 5. In the analytic workflow, following the geofence (Section 2.3), residents of the villages were established (Section 2.4.3). Only villages with at least 3 residents (Section 2.4.4) were analyzed to detect if they experienced a depopulation after the earthquake (Section 2.4.5).
Figure 5. In the analytic workflow, following the geofence (Section 2.3), residents of the villages were established (Section 2.4.3). Only villages with at least 3 residents (Section 2.4.4) were analyzed to detect if they experienced a depopulation after the earthquake (Section 2.4.5).
Ijgi 09 00643 g005
Figure 6. The distribution of unique IDs and the number of days that they were recorded as being present in the database.
Figure 6. The distribution of unique IDs and the number of days that they were recorded as being present in the database.
Ijgi 09 00643 g006
Figure 7. Preprocessing the data removed 369,839 unique IDs that did not have at least one record for four unique days prior to the earthquake. The original total of 774,343 (Figure 6) unique IDs was reduced to 404,504.
Figure 7. Preprocessing the data removed 369,839 unique IDs that did not have at least one record for four unique days prior to the earthquake. The original total of 774,343 (Figure 6) unique IDs was reduced to 404,504.
Ijgi 09 00643 g007
Figure 8. This approach detected a locality as depopulating if the pre-earthquake slope of residents was 24% lower than the post-earthquake slope of residents. In this figure, the experiment locality Huaquechula had a consistent percent of residents (6% slope) before the earthquake but saw a steady decline in residents after the earthquake (−19%), while the control locality El Terrero maintained a steady percent of its residents both before (0%) and after (−5%).
Figure 8. This approach detected a locality as depopulating if the pre-earthquake slope of residents was 24% lower than the post-earthquake slope of residents. In this figure, the experiment locality Huaquechula had a consistent percent of residents (6% slope) before the earthquake but saw a steady decline in residents after the earthquake (−19%), while the control locality El Terrero maintained a steady percent of its residents both before (0%) and after (−5%).
Ijgi 09 00643 g008
Figure 9. Map of the 15 control and 15 experimental localities. The plus symbol indicates incorrect control or experimental localities.
Figure 9. Map of the 15 control and 15 experimental localities. The plus symbol indicates incorrect control or experimental localities.
Ijgi 09 00643 g009
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Back to TopTop