
Analyzing Call Dynamics to Assess the Impact of Earthquakes

Earthquakes can cripple communication infrastructure and influence the number of voice calls relayed through cell phone towers. Data from cell phone traffic can thus be used as a proxy to infer the epicenter of an earthquake and possibly the needs of the disaster-affected population. In this blog post, I summarize the findings from a recent study carried out by Microsoft Research and the Santa Fe Institute (SFI).

The study assesses the impact of the 5.9 magnitude earthquake near Lac Kivu in February 2008 on Rwandan call data to explore the possibility of inferring the epicenter and potential needs of affected communities. Cellular networks continually generate “Call Data Records (CDR) for billing and maintenance purposes,” which can be used to make inferences following a disaster. Since the geographic spread of cell phones and towers is not randomly distributed, the authors used methods that capture and propagate the uncertainty in their inferences from the data. This is important for prioritizing the collection of new data.
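
To make this concrete, a CDR is essentially a small, automatically logged record of each call event. Here is a minimal sketch in Python of what such a record might look like; the field names are illustrative assumptions on my part, since actual CDR schemas vary by operator and are not spelled out in the study:

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class CallDataRecord:
        """Simplified call data record; real CDR schemas vary by operator."""
        caller_id: str       # anonymized identifier of the calling handset
        tower_id: str        # cell tower that relayed the call
        timestamp: datetime  # when the call was placed
        duration_s: int      # call duration in seconds

Aggregating such records into call counts per tower per hour yields the call-volume time series on which detection methods like those described below can operate.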

The study is based on the following three assumptions:

1. Cell tower traffic deviates statistically from normal patterns and trends when an unusual event occurs.
2. Areas that suffer larger disruptions experience deviations in call volume that persist for a longer period of time.
3. Disruptions are overall inversely proportional to the distance from the center(s) of a catastrophe.

Based on these assumptions, the authors develop algorithms to detect earthquakes, predict their epicenters and infer opportunities for assistance. The results? Using call data to detect when the February 2008 earthquake took place yields highly accurate results. The same is true for predicting the epicenter. This means that call activity across cell phone towers can be used as a large-scale seismic detection system.
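
The authors’ actual algorithms are more involved, but the intuition behind the first assumption is easy to sketch: flag any hour whose call volume at a tower deviates from its historical baseline by more than a few standard deviations. A minimal illustration in Python (the function name and threshold are my own assumptions, not taken from the paper):

    import statistics

    def detect_anomalies(hourly_counts, baseline, threshold=3.0):
        """Return (hour, count) pairs whose call volume deviates from
        the baseline mean by more than `threshold` standard deviations."""
        mu = statistics.mean(baseline)
        sigma = statistics.stdev(baseline)
        return [
            (hour, count)
            for hour, count in enumerate(hourly_counts)
            if abs(count - mu) > threshold * sigma
        ]

    # A sudden spike (or an outage-induced drop) stands out immediately.
    normal = [120, 115, 130, 125, 118, 122, 127, 119]   # typical hours
    observed = [121, 118, 340, 60, 125, 123]            # hours 2-3 anomalous
    print(detect_anomalies(observed, normal))           # [(2, 340), (3, 60)]

In the same spirit, assumptions 2 and 3 suggest localizing the epicenter by weighting each anomalous tower’s position by the size and persistence of its deviation.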

As for inferring the hardest-hit areas, the authors find that their “predicted model is far superior to the baseline and provides predictions that are significantly better for k = 3, 4 and 5,” where k represents the number of days post-earthquake. In sum, “the results highlight the promise of performing predictive analysis with existing telecommunications infrastructure.” The study is available on the Artificial Intelligence for Development (AI-D) website.

In the future, combining call traffic data with crowdsourced SMS data (see this study on Haiti text messages) could perhaps provide even more detailed information on near real-time impact and needs following a disaster. I’d be very interested to see this kind of study done on call/SMS data before, during and after a contested election or major armed conflict. Could patterns in call/SMS data from one country provide distinct early-warning signatures for elections and conflicts in other crises?

New Dataset Represents Breakthrough for Crisis Mapping Analysis

The Peace Research Institute Oslo (PRIO) has just released the latest version of the Armed Conflict Location and Event Dataset (ACLED), which I blogged about last year here. The new peer-reviewed paper on this latest release is available here and you can watch ACLED’s presentation at the 2009 International Conference on Crisis Mapping (ICCM 2009) right here. The unit of analysis for ACLED is “an individual event that occurred at a given location.”

This new version has geo-referenced data for 50 unstable countries from 1997 through 2010. The real breakthrough here is not just the scope of geographic coverage but, more importantly, how up to date the data is. I’m excited about this because it is rare for academic datasets to inform policy or operational response in a timely way; they are generally outdated by the time they are released.

PRIO’s updated dataset codes the “actions of rebels, governments, and militias within unstable states, specifying the exact location and date of battle events, transfers of military control, headquarter establishment, civilian violence, and rioting.” As the authors note, the dataset’s “disaggregation of civil war and transnational violent events allow for research on local level factors and the dynamics of civil and communal conflict.”

Indeed, “micro-level datasets allow researchers to rigorously test sub-national hypotheses and to generate new causal arguments that cannot be studied with country-year or static conflict-zone data.” The authors identify four distinct advantages of disaggregated local conflict event data:

  1. Data can be aggregated to any desired level for analysis (see the sketch after this list);
  2. The types of conflict events (e.g. battles or civilian violence) can be analyzed separately or in tandem;
  3. The actors within a conflict can be grouped or analyzed separately;
  4. The dynamics of national or regional war clusters can be addressed together.
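
To illustrate the first two advantages, here is a minimal sketch of how geo-referenced, event-level data can be rolled up to whatever unit of analysis a study requires. The column names and values are illustrative assumptions and do not match ACLED’s actual schema:

    import pandas as pd

    # Toy event-level data in the spirit of ACLED; real columns differ.
    events = pd.DataFrame({
        "date": pd.to_datetime(["1997-03-02", "1997-03-15", "1998-07-01"]),
        "country": ["Rwanda", "Rwanda", "Sudan"],
        "event_type": ["battle", "civilian violence", "battle"],
        "lat": [-1.94, -2.60, 15.59],
        "lon": [30.06, 29.74, 32.53],
    })

    # Advantage 1: aggregate to any desired level, e.g. country-year counts.
    by_country_year = (
        events.groupby(["country", events["date"].dt.year])
              .size()
              .rename("event_count")
    )
    print(by_country_year)

    # Advantage 2: analyze one event type separately.
    battles = events[events["event_type"] == "battle"]

The same event table can feed a country-year regression one day and a fine-grained spatial analysis the next, which is exactly what country-level datasets cannot offer.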

The academic paper that discusses this new release of ACLED doesn’t go into much geospatial analysis, but the dataset will no doubt catalyze many analytical studies in the near future. One preliminary finding, however, shows that using country-level data can lead to biased results when studying conflict dynamics: “The average percentage of area covered by civil war from the data sample is approximately 48%, but the average amount of territory with repeated fighting is considerably smaller at 15%. Further, most conflicts initially start out as very local phenomena.”

The Future of Digital Activism and How to Stop It

I’ve been following a “debate” on a technology listserv which represents the absolute worst of the discourse on digital activism. Even writing the word debate in quotes is too generous. It was like watching Bill O’Reilly or Glenn Beck go all out on Fox News.

The arguments were mostly one-sided and mixed with insults intended to create public ridicule. It was blatantly obvious that those doing the verbal lynching were driven by other motives. They have a history of being aggressive and provocative in public because it gets them attention, which further bloats their egos. They thrive on it. The irony? Neither of them has much of a track record to speak of in the field of digital activism. All they seem to do is talk about tech while insulting others who engage operationally and try to make a difference. Constructive criticism is important, but this hardly qualifies. This is a shame, as these individuals are otherwise quite sharp.

So how do we prevent a Fox-styled future of digital activism? First, ignore these poisonous debates. If people were serious about digital activism, the discourse would take on a very different tone, a professional one. Second, don’t be fooled: most of the conversations on digital activism are mixed with anecdotes, selection bias and hype, often to get media attention. You’ll find that most of those involved in the “study” of digital activism have no idea about methodology and research design. Third, help make data-driven, mixed-methods research on digital activism possible by adding data to the Global Digital Activism Data Set (GDADS). The Meta-Activism Project (MAP) recently launched this data project to catalyze more empirical research on digital activism.

Evaluating Accuracy of Data Collection on Mobile Phones

The importance of data validation is unquestioned, but few empirical studies seek to assess the possible errors incurred during mobile data collection. Authors Somani Patnaik, Emma Brunskill and William Thies thus carried out what is possibly the first quantitative evaluation (PDF) of data entry accuracy on mobile phones in resource-constrained environments. They just presented their findings at ICTD 2009.

Mobile devices have become an increasingly important tool for information collection. Hence, for example, my interest in pushing forward the idea of Mobile Crisis Mapping (MCM). While studies on data accuracy exist for personal digital assistants (PDAs), there are very few that focus on mobile phones. This new study thus evaluates three user interfaces for information collection: 1) electronic forms, 2) SMS, and 3) voice.

The results of the study indicate the following associated error rates:

  • Electronic forms = 4.2%
  • SMS = 4.5%
  • Voice = 0.45%

For comparative purposes and context, note that error rates using PDAs have generally been less than 2%. These figures represent the fraction of questions that were answered incorrectly. However, since “each patient interaction consisted of eleven questions, the probability of error somewhere in a patient report is much higher. For both electronic forms and SMS, 10 out of 26 reports (38%) contained an error; for voice, only 1 out of 20 reports (5%) contained an error (which was due to operator transcription).”
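
The jump from a ~4% per-question error rate to a ~38% per-report error rate is just the arithmetic of compounding: if each of the eleven questions independently has error probability p, the chance of at least one error in a report is 1 − (1 − p)^11. A quick sanity check in Python (independence is my simplifying assumption, not a claim from the paper):

    def report_error_prob(p_question, n_questions=11):
        """Probability of at least one error across n independent questions."""
        return 1 - (1 - p_question) ** n_questions

    print(f"{report_error_prob(0.042):.1%}")    # electronic forms -> 37.6%
    print(f"{report_error_prob(0.045):.1%}")    # SMS -> 39.7%
    print(f"{report_error_prob(0.0045):.1%}")   # voice -> 4.8%

These figures sit close to the observed report-level rates of 38% and 5%, which suggests the per-question errors were roughly independent.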

I do hope that the results of this study prompt many others to carry out similar investigations. I think we need a lot more studies like this one, but with a larger sample size (N) and across multiple sectors (this study drew on just 13 health workers).

The UN Threat and Risk Mapping Analysis (TRMA) project I’m working on in the Sudan right now will carry out a study on data collection accuracy using mobile phones when it rolls out its program later this month. The idea is to introduce mobile phones in a number of localities but not in neighboring ones. The team will then compare the data quality of the two samples.

I look forward to sharing the results.

Patrick Philippe Meier