Dear Good People at Benetech,
(For background, see my earlier posts: Issues with Crowdsourced Data Part 1 and Issues with Crowdsourced Data Part 2.)
The short version of the debate is this: the EC’s exploratory study found that the spatial pattern of text messages from Mission 4636 in Haiti was positively correlated with building damage in Port-au-Prince. This would suggest that crowdsourced SMS data had statistical value in Haiti—in addition to their value in saving lives. But Benetech’s study shows a negative correlation. That’s basically it. If you’d like to read something a little spicier, though, do peruse this recent Fast Company article, fabulously entitled “How Benetech Slays Monsters with Megabytes and Math.” In any case, that’s the backstory.
So let’s return to the Good People at Benetech. I thought I’d offer some of my humble guidance in case you feel threatened again in the future—I do hope you don’t mind and won’t take offense at my unsolicited and certainly imperfect advice. So by all means feel free to ignore everything that follows and focus on the more important work you do in the human rights space.
Next time Benetech wants to try and discredit the findings of a study in some other discipline, I recommend making sure that your own counter-findings are solid. In fact, I would suggest submitting your findings to a respected peer-reviewed journal—preferably one of the top tier scientific journals in your discipline. As you well know, after all, this really is the most objective and rigorous way to assess scientific work. Doing so would bring much more credibility to Benetech’s counter-findings than a couple blog posts.
My reasoning? Benetech prides itself (and rightly so) on carrying out some of the most advanced, cutting-edge quantitative research on patterns of human rights abuses. So if you wanted to discredit studies like the one carried out by the EC, this was an opportunity to publicly demonstrate your advanced expertise in quantitative analysis. Instead, Benetech used a simple non-spatial model to discredit the EC’s findings. Why use such a simplistic approach? Your response would have been more credible had you used statistical models for spatial point data instead. But granted, had you used more advanced models, you would have found evidence of a positive correlation. So you probably won’t want to read this next bit: a more elaborate “Tobit” correlation analysis actually shows the significance of SMS patterns as an explanatory variable in the spatial distribution of damaged buildings. Oh, and the correlation is (unfortunately) positive.
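For readers wondering what the basic, non-spatial approach even looks like, here is a minimal generic sketch with entirely hypothetical coordinates (this is neither study’s data nor code): both analyses start by aggregating the two point patterns, SMS reports and damaged buildings, onto a common grid and correlating the per-cell counts.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical point patterns on a 10 x 10 km area (not either
# study's data): (x, y) locations of SMS reports and of damaged
# buildings.
sms = rng.uniform(0, 10, size=(400, 2))
damage = rng.uniform(0, 10, size=(600, 2))

# Aggregate both patterns onto the same 10 x 10 grid of cells.
bins = np.linspace(0, 10, 11)
sms_counts, _, _ = np.histogram2d(sms[:, 0], sms[:, 1], bins=[bins, bins])
dmg_counts, _, _ = np.histogram2d(damage[:, 0], damage[:, 1], bins=[bins, bins])

# Pearson correlation of the per-cell counts; with independent
# uniform patterns like these, r hovers near zero.
r = np.corrcoef(sms_counts.ravel(), dmg_counts.ravel())[0, 1]
```

A genuinely spatial model would go further and treat neighboring cells as dependent (spatial autocorrelation) rather than exchangeable, which is the crux of the methodological complaint here.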
But that’s really beside the point. As my colleague Erik Hersman just wrote on the Ushahidi blog, one study alone is insufficient. What’s important is this: the last thing you want to do when trying to discredit a study in public is to come across as sloppy or as having ulterior motives (or both for that matter). Of course, you can’t control what other people think. If people find your response sloppy, then they may start asking whether the other methods you do use in your human rights analysis are properly peer-reviewed. They may start asking whether a strong empirical literature exists to back up your work and models. They may even want to know whether your expert statisticians have an accomplished track record and publish regularly in top-tier scientific journals. Other people may think you have ulterior motives and will believe this explains why you tried to discredit the EC’s preliminary findings. This doesn’t help your cause either. So it’s important to think through the implications of going public when trying to discredit someone’s research. Goodness knows I’ve made some poor calls myself on such matters in the past.
But let’s take a step back for a moment. If you’re going to try and discredit research like the EC’s, please make sure you correctly represent the other side’s arguments. Skewing or fabricating them is unlikely to make you very credible in the debate. For example, the EC study never concluded that Search and Rescue teams should rely only on SMS to save people’s lives. Furthermore, the EC study never claimed that using SMS is preferable to using established data on building density. It’s surely obvious—and you don’t need to demonstrate this statistically—that using a detailed map of building locations would provide a far better picture of potentially damaged buildings than crowdsourced SMS data. But what if this map is not available in a timely manner? As you may know, data layers of building density are not very common. Haiti was a good example of how difficult, expensive and time-consuming the generation of such a detailed inventory is. The authors of the study simply wanted to test whether the SMS spatial pattern matched the damage analysis results, which it does. All they did was propose that SMS patterns could help in structuring the efforts needed for a detailed assessment, especially because SMS data can be received shortly after the event.
So to summarize, no one (I know) has ever claimed that crowdsourced data should replace established methods for information collection and analysis. This has never been an either-or argument. And it won’t help your cause to turn it into a black-and-white debate, because people familiar with these issues know full well that the world is more complex than the picture you are painting for them. They also know that people who take an either-or approach often do so when they have either run out of genuine arguments or had few to begin with. So none of this will make you look good. In sum, it’s important to (1) accurately reflect the other side’s arguments, and (2) steer clear of creating an either-or, polarized debate. I know this isn’t easy to do; I’m guilty myself… on multiple counts.
I’ve got a few more suggestions—hope you don’t mind. They follow from the previous ones. The authors of the EC study never used their preliminary findings to extrapolate to other earthquakes, disasters or contexts. These findings were specific to the Haiti quake, and the authors never claimed that their model was globally valid. So why did you extrapolate to human rights analysis when that was never the objective of the EC study? Regardless, this just doesn’t make you look good. I understand that Benetech’s focus is on human rights and not disaster response, but the EC study never sought to undermine your good work in the field of human rights. Indeed, the authors of the study hadn’t even heard of Benetech. So in the future, I would recommend not extrapolating findings from one study and assuming they will hold in your own field of expertise, or that they even threaten your area of expertise. That just doesn’t make any sense.
There are a few more tips I wanted to share with you. Everyone knows full well that crowdsourced data has important limitations—nobody denies this. But a number of us happen to think that some value can still be derived from crowdsourced data. Even Mr. Moreno-Ocampo, the head of the International Criminal Court (ICC), whom I believe you know well, has pointed to the value of crowdsourced data from social media. In an interview with CNN last month, Mr. Moreno-Ocampo emphasized that Libya was the first time the ICC was able to respond in real time to allegations of atrocities, partially due to social-networking sites such as Facebook. “This triggered a very quick reaction. The (United Nations) Security Council reacted in a few days; the U.N. General Assembly reacted in a few days. So, now because the court is up and running we can do this immediately,” he said. “I think Libya is a new world. How we manage the new challenge — that’s what we will see now.”
Point is, you can’t control the threats that will emerge or even prevent them, but you do control the way you decide to publicly respond to these threats. So I would recommend using your response as an opportunity to be constructive and demonstrate your good work rather than trying to discredit others and botching things up in the process.
But going back to the ICC and the bit in the Fast Company article about mathematics demonstrating the culpability of the Guatemalan government. Someone who has been following your work closely for years emailed me because they felt somewhat irked by all this. By the way, this is yet another unpleasant consequence of trying to publicly discredit others: new critics of your work will emerge. The critic in question finds the claim a “little far-fetched” regarding your mathematics demonstrating the culpability of the Guatemalan government. “There already was massive documented evidence of the culpability of the Guatemalan government in the mass killings of people. If there is a contribution from mathematics, it is to estimate the number of victims who were never documented. So the idea is that documented cases are just a fraction of total cases and you can estimate the gap between the two. In order to do this estimation, you have to make a number of very strong assumptions, which means that the estimate may very well be unreliable anyway.”
Now, I personally think that’s not what you, Benetech, meant when you spoke with the journalist, because goodness knows the number of errors that journalists have made writing about Haiti.
In any case, the critic had this to add: “In a court of law, this kind of estimation counts for little. In the latest trial at which Benetech presented their findings, this kind of evidence was specifically rejected. Benetech and others claim that in an earlier trial they nailed Milosevic. But Milosevic was never nailed in the first place—he died before judgment was passed and there was a definite feeling at the time that the trial wasn’t going well. In any case, in a court of law what matters are documented cases, not estimates, so this argument about estimates is really beside the point.”
Now I’m really no expert on any of these issues, so I have no opinion on this case or the statistics or the arguments involved. They may very well be completely wrong, for all I know. I’m not endorsing any of the above statements. I’m simply using them as an illustration of what might happen in the future if you don’t carefully plan your counter-argument before going public. People will take issue and try to discredit you in turn, which can be rather unpleasant.
In conclusion, I would like to remind the Good People at Benetech about what Ushahidi is and isn’t. The Ushahidi platform is not a methodology (as I have already written on iRevolution and the Ushahidi blog). The Ushahidi platform is a mapping tool. The methodology that people choose to use to collect information is entirely up to them. They can use random sampling, controlled surveys, crowdsourcing, or even the methodology used by Benetech. I wonder what the good people at Benetech would say if some of their data were to be visualized on an Ushahidi platform. Would they dismiss the crisis map altogether? And speaking of crisis maps, most Ushahidi maps are not crisis maps. The platform is used in a very wide variety of ways, even to map the best burgers in the US. Is Benetech also going to extrapolate the EC’s findings to burgers?
So to sum up, in case it’s not entirely clear: we know full well that there are important limitations to crowdsourced data in disaster response, and we have never said that the methodology of crowdsourcing should replace existing methodologies in the human rights space (or any other space, for that matter). So please, let’s not keep going in circles.
Now, where do we go from here? Well, I’ve never been a good pen pal, so don’t expect any more letters from me in response to the Good People at Benetech. I think everyone knows that a back and forth would be unproductive and largely a waste of time, not to mention an unnecessary distraction from the good work that we all try to do in the broader community to bring justice, voice and respect to marginalized communities.
Let me address some of the concerns raised in this post about Benetech’s human rights work estimating the mortality consequences of conflicts.
On the Fast Company article: we don’t control what Fast Company or any other journalist writes, and we would certainly not use the same language they use. We’re always happy to discuss our work, what we’ve said and what we’ve published.
Patrick Meier reports that a colleague says the total magnitude of the violence in Guatemala during the internal armed conflict is relatively unimportant. We agree, but the total magnitude of killings wasn’t the key finding of our work in Guatemala.
The focus of our work there is on the relative crude mortality rates compared across ethnicities in specific regions and periods. We showed that in six regions, the crude mortality rate (by killing) for indigenous people was five to eight times greater than for non-indigenous people. To clarify the point raised in the post, we were not trying to establish the culpability of the Guatemalan government (the CEH used other evidence for that). We were simply estimating the relative killing-specific crude mortality rates by ethnicity, region, and period.
The disproportionate rates were a key component of the truth commission’s legal finding (combined with evidence from many other methodologies and source types) that genocide was committed by the Army against certain indigenous communities. The work is presented in Vol XII of the CEH report and elsewhere (e.g., Ball, P. “Making the Case: The Role of Statistics in Human Rights Reporting.” Statistical J of the United Nations Economic Commission for Europe. 18(2-3):163-174. 2001).
In this analysis, it is crucial to adjust for selection bias because violence against non-indigenous people (“ladino” people, in the Guatemalan usage) had been reported more frequently and more specifically than violence against indigenous people in these six regions.
How strong the assumptions have to be in order to use the methods we used in Guatemala (and Kosovo, Peru, Timor-Leste, and Colombia) is a complex question in mathematical statistics. The example presented in the Fast Company article is a thought exercise, not the estimator we use in serious work. We present and examine the assumptions in detail in our work on Kosovo and Colombia. See, for example,
Ball, P., W. Betts, F. Scheuren, J. Dudukovich, and J. Asher, “Killings and Refugee Flow in Kosovo, March-June 1999.” Washington, DC: AAAS, 2002, esp. Appendix 1 for matching analysis, and Appendix 2 for examination of the modeling assumptions.
Ball, P., and J. Asher. “Statistics and Slobodan.” Chance. 15(4): 2002.
Lum, Kristian; Price, Megan; Guberek, Tamy; and Ball, Patrick (2010) “Measuring Elusive Populations with Bayesian Model Averaging for Multiple Systems Estimation: A Case Study on Lethal Violations in Casanare, 1998-2007,” Statistics, Politics, and Policy: Vol. 1: Iss. 1, Article 2.
Over the last 10+ years, we’ve checked these models in many contexts, and when possible, validated them against estimates from probability surveys, and in one case, a complete enumeration (that took a decade to complete). In each case, estimates from our method agreed with the answers of the other methods. Other researchers have used the method in Bosnia, and found similar validations. See, for example,
Zwierzchowski, J., and E. Tabeau. “Census based multiple system estimation as an unbiased method of estimation of casualties’ undercount.” Conference Paper for European Population Conference. 2010.
Brunborg, H., T.H. Lyngstad, and H. Urdal, “Accounting for Genocide: How many were killed in Srebrenica?” European J of Population 19(3), 2003.
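The intuition behind multiple systems estimation can be shown with its simplest two-list special case, the Lincoln-Petersen estimator. The numbers below are entirely hypothetical, and the sketch assumes independent lists; the studies cited above use more lists, stratification, and model averaging precisely because the independence assumption is so strong.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of N = 2000 victims. Two independent
# documentation projects each record any given victim with some
# fixed probability (independence is the key, and strongest,
# assumption of this two-list sketch).
N = 2000
on_list1 = rng.random(N) < 0.4
on_list2 = rng.random(N) < 0.3

n1 = int(on_list1.sum())              # records on list 1
n2 = int(on_list2.sum())              # records on list 2
m = int((on_list1 & on_list2).sum())  # matched records on both lists

# Lincoln-Petersen estimate of the total population, including
# victims that neither project ever documented.
N_hat = n1 * n2 / m
```

The size of the overlap between lists is what lets you estimate the undocumented remainder; in real applications, record matching (deciding when two reports describe the same victim) is itself a major part of the work.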
On Kosovo at the ICTY, Patrick Meier is quite right, Milosevic died and we didn’t get a judgment. We have never said that we “nailed” Milosevic (that’s journalist-speak). Our work in that case (and in the subsequent case) followed the scientific logic outlined long ago by Karl Popper: we rejected two hypotheses (that killing and migration were the result of actions by NATO or the KLA), and we observed a key coincidence in favor of a third hypothesis (that killing and migration were the result of Yugoslav force actions). We did not affirm the third hypothesis — that’s too strong given the nature of the data and the analysis, not to mention that it would violate the Popperian logic of scientific reasoning. See Ball et al. (2002), cited above.
I hope this clarifies any confusion there might be about our work.
Thanks Patrick. Please don’t take offense if I don’t continue this conversation but best of luck with your work, as Erik already said, we’re big fans.
Hi, my name is Megan Price and I am a statistician with the Benetech Human Rights Program. I earned my PhD in Biostatistics from the Rollins School of Public Health at Emory University. My colleague, Kristian Lum, conducted the original analysis published in our blog post. (She’s away for the weekend and can’t post this directly) She says,
Hi. I’m the person who did the original replication of the JRC analysis and published our original blog post. I have a PhD in Statistics from Duke University. My dissertation was on spatial methods, and my PhD advisor, Alan Gelfand, is a prominent spatial statistician.
I explained in my blog post why I didn’t do a spatial analysis beyond my replication of the original analysis. I did not submit this for publication in a peer reviewed journal because nothing that was done in either JRC’s or my own analysis was statistically or methodologically novel.
Given the high level of interest in my initial post, I may follow up with a more sophisticated spatial model. If I do, I will post it on my blog also.
By the way, the Tobit model you mention is appropriate for continuous data, not the count data that is used in this case.
[Now back to Megan] I would like to elaborate on Kristian’s closing comment about the Tobit model as this suggestion intrigues me as well. My understanding is that Tobit models were initially motivated by truncated or censored data in economics (such as spending on luxury items, which may be zero at low income levels, as described in Tobin, J. “Estimation of Relationships for Limited Dependent Variables.” Econometrica. 26(1):24-36. 1958). I have since seen them mentioned in a few applications of spatial analyses (e.g., Berrocal, V.J., Raftery, A., and Gneiting, T. “Probabilistic Quantitative Precipitation Field Forecasting Using a Two-Stage Spatial Model.” The Annals of Applied Statistics. 2(4):1170-1193. 2008. though this example ultimately uses a different model), but these still rely on continuous data. I am unfamiliar with applications of Tobit models in cases using discrete count data like the Haiti analysis – could you elaborate on this suggestion or point us to a useful reference?
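To make the continuous-data point concrete, here is a minimal sketch of the classic left-censored Tobit likelihood fit by maximum likelihood on simulated data (all values are hypothetical): the model treats observed zeros as censored draws from a latent normal variable, an assumption with no natural analogue for discrete counts like per-cell SMS or damage tallies.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical data: latent continuous outcome y* = a + b*x + noise,
# observed as y = max(y*, 0), i.e. left-censored at zero, as in
# Tobin's luxury-spending example.
n = 800
x = rng.normal(size=n)
a_true, b_true, sigma_true = 0.5, 1.0, 1.0
y_star = a_true + b_true * x + rng.normal(scale=sigma_true, size=n)
y = np.maximum(y_star, 0.0)

def neg_log_lik(params):
    a, b, log_s = params
    s = np.exp(log_s)  # parameterize sigma on the log scale
    mu = a + b * x
    # Censored observations contribute P(y* <= 0) = Phi(-mu/s);
    # uncensored ones contribute the normal density at the observed y.
    ll = np.where(y <= 0.0,
                  norm.logcdf(-mu / s),
                  norm.logpdf(y, loc=mu, scale=s))
    return -ll.sum()

res = minimize(neg_log_lik, x0=np.zeros(3), method="Nelder-Mead")
a_hat, b_hat = res.x[0], res.x[1]
```

Note that both likelihood terms rest on the normal density of a continuous latent variable, which is exactly why the model does not transfer directly to count data.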
Hi Megan, thanks for your note. I didn’t write or conduct the JRC study, so best to check with them. Please don’t take offense if I don’t continue this conversation but best of luck with your work, as Erik already said, we’re big fans.
Patrick Ball says,
“On Kosovo at the ICTY … Our work in that case (and in the subsequent case) followed the scientific logic outlined long ago by Karl Popper: we rejected two hypotheses (that killing and migration were the result of actions by NATO or the KLA)”
I was curious to see how they employed “scientific logic” to reject these hypotheses, so I looked up some of this work. They say,
“NATO and KLA data of reasonably high quality were available to us. Yugoslav press and government sources published information on NATO attacks contemporaneously with the airstrikes, documenting when and where the attacks occurred, tabulating them by municipality and date. Information on KLA activity was obtained from interview accounts and a variety of non-governmental reports summarized and provided to us by the ICTY. Using that information, we counted the number of reported battles between the KLA and Yugoslav forces occurring in each municipality over time. Isolated KLA attacks that resulted in the injury, disappearance, or deaths of ethnic Serbs were also tabulated by the number of reported casualties. We were unable, however, to obtain data on Yugoslav army activity independent of interactions with the KLA.”
This seems rather odd data for Mr. Ball and Benetech to rely on to “reject hypotheses” with “scientific logic” given that they’ve been on something of a kick lately of insisting that such data (so-called “observational data” or “convenience data” in Benetech-speak) is unreliable for any kind of inference about war violence, and is not “scientifically defensible”.
Interested readers can go to some of these links and search for the word “convenience” to find such statements:
Benetech-Report-to-CAVR.pdf
Co-union-violence-paper-response.pdf
Just a couple examples from the first link above:
“It is my understanding that IBC is a large convenience sample, not suitable for generating estimates (in the statistical sense). … I think the best response … is not to run toward unrepresentative convenience data” – Megan Price
“In my experience (about 20 years collecting and analyzing human rights violations data), multiple convenience samples about the same context have almost never converged to the same statistical picture on key historical questions.” – Patrick Ball
It seems like some of the Good People at Benetech are pretty inconsistent here. It is OK for them to “run to convenience data” from the Yugoslav media and government to draw an inference that NATO bombing did not cause mass killing or migration in Kosovo. But Benetech then seems to pop up repeatedly to denounce anybody else in other contexts trying to draw any conclusions from event data, much of which is surely better than Ball’s Yugoslav media/government data on NATO bombing.
Ball seems very confident about his Popperian logic in a recent interview:
“Was the migration and mass killing in Kosovo the product of NATO’s bombing, of the Albanian guerrillas, or was it part of a systematic campaign by Yugoslav forces? That’s a critical question of fact, which statistics help us answer in a pretty definitive way. And that changes history forever.” http://www.pbs.org/newshour/bb/science/jan-june11/benetech_03-25.html
Apparently “convenience data” has its uses, enabling Ball to answer his “critical question” “in a pretty definitive way” by drawing inferences about overall violence patterns in Yugoslavia based on media data and events recorded by official government sources, even “changing history forever” with it.
Yet, the Iraq Body Count data, which I’d think has a richer diversity of sources, fewer limitations, and is more transparent than the Kosovo data employed by Ball, apparently can’t be used for drawing rigorous inferences rising to Popperian levels, let alone definitive, history-changing levels.
Crowdsourced data is apparently claimed to be similarly useless, but I’ll leave that debate to the others here.
JoshD: You’re right that the NATO and KLA data are convenience samples; it’s probably impossible to get probability modeled data for either of them. This isn’t really a problem for the NATO data, because I think the Yugoslav government was in a pretty good position to have complete information about NATO airstrikes in Kosovo, especially in the crucial mid-March to mid-April period. However, the KLA data definitely underregister KLA activity. It would be very interesting to perturb the KLA convenience sample to determine how much it would have to be biased in order to affect the conclusions we reached. This is called sensitivity analysis, and we did it for the migration data used in this study. I’ve put revisiting the KLA data in the task queue — thanks for pushing us on addressing the completeness of data in all our analyses!
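For concreteness, the perturbation exercise described here can be sketched roughly as follows, with simulated placeholder counts (not the actual Kosovo data): selectively inflate the reported KLA counts in high-violence municipalities and watch how the killings/KLA correlation moves as the assumed bias grows.

```python
import numpy as np

rng = np.random.default_rng(7)

# Entirely simulated placeholder counts. 'killings' and 'kla_true'
# are per-municipality event counts; the reported KLA series
# underregisters true activity.
n_muni = 30
kla_true = rng.poisson(5.0, size=n_muni)
killings = rng.poisson(3.0, size=n_muni)
kla_reported = rng.binomial(kla_true, 0.5)  # ~50% underregistration

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Sensitivity analysis: suppose underregistration is selective, so
# reported KLA activity in high-violence municipalities is too low by
# an unknown factor. Inflate those counts and track the correlation.
high = killings > killings.mean()
corrs = {}
for bias in (1.0, 1.5, 2.0, 3.0):
    perturbed = kla_reported * np.where(high, bias, 1.0)
    corrs[bias] = corr(killings, perturbed)
```

The question the real analysis has to answer is how large such a bias factor would have to be before it overturns the substantive conclusion.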