Dear Good People at Benetech,
Issues with Crowdsourced Data Part 1
Issues with Crowdsourced Data Part 2
The short version of the debate is this: the EC’s exploratory study found that the spatial pattern of text messages from Mission 4636 in Haiti was positively correlated with building damage in Port-au-Prince. This would suggest that crowdsourced SMS data had statistical value in Haiti—in addition to their value in saving lives. But Benetech’s study shows a negative correlation. That’s basically it. If you’d like to read something a little more spicy though, do peruse this recent Fast Company article, fabulously entitled “How Benetech Slays Monsters with Megabytes and Math.” In any case, that’s the back-story.
So lets return to the Good People at Benetech. I thought I’d offer some of my humble guidance in case you feel threatened again in the future—I do hope you don’t mind and won’t take offense at my unsolicited and certainly imperfect advice. So by all means feel free to ignore everything that follows and focus on the more important work you do in the human rights space.
Next time Benetech wants to try and discredit the findings of a study in some other discipline, I recommend making sure that your own counter-findings are solid. In fact, I would suggest submitting your findings to a respected peer-reviewed journal—preferably one of the top tier scientific journals in your discipline. As you well know, after all, this really is the most objective and rigorous way to assess scientific work. Doing so would bring much more credibility to Benetech’s counter-findings than a couple blog posts.
My reasoning? Benetech prides itself (and rightly so) for carrying out some of the most advanced, cutting-edge quantitative research on patterns of human rights abuses. So if you want to discredit studies like the one carried out by the EC, I would have used this as an opportunity to publicly demonstrate the advanced expertise you have in quantitative analysis. But Benetech decided to use a simple non-spatial model to discredit the EC’s findings. Why use such a simplistic approach? Your response would have been more credible had you used statistical models for spatial point data instead. But granted, had you used more advanced models, you would have found evidence of a positive correlation. So you probably won’t want to read this next bit: a more elaborate “Tobit” correlation analysis actually shows the significance of SMS patterns as an explanatory variable in the spatial distribution of damaged buildings. Oh, and the correlation is (unfortunately) positive.
But that’s really beside the point. As my colleague Erik Hersman just wrote on the Ushahidi blog, one study alone is insufficient. What’s important is this: the last thing you want to do when trying to discredit a study in public is to come across as sloppy or as having ulterior motives (or both for that matter). Of course, you can’t control what other people think. If people find your response sloppy, then they may start asking whether the other methods you do use in your human rights analysis are properly peer-reviewed. They may start asking whether a strong empirical literature exists to back up your work and models. They may even want to know whether your expert statisticians have an accomplished track record and publish regularly in top-tier scientific journals. Other people may think you have ulterior motives and will believe this explains why you tried to discredit the EC’s preliminary findings. This doesn’t help your cause either. So it’s important to think through the implications of going public when trying to discredit someone’s research. Goodness knows I’ve made some poor calls myself on such matters in the past.
But lets take a step back for a moment. If you’re going to try and discredit research like the EC’s, please make sure you correctly represent the other side’s arguments. Skewing them or fabricating them is unlikely to make you very credible in the debate. For example, the EC study never concluded that Search and Rescue teams should only rely on SMS to save people’s lives. Furthermore, the EC study never claimed that using SMS is preferable over using established data on building density. It’s surely obvious—and you don’t need to demonstrate this statistically—to know that using a detailed map of building locations would provide a far better picture of potentially damaged buildings than crowdsourced SMS data. But what if this map is not available in a timely manner? As you may know, data layers of building density are not very common. Haiti was a good example of how difficult, expensive and time-consuming, the generation of such a detailed inventory is. The authors of the study simply wanted to test whether the SMS spatial pattern matched the damage analysis results, which it does. All they did was propose that SMS patterns could help in structuring the efforts needed for a detailed assessment, especially because SMS data can be received shortly after the event.
So to summarize, no one (I know) has ever claimed that crowdsourced data should replace established methods for information collection and analysis. This has never been an either or argument. And it won’t help your cause to turn it into a black-and-white debate because people familiar with these issues know full well that the world is more complex than the picture you are painting for them. They also know that people who take an either-or approach often do so when they have either run out of genuine arguments or had few to begin with. So none of this will make you look good. In sum, it’s important to (1) accurately reflect the other’s arguments, and (2) steer clear of creating an either-or, polarized debate. I know this isn’t easy to do, I’m guilty myself… on multiple counts.
I’ve got a few more suggestions—hope you don’t mind. They follow from the previous ones. The authors of the EC study never used their preliminary findings to extrapolate to other earthquakes, disasters or contexts. These findings were specific to the Haiti quake and the authors never claimed that their model was globally valid. So why did you extrapolate to human rights analysis when that was never the objective of the EC study? Regardless, this just doesn’t make you look good. I understand that Benetech’s focus is on human rights and not disaster response, but the EC study never sought to undermine your good work in the field of human rights. Indeed, the authors of the study hadn’t even heard of Benetech. So in the future, I would recommend not extrapolating findings from one study and assume they will hold in your own field of expertise or that they even threaten your area of expertise. That just doesn’t make any sense.
There are a few more tips I wanted to share with you. Everyone knows full well that crowdsourced data has important limitations—nobody denies this. But a number of us happen to think that some value can still be derived from crowdsourced data. Even Mr. Moreno-Ocampo, the head of the International Criminal Court (ICC), who I believe you know well, has pointed to the value of crowdsourced data from social media. In an interview with CNN last month, Mr. Moreno-Ocampo emphasized that Libya was the first time that the ICC was able to respond in real time to allegations of atrocities, partially due to social-networking sites such as Facebook. He added that, “this triggered a very quick reaction. The (United Nations) Security Council reacted in a few days; the U.N. General Assembly reacted in a few days. So, now because the court is up and running we can do this immediately,” he said. “I think Libya is a new world. How we manage the new challenge — that’s what we will see now.”
Point is, you can’t control the threats that will emerge or even prevent them, but you do control the way you decide to publicly respond to these threats. So I would recommend using your response as an opportunity to be constructive and demonstrate your good work rather than trying to discredit others and botching things up in the process.
But going back to the ICC and the bit in the Fast Company article about mathematics demonstrating the culpability of the Guatemalan government. Someone who has been following your work closely for years emailed me because they felt somewhat irked by all this. By the way, this is yet another unpleasant consequence of trying to publicly discredit others, new critics of your work will emerge. The critic in questions finds the claim a “little far fetched” re your mathematics demonstrating the culpability of the Guatemalan government. “There already was massive documented evidence of the culpability of the Guatemalan government in the mass killings of people. If there is a contribution from mathematics it is to estimate the number of victims who were never documented. So the idea is that documented cases are just a fraction of total cases and you can estimate the gap between the two. In order to do this estimation, you have to make a number of very strong assumptions, which means that the estimate may very well be unreliable anyway.”
Now, I personally think that’s not what you, Benetech, meant when you spoke with the journalist, cause goodness knows the number of errors that journalists have made writing about Haiti.
In any case, the critic had this to add: “In a court of law, this kind of estimation counts for little. In the latest trial at which Benetech presented their findings, this kind of evidence was specifically rejected. Benetech and others claim that in an earlier trial they nailed Milosevic. But Milosevic was never nailed in the first place—he died before judgment was passed and there was a definite feeling at the time that the trial wasn’t going well. In any case, in a court of law what matters are documented cases, not estimates, so this argument about estimates is really beside the point.”
Now I’m really no expert on any of these issues, so I have no opinion on this case or the statistics or the arguments involved. They may very well be completely wrong, for all I know. I’m not endorsing any of the above statements. I’m simply using them as an illustration of what might happen in the future if you don’t carefully plan your counter-argument before going public. People will take issue and try to discredit you in turn, which can be rather unpleasant.
In conclusion, I would like to remind the Good People at Benetech about what Ushahidi is and isn’t. The Ushahidi platform is not a methodology (as I have already written on iRevolution and the Ushahidi blog). The Ushahidi platform is a mapping tool. The methodology that people choose to use to collect information is entirely up to them. They can use random sampling, controlled surveys, crowdsourcing, or even the methodology used by Benetech. I wonder what the good people at Benetech would say if some of their data were to be visualized on an Ushahidi platform. Would they dismiss the crisis map altogether? And speaking of crisis maps, most Ushahidi maps are not crisis maps. The platform is used in a very wide variety of ways, even to map the best burgers in the US. Is Benetech also going to extrapolate the EC’s findings to burgers?
So to sum up, in case it’s not entirely clear, we know full well that there are important limitations to crowdsourced data in disaster response and have never said that the methodology of crowdsourcing should replace existing methodologies in the human rights space (or any other space for that matter). So please, lets not continue going in circles endlessly.
Now, where do we go from here? Well, I’ve never been a good pen pal, so don’t expect any more letters from me in response to the Good People at Benetech. I think everyone knows that a back and forth would be unproductive and largely a waste of time, not to mention an unnecessary distraction from the good work that we all try to do in the broader community to bring justice, voice and respect to marginalized communities.