Tag Archives: analysis

Syria: Crowdsourcing Satellite Imagery Analysis to Identify Mass Human Rights Violations

Update: See this blog post for the latest. Also, our project was just featured on the UK Guardian Blog!

What if we crowdsourced satellite imagery analysis of key cities in Syria to identify evidence of mass human rights violations? This is precisely the question that my colleagues at Amnesty International USA’s Science for Human Rights Program asked me following this pilot project I coordinated for Somalia. AI-USA has done similar work in the past with their Eyes on Darfur project, which I blogged about here in 2008. But using micro-tasking with backend triangulation to crowdsource the analysis of high resolution satellite imagery for human rights purposes is definitely breaking new ground.

A staggering amount of new satellite imagery is produced every day; millions of square kilometers’ worth according to one knowledgeable colleague. This is a big data problem that needs mass human intervention until the software can catch up. I recently spoke with Professor Ryan Engstrom, the Director of the Spatial Analysis Lab at George Washington University, and he confirmed that automated algorithms for satellite imagery analysis still have a long, long way to go. So the answer for now has to be human-driven analysis.

But professional satellite imagery experts who have plenty of time to volunteer their skills are few and far between. The Satellite Sentinel Project (SSP), which I blogged about here, is composed of a very small team and a few interns. Their focus is limited to the Sudan and they are understandably very busy. My colleagues at AI-USA analyze satellite imagery for several conflicts, but this takes them far longer than they’d like, and their small team is still constrained given the number of conflicts and vast amounts of imagery that could be analyzed. This explains why they’re interested in crowdsourcing.

Indeed, crowdsourcing imagery analysis has proven to be a workable solution in several other projects & sectors. The “crowd” can indeed scan and tag vast volumes of satellite imagery data when that imagery is “sliced and diced” for micro-tasking. This is what we did for the Somalia pilot project thanks to the Tomnod platform and the imagery provided by Digital Globe. The yellow triangles below denote the “sliced images” that individual volunteers from the Standby Task Force (SBTF) analyzed and tagged one at a time.

We plan to do the same with high resolution satellite imagery of three key cities in Syria selected by the AI-USA team. The specific features we will look for and tag include: “Burnt and/or darkened building features,” “Roofs absent,” “Blocks on access roads,” “Military equipment in residential areas,” “Equipment/persons on top of buildings indicating potential sniper positions,” “Shelters composed of different materials than surrounding structures,” etc. SBTF volunteers will be provided with examples of what these features look like from a bird’s eye view and from ground level.

Like the Somalia project, only when a feature—say a missing roof—is tagged identically by at least 3 volunteers will that location be sent to the AI-USA team for review. In addition, if volunteers are unsure about a particular feature they’re looking at, they’ll take a screenshot of said feature and share it on a dedicated Google Doc for the AI-USA team and other satellite imagery experts from the SBTF team to review. This feedback mechanism is key to ensure accurate tagging and inter-coder reliability. In addition, the screenshots shared will be used to build a larger library of features, i.e., what a missing roof looks like, as well as military equipment in residential areas, road blocks, etc. Volunteers will also be in touch with the AI-USA team via a dedicated Skype chat.

There will no doubt be a learning curve, but the sooner we climb that learning curve the better. Democratizing satellite imagery analysis is no easy task and one or two individuals have opined that what we’re trying to do can’t be done. That may be, but we won’t know unless we try. This is how innovation happens. We can hypothesize and talk all we want, but concrete results are what ultimately matters. And results are what can help us climb that learning curve. My hope, of course, is that democratizing satellite imagery analysis enables AI-USA to strengthen their advocacy campaigns and makes it harder for perpetrators to commit mass human rights violations.

SBTF volunteers will be carrying out the pilot project this month in collaboration with AI-USA, Tomnod and Digital Globe. How and when the results are shared publicly will be up to the AI-USA team as this will depend on what exactly is found. In the meantime, a big thanks to Digital Globe, Tomnod and SBTF volunteers for supporting the AI-USA team on this initiative.

If you’re interested in reading more about satellite imagery analysis, the following blog posts may also be of interest:

• Geo-Spatial Technologies for Human Rights
• Tracking Genocide by Remote Sensing
• Human Rights 2.0: Eyes on Darfur
• GIS Technology for Genocide Prevention
• Geo-Spatial Analysis for Global Security
• US Calls for UN Aerial Surveillance to Detect Preparations for Attacks
• Will Using ‘Live’ Satellite Imagery to Prevent War in the Sudan Actually Work?
• Satellite Imagery Analysis of Kenya’s Election Violence: Crisis Mapping by Fire
• Crisis Mapping Uganda: Combining Narratives and GIS to Study Genocide
• Crowdsourcing Satellite Imagery Analysis for Somalia: Results of Trial Run
• Genghis Khan, Borneo & Galaxies: Crowdsourcing Satellite Imagery Analysis
• OpenStreetMap’s New Micro-Tasking Platform for Satellite Imagery Tracing




Crowdsourcing Satellite Imagery Analysis for Somalia: Results of Trial Run

We’ve just completed our very first trial run of the Standby Volunteer Task Force (SBTF) Satellite Team. As mentioned in this blog post last week, the UN approached us a couple weeks ago to explore whether basic satellite imagery analysis for Somalia could be crowdsourced using a distributed mechanical turk approach. I had actually floated the idea in this blog post during the floods in Pakistan a year earlier. In any case, a colleague at Digital Globe (DG) read my post on Somalia and said: “Let’s do it.”

So I reached out to Luke Barrington at Tomnod to set up a distributed micro-tasking platform for Somalia. To learn more about Tomnod’s neat technology, see this previous blog post. Within just a few days we had high resolution satellite imagery from DG and a dedicated crowdsourcing platform for imagery analysis, courtesy of Tomnod. All that was missing were some willing and able “mapsters” from the SBTF to tag the location of shelters in this imagery. So I sent out an email to the group and some 50 mapsters signed up within 48 hours. We ran our pilot from August 26th to August 30th. The idea here was to see what would go wrong (and right!) and thus learn as much as we could before doing this for real in the coming weeks.

It is worth emphasizing that the purpose of this trial run (and the entire exercise) is not to replicate the kind of advanced and highly-skilled satellite imagery analysis that professionals already carry out. This is not just about Somalia over the next few weeks and months. This is about Libya, Syria, Yemen, Afghanistan, Iraq, Pakistan, North Korea, Zimbabwe, Burma, etc. Professional satellite imagery experts who have plenty of time to volunteer their skills are few and far between. Meanwhile, a staggering amount of new satellite imagery is produced every day; millions of square kilometers’ worth, according to one knowledgeable colleague.

This is a big data problem that needs mass human intervention until the software can catch up. Moreover, crowdsourcing has proven to be a workable solution in many other projects and sectors. The “crowd” can indeed scan vast volumes of satellite imagery data and tag features of interest. A number of these crowdsourcing platforms also have built-in quality assurance mechanisms that take into account the reliability of the taggers and tags. Tomnod’s CrowdRank algorithm, for example, only validates imagery analysis if a certain number of users have tagged the same image in exactly the same way. In our case, only shelters that get tagged identically by three SBTF mapsters get their locations sent to experts for review. The point here is not to replace the experts but to take some of the easier (but time-consuming) tasks off their shoulders so they can focus on applying their skill set to the harder stuff vis-à-vis imagery interpretation and analysis.
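To make this concrete, here is a minimal sketch (in Python) of what such a consensus filter might look like. To be clear, this is not Tomnod’s actual CrowdRank algorithm, which also weighs tagger reliability; the tag structure and the threshold of three identical tags are simply assumptions drawn from the description above.

```python
from collections import Counter

# Hypothetical volunteer tags: (image_id, feature_type, grid_cell).
# This is an illustrative data structure, not Tomnod's actual schema.
tags = [
    ("img_042", "shelter", (12, 7)),
    ("img_042", "shelter", (12, 7)),
    ("img_042", "shelter", (12, 7)),
    ("img_042", "shelter", (3, 9)),
    ("img_107", "road_block", (1, 2)),
]

CONSENSUS_THRESHOLD = 3  # at least three identical tags before expert review

def confirmed_locations(tags, threshold=CONSENSUS_THRESHOLD):
    """Return the tags that at least `threshold` volunteers agreed on exactly."""
    counts = Counter(tags)
    return [tag for tag, n in counts.items() if n >= threshold]

for image_id, feature, cell in confirmed_locations(tags):
    print(f"Send {feature} at cell {cell} in {image_id} to expert review")
```

In this toy example, only the first shelter location passes the three-tag threshold and would be forwarded to the experts.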

The purpose of this initial trial run was simply to give SBTF mapsters the chance to test drive the Tomnod platform and to provide feedback both on the technology and the work flows we put together. They were asked to tag a specific type of shelter in the imagery they received via the web-based Tomnod platform:

There’s much that we would do differently in the future but that was exactly the point of the trial run. We had hoped to receive a “crash course” in satellite imagery analysis from the Satellite Sentinel Project (SSP) team but our colleagues had hardly slept in days because of some very important analysis they were doing on the Sudan. So we did the best we could on our own. We do have several satellite imagery experts on the SBTF team though, so their input throughout the process was very helpful.

Our entire work flow along with comments and feedback on the trial run is available in this open and editable Google Doc. You’ll note the pages (and pages) of comments, questions and answers. This is gold and the entire point of the trial run. We definitely welcome additional feedback on our approach from anyone with experience in satellite imagery interpretation and analysis.

The result? SBTF mapsters analyzed a whopping 3,700+ individual images and tagged more than 9,400 shelters in the green-shaded area below. Known as the “Afgooye corridor,” this area marks the road between Mogadishu and Afgooye which, due to displacement from war and famine in the past year, has become one of the largest urban areas in Somalia. [Note, all screen shots come from Tomnod].

Last year, UNHCR used “satellite imaging both to estimate how many people are living there, and to give the corridor a concrete reality. The images of the camps have led the UN’s refugee agency to estimate that the number of people living in the Afgooye Corridor is a staggering 410,000. Previous estimates, in September 2009, had put the number at 366,000” (1).

The yellow rectangles below depict the 3,700+ individual images that SBTF volunteers analyzed for shelters. And here’s the output of 3 days’ worth of shelter tagging, 9,400+ tags:

Thanks to Tomnod’s CrowdRank algorithm, we were able to analyze consensus between mapsters and pull out the triangulated shelter locations. In total, we got 1,423 confirmed locations for the types of shelters described in our work flows. A first cursory glance at a handful (“random sample”) of these confirmed locations indicates they are spot on. As a next step, we could crowdsource (or SBTF-source, rather) the analysis of just these 1,423 images to triple-check consensus. Incidentally, these 1,423 locations could easily be added to Google Earth or a password-protected Ushahidi map.

We’ve learned a lot during this trial run and Luke got really good feedback on how to improve their platform moving forward. The data collected should also help us provide targeted feedback to SBTF mapsters in the coming days so they can further refine their skills. On my end, I should have been a lot more specific and detailed on exactly what types of shelters qualified for tagging. As the Q&A section on the Google Doc shows, many mapsters weren’t exactly sure at first because my original guidelines were simply too vague. So moving forward, it’s clear that we’ll need a far more detailed “code book” with many more examples of the features to look for along with features that do not qualify. A colleague of mine suggested that we set up an interactive, online quiz that takes volunteers through a series of examples of what to tag and not to tag. Only when a volunteer answers all questions correctly do they move on to live tagging. I have no doubt whatsoever that this would significantly increase consensus in subsequent imagery analysis.

Please note: the analysis carried out in this trial run is not for humanitarian organizations or to improve situational awareness; it is for testing purposes only. The point was to try something new and in the process work out the kinks so that when the UN is ready to provide us with official dedicated tasks we don’t have to scramble and climb the steep learning curve there and then.

In related news, the Humanitarian OpenStreetMap Team (HOT) provided SBTF mapsters with an introductory course on the OSM platform this past weekend. The HOT team has been working hard since the response to Haiti to develop an OSM Tasking Server that would allow them to micro-task the tracing of satellite imagery. They demo’d the platform to me last week and I’m very excited about this new tool in the OSM ecosystem. As soon as the system is ready for prime time, I’ll get access to the backend again and will write up a blog post specifically on the Tasking Server.

On Genghis Khan, Borneo and Galaxies: Using Crowdsourcing to Analyze Satellite Imagery

My colleague Robert Soden was absolutely right: Tomnod is definitely iRevolution material. This is why I reached out to the group a few days ago to explore the possibility of using their technology to crowdsource the analysis of satellite imagery for Somalia. You can read more about that project here. My purpose in this blog post, however, is to highlight the amazing work they’ve been doing with National Geographic in search of Genghis Khan’s tomb.

This “Valley of the Khans Project” represents a new approach to archeology. Together with National Geographic, Tomnod has collected thousands of GeoEye satellite images of the valley and designed a simple user interface to crowdsource the tagging of roads, rivers and modern or ancient structures. I signed up to give it a whirl and it was a lot of fun. A short video gives a quick guide on how to recognize different structures and then off you go!

You are assigned the rank “In Training” when you first begin. Once you’ve tagged your first 10 images, you progress to the next rank, which is “Novice 1”. The squares at the bottom left represent the number of individual satellite images you’ve tagged and how many are left. This is a neat game-like console and I wonder if there’s a scoreboard with names, listed ranks and images tagged.

In any case, a National Geographic team in Mongolia uses the results to identify the most promising archeological sites. The field team also used Unmanned Aerial Vehicles (UAVs) to supplement the satellite imagery analysis. You can learn more about the “Valley of the Khans Project” from this TEDx talk by Tomnod’s Albert Lin. Incidentally, Tomnod also offered their technology to map the damage from the devastating earthquake in New Zealand earlier this year. But the next project I want to highlight focuses on the forests of Borneo.

I literally just found out about the “EarthWatchers: Planet Patrol” project thanks to Edwin Wisse’s comment on my previous blog post. As Edwin noted, EarthWatchers is indeed very similar to the Somalia initiative I blogged about. The project is “developing the (web)tools for students all over the world to monitor rainforests using updated satellite imagery to provide real time intelligence required to halt illegal deforestation.”

This is a really neat project and I’ve just signed up to participate. EarthWatchers has designed a free and open source platform to make it easy for students to volunteer. When you log into the platform, EarthWatchers gives you a hexagon-shaped area of the Borneo rainforest to monitor and protect using the satellite imagery displayed on the interface.

The platform also provides students with a number of contextual layers, such as road and river networks, to add context to the satellite imagery and create heat-maps of the most vulnerable areas. Forests near roads are more threatened since the logs are easier to transport, for example. In addition, volunteers can compare before-and-after images of their hexagon to better identify any changes. If you detect any worrying changes in your hexagon, you can create an alert that notifies all your friends and neighbors.

An especially neat feature of the interface is that it allows students to network online. For example, you can see who your neighbors in nearby hexagons are and even chat with them thanks to a native chat feature. This is neat because it facilitates collaborative mapping in real time and means you don’t feel alone or isolated as a volunteer. The chat feature helps to build community.

If you’d like to learn more about this project, I recommend the presentation below by Eduardo Dias.

The third and final project I want to highlight is called Galaxy Zoo. I first came across this awesome example of citizen science in MacroWikinomics—an excellent book written by Don Tapscott and Anthony Williams. The purpose of Galaxy Zoo is to crowdsource the tagging, and thus classification, of galaxies as either spiral or elliptical. In order to participate, users take a short tutorial on the basics of galaxy morphology.

While this project began as an experiment of sorts, the initiative is thriving, with more than 275,000 users participating and 75 million classifications made. In addition, the data generated has resulted in several peer-reviewed publications and real scientific discoveries. While the project uses imagery of the stars rather than the earth, it certainly qualifies as a major success story in crowdsourcing the analysis of imagery.

Know of other intriguing applications of crowdsourcing for imagery analysis? If so, please do share in the comments section below.

Analyzing Satellite Imagery of the Somali Crisis Using Crowdsourcing

 Update: results of satellite imagery analysis available here.

You gotta love Twitter. Just two hours after I tweeted the above—in reference to this project—a colleague of mine from the UN who just got back from the Horn of Africa called me up: “Saw your tweet, what’s going on?” The last thing I wanted to do was talk about the über frustrating day I’d just had. So he said, “Hey, listen, I’ve got an idea.” He reminded me of this blog post I had written a year ago on “Crowdsourcing the Analysis of Satellite Imagery for Disaster Response” and said, “Why not try this for Somalia? We could definitely use that kind of information.” I quickly forgot about my frustrating day.

Here’s the plan. He talks to UNOSAT and Google about acquiring high-resolution satellite imagery for the geographic areas they need more information on. A colleague of mine in San Diego just launched his own company to develop mechanical turk and micro-tasking solutions for disaster response. He takes this satellite imagery and cuts it into, say, 50×50 kilometer square images for micro-tasking purposes.

We then develop a web-based interface where volunteers from the Standby Volunteer Task Force (SBTF) sign in and get one high resolution 50×50 km image displayed to them at a time. For each image, they answer the question: “Are there any human shelters discernible in this picture? [Yes/No]. If yes, what would you approximate the population of that shelter to be? [1-20; 21-50; 50-100; 100+].” Additional questions could be added. Note that we’d provide them with guidelines on how to identify human shelters and estimate population figures.

No shelters discernible in this image

Each 50×50 km image would get rated by at least 3 volunteers for data triangulation and quality assurance purposes. That is, if 3 volunteers each tag an image as depicting a shelter (or more than one shelter) and each of the 3 volunteers approximates the same population range, then that image would get automatically pushed to an Ushahidi map, automatically turned into a geo-tagged incident report and automatically categorized by the population estimate. One could then filter by population range on the Ushahidi map and click on those reports to see the actual image.
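As a rough sketch of what this triangulation step might look like in code, consider the following Python snippet. The answer format is hypothetical and the final print statement stands in for whatever call would actually create the geo-tagged Ushahidi report; the rule of three agreeing volunteers and the population ranges come straight from the description above.

```python
# Hypothetical volunteer answers per image tile: (has_shelter, population_range)
answers = {
    "tile_0815": [(True, "21-50"), (True, "21-50"), (True, "21-50")],
    "tile_0816": [(True, "1-20"), (False, None), (True, "21-50")],
}

def triangulate(tile_answers, required=3):
    """Confirm a tile only if `required` volunteers agree on both the presence
    of a shelter and the estimated population range."""
    if len(tile_answers) < required:
        return None
    first = tile_answers[0]
    if first[0] and all(a == first for a in tile_answers[:required]):
        return first[1]  # the agreed population range
    return None

for tile_id, tile_answers in answers.items():
    population = triangulate(tile_answers)
    if population is not None:
        # Placeholder for pushing a geo-tagged, categorized report to Ushahidi.
        print(f"Report: shelter(s) in {tile_id}, estimated population {population}")
```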

If satellite imagery licensing is an issue, then said images need not be pushed to the Ushahidi map. Only the report including the location of where a shelter has been spotted would be mapped along with the associated population estimate. The satellite imagery would never be released in full, only small bits and pieces of that imagery would be shared with a trusted network of SBTF volunteers. In other words, the 50×50 images could not be reconstituted and patched together because volunteers would not get contiguous 50×50 images. Moreover, volunteers would sign a code of conduct whereby they pledge not to share any of the imagery with anyone else. Because we track which volunteers see which 50×50 images, we could easily trace any leaked 50×50 image back to the volunteer responsible.
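For what it’s worth, both safeguards described above (never assigning a volunteer contiguous images, and logging which volunteer saw which image) are straightforward to implement. Here is a minimal, purely illustrative sketch under those assumptions; the grid coordinates and data structures are made up for the example.

```python
import random
from collections import defaultdict

def is_adjacent(a, b):
    """True if two (row, col) tiles touch, including diagonally."""
    return a != b and abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1

def assign_tile(volunteer, unassigned, seen_by, assignment_log):
    """Hand a volunteer a tile that is not adjacent to any tile they have
    already seen, and log the assignment so a leaked image can be traced back."""
    candidates = [t for t in unassigned
                  if not any(is_adjacent(t, s) for s in seen_by[volunteer])]
    if not candidates:
        return None
    tile = random.choice(candidates)
    unassigned.discard(tile)
    seen_by[volunteer].add(tile)
    assignment_log.append((volunteer, tile))  # audit trail for traceability
    return tile

# Illustrative 4x4 grid of image tiles
unassigned = {(r, c) for r in range(4) for c in range(4)}
seen_by = defaultdict(set)
assignment_log = []
print(assign_tile("volunteer_a", unassigned, seen_by, assignment_log))
```

(A real implementation would of course also hand each tile to at least three different volunteers; this sketch only shows the adjacency constraint and the audit log.)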

Note that for security reasons, we could make the Ushahidi map password protected and have a public version of the map with very limited spatial resolution so that the location of individual shelters would not be discernible.

I’d love to get feedback on this idea from iRevolution readers, so if you have thoughts (including constructive criticisms), please do share in the comments section below.

Analyzing the Libya Crisis Map Data in 3D (Video)

I first blogged about GeoTime exactly two years ago in a blog post entitled “GeoTime: Crisis Mapping in 3D.” The rationale for visualizing geospatial data in 3D very much resonates with me and in my opinion becomes particularly compelling when analyzing crisis mapping data.

This is why I invited my GeoTime colleague Adeel Khamisa to present their platform at the first International Conference on Crisis Mapping (ICCM 2009). Adeel used the Ushahidi-Haiti data to demonstrate the added value of using a 3D approach, which you can watch in the short video below.

Earlier this year, I asked Adeel whether he might be interested in analyzing the Libya Crisis Map data using GeoTime. He was indeed curious and kindly produced the short video below on his preliminary findings.

The above visual overview of the Libya data is really worth watching. I hope that fellow Crisis Mappers will consider making more use of GeoTime in their projects. The platform really is ideal for Crisis Mapping Analysis.

An Open Letter to the Good People at Benetech

Dear Good People at Benetech,

We’re not quite sure why Benetech went out of their way in an effort to discredit ongoing research by the European Commission (EC) that analyzes SMS data crowdsourced during the disaster response to Haiti. Benetech’s area of expertise is in human rights (rather than disaster response), so why go after the EC’s findings, which had nothing to do with human rights? For readers who want context, feel free to read this blog post of mine along with these replies by Benetech’s CEO:

Issues with Crowdsourced Data Part 1
Issues with Crowdsourced Data Part 2

The short version of the debate is this: the EC’s exploratory study found that the spatial pattern of text messages from Mission 4636 in Haiti was positively correlated with building damage in Port-au-Prince. This would suggest that crowdsourced SMS data had statistical value in Haiti—in addition to their value in saving lives. But Benetech’s study shows a negative correlation. That’s basically it. If you’d like to read something a little more spicy though, do peruse this recent Fast Company article, fabulously entitled “How Benetech Slays Monsters with Megabytes and Math.” In any case, that’s the back-story.

So let’s return to the Good People at Benetech. I thought I’d offer some of my humble guidance in case you feel threatened again in the future—I do hope you don’t mind and won’t take offense at my unsolicited and certainly imperfect advice. So by all means feel free to ignore everything that follows and focus on the more important work you do in the human rights space.

Next time Benetech wants to try and discredit the findings of a study in some other discipline, I recommend making sure that your own counter-findings are solid. In fact, I would suggest submitting your findings to a respected peer-reviewed journal—preferably one of the top tier scientific journals in your discipline. As you well know, after all, this really is the most objective and rigorous way to assess scientific work. Doing so would bring much more credibility to Benetech’s counter-findings than a couple blog posts.

My reasoning? Benetech prides itself (and rightly so) for carrying out some of the most advanced, cutting-edge quantitative research on patterns of human rights abuses. So if you want to discredit studies like the one carried out by the EC, I would have used this as an opportunity to publicly demonstrate the advanced expertise you have in quantitative analysis. But Benetech decided to use a simple non-spatial model to discredit the EC’s findings. Why use such a simplistic approach? Your response would have been more credible had you used statistical models for spatial point data instead. But granted, had you used more advanced models, you would have found evidence of a positive correlation. So you probably won’t want to read this next bit: a more elaborate “Tobit” correlation analysis actually shows the significance of SMS patterns as an explanatory variable in the spatial distribution of damaged buildings. Oh, and the correlation is (unfortunately) positive.

But that’s really beside the point. As my colleague Erik Hersman just wrote on the Ushahidi blog, one study alone is insufficient. What’s important is this: the last thing you want to do when trying to discredit a study in public is to come across as sloppy or as having ulterior motives (or both for that matter). Of course, you can’t control what other people think. If people find your response sloppy, then they may start asking whether the other methods you do use in your human rights analysis are properly peer-reviewed. They may start asking whether a strong empirical literature exists to back up your work and models. They may even want to know whether your expert statisticians have an accomplished track record and publish regularly in top-tier scientific journals. Other people may think you have ulterior motives and will believe this explains why you tried to discredit the EC’s preliminary findings. This doesn’t help your cause either. So it’s important to think through the implications of going public when trying to discredit someone’s research. Goodness knows I’ve made some poor calls myself on such matters in the past.

But let’s take a step back for a moment. If you’re going to try and discredit research like the EC’s, please make sure you correctly represent the other side’s arguments. Skewing them or fabricating them is unlikely to make you very credible in the debate. For example, the EC study never concluded that Search and Rescue teams should only rely on SMS to save people’s lives. Furthermore, the EC study never claimed that using SMS is preferable to using established data on building density. It’s surely obvious—and you don’t need to demonstrate this statistically—that using a detailed map of building locations would provide a far better picture of potentially damaged buildings than crowdsourced SMS data. But what if this map is not available in a timely manner? As you may know, data layers of building density are not very common. Haiti was a good example of how difficult, expensive and time-consuming the generation of such a detailed inventory is. The authors of the study simply wanted to test whether the SMS spatial pattern matched the damage analysis results, which it does. All they did was propose that SMS patterns could help in structuring the efforts needed for a detailed assessment, especially because SMS data can be received shortly after the event.

So to summarize, no one (I know) has ever claimed that crowdsourced data should replace established methods for information collection and analysis. This has never been an either-or argument. And it won’t help your cause to turn it into a black-and-white debate because people familiar with these issues know full well that the world is more complex than the picture you are painting for them. They also know that people who take an either-or approach often do so when they have either run out of genuine arguments or had few to begin with. So none of this will make you look good. In sum, it’s important to (1) accurately reflect the other’s arguments, and (2) steer clear of creating an either-or, polarized debate. I know this isn’t easy to do; I’m guilty myself… on multiple counts.

I’ve got a few more suggestions—hope you don’t mind. They follow from the previous ones. The authors of the EC study never used their preliminary findings to extrapolate to other earthquakes, disasters or contexts. These findings were specific to the Haiti quake and the authors never claimed that their model was globally valid. So why did you extrapolate to human rights analysis when that was never the objective of the EC study? Regardless, this just doesn’t make you look good. I understand that Benetech’s focus is on human rights and not disaster response, but the EC study never sought to undermine your good work in the field of human rights. Indeed, the authors of the study hadn’t even heard of Benetech. So in the future, I would recommend not extrapolating findings from one study and assuming they will hold in your own field of expertise, or that they even threaten your area of expertise. That just doesn’t make any sense.

There are a few more tips I wanted to share with you. Everyone knows full well that crowdsourced data has important limitations—nobody denies this. But a number of us happen to think that some value can still be derived from crowdsourced data. Even Mr. Moreno-Ocampo, the head of the International Criminal Court (ICC), who I believe you know well, has pointed to the value of crowdsourced data from social media. In an interview with CNN last month, Mr. Moreno-Ocampo emphasized that Libya was the first time that the ICC was able to respond in real time to allegations of atrocities, partially due to social-networking sites such as Facebook. He added that, “this triggered a very quick reaction. The (United Nations) Security Council reacted in a few days; the U.N. General Assembly reacted in a few days. So, now because the court is up and running we can do this immediately,” he said. “I think Libya is a new world. How we manage the new challenge — that’s what we will see now.”

Point is, you can’t control the threats that will emerge or even prevent them, but you do control the way you decide to publicly respond to these threats. So I would recommend using your response as an opportunity to be constructive and demonstrate your good work rather than trying to discredit others and botching things up in the process.

But going back to the ICC and the bit in the Fast Company article about mathematics demonstrating the culpability of the Guatemalan government. Someone who has been following your work closely for years emailed me because they felt somewhat irked by all this. By the way, this is yet another unpleasant consequence of trying to publicly discredit others: new critics of your work will emerge. The critic in question finds the claim a “little far fetched” re your mathematics demonstrating the culpability of the Guatemalan government. “There already was massive documented evidence of the culpability of the Guatemalan government in the mass killings of people. If there is a contribution from mathematics it is to estimate the number of victims who were never documented. So the idea is that documented cases are just a fraction of total cases and you can estimate the gap between the two. In order to do this estimation, you have to make a number of very strong assumptions, which means that the estimate may very well be unreliable anyway.”

Now, I personally think that’s not what you, Benetech, meant when you spoke with the journalist, because goodness knows the number of errors that journalists have made writing about Haiti.

In any case, the critic had this to add: “In a court of law, this kind of estimation counts for little. In the latest trial at which Benetech presented their findings, this kind of evidence was specifically rejected. Benetech and others claim that in an earlier trial they nailed Milosevic. But Milosevic was never nailed in the first place—he died before judgment was passed and there was a definite feeling at the time that the trial wasn’t going well. In any case, in a court of law what matters are documented cases, not estimates, so this argument about estimates is really beside the point.”

Now I’m really no expert on any of these issues, so I have no opinion on this case or the statistics or the arguments involved. They may very well be completely wrong, for all I know. I’m not endorsing any of the above statements. I’m simply using them as an illustration of what might happen in the future if you don’t carefully plan your counter-argument before going public. People will take issue and try to discredit you in turn, which can be rather unpleasant.

In conclusion, I would like to remind the Good People at Benetech about what Ushahidi is and isn’t. The Ushahidi platform is not a methodology (as I have already written on iRevolution and the Ushahidi blog). The Ushahidi platform is a mapping tool. The methodology that people choose to use to collect information is entirely up to them. They can use random sampling, controlled surveys, crowdsourcing, or even the methodology used by Benetech. I wonder what the good people at Benetech would say if some of their data were to be visualized on an Ushahidi platform. Would they dismiss the crisis map altogether? And speaking of crisis maps, most Ushahidi maps are not crisis maps. The platform is used in a very wide variety of ways, even to map the best burgers in the US. Is Benetech also going to extrapolate the EC’s findings to burgers?

So to sum up, in case it’s not entirely clear, we know full well that there are important limitations to crowdsourced data in disaster response and have never said that the methodology of crowdsourcing should replace existing methodologies in the human rights space (or any other space for that matter). So please, let’s not keep going in circles.

Now, where do we go from here? Well, I’ve never been a good pen pal, so don’t expect any more letters from me in response to the Good People at Benetech. I think everyone knows that a back and forth would be unproductive and largely a waste of time, not to mention an unnecessary distraction from the good work that we all try to do in the broader community to bring justice, voice and respect to marginalized communities.

Sincerely,

Introduction to Digital Origins of Dictatorship and Democracy

Reading Philip Howard’s “Digital Origins of Dictatorship and Democracy” and Evgeny Morozov’s “Net Delusion” back-to-back over a 10-day period in January was quite a trip. The two authors couldn’t possibly be more different in terms of tone, methodology and research design. Howard’s approach is rigorous and balanced. He takes a data-driven, mixed-methods approach that ought to serve as a model for the empirical study of digital activism.

In contrast, Morozov’s approach frequently takes the form of personal attacks, snarky remarks and cheap rhetorical arguments. This regrettably drowns out the important and valid points he does make in some chapters. But what discredits Net Delusion the most lies not in what Morozov writes but in what he hides. To say the book is one-sided would be an understatement. But this has been a common feature of the author’s writings on digital activism, and one of the reasons  I took him to task a couple years ago with my blog posts on anecdote heaven. If you follow that back and forth, you’ll note it ends with personal attacks by Morozov mixed with evasive counter-arguments. For an intelligent and informed critique of Net Delusion, see my colleague Mary Joyce’s blog posts.

In this blog post, I summarize Howard’s introductory chapter. For a summary of his excellent prologue, please see my previous post here.

The introductory chapter to Digital Origins provides a critique of the datasets and methodologies used to study digital activism. Howard notes that the majority of empirical studies, “rely on a few data sources, chiefly the International Telecommunications Union, the World Bank, and the World Resources Institute. Indeed, these organizations often just duplicate each other’s poor quality data. Many researchers rely heavily on this data for their comparative or single-country case studies, rather than collecting original observations or combining data in interesting ways. The same data tables appear over and over again.”

I faced this challenge in my dissertation research. Collecting original data is often a major undertaking. Howard’s book is the culmination of 3-4 years of research supported by important grants and numerous research assistants. Alas, PhD students don’t always get this kind of support. The good news is that Howard and others are sharing their new datasets like the Global Digital Activism Dataset.

In terms of methods, there are limits in the existing literature. As Howard writes,

“Large-scale, quantitative, and cross-sectional studies must often collapse fundamentally different political systems—autocracies, democracies, emerging democracies, and crisis states—into a few categories or narrow indices. […] Area studies that focus on one or two countries get at the rich history of technology diffusion and political development, but rarely offer conclusions that can be useful in understanding some of the seemingly intractable political and security crises in other parts of the world.”

Howard thus takes a different approach, particularly in his quantitative analysis, and introduces fuzzy set logic:

“Fuzzy set logic offers general knowledge through the strategy of looking for shared causal conditions across multiple instances of the same outcome—sometimes called ‘selecting on the dependent variable.’ For large-N, quantitative, and variable-oriented researchers, this strategy is unacceptable because neither the outcome nor the shared causal conditions vary across the cases. However, the strategy of selecting on the dependent variable is useful when researchers are interested in studying necessary conditions, and very useful when constructing a new theoretically defined population such as ‘Islamic democracy.’

“Perhaps most important, this strategy is most useful when developing theory grounded in the observed, real-world experience of democratization in the Muslim communities of the developing world, rather than developing theory by privileging null, hypothetical, and unobserved cases.”

Using original data and this new innovative statistical approach, Howard finds that “technology diffusion has had a crucial causal role in improvements in democratic institutions.”

“I find that technology diffusion has become, in combination with other factors, both a necessary and sufficient cause of democratic transition or entrenchment.”

“Protests and activist movements have led to successful democratic insurgencies, insurgencies that depended on ICTs for the timing and logistics of protest. Sometimes democratic transitions are the outcome, and sometimes the outcome is slight improvement in the behavior of authoritarian states. Clearly the internet and cell phones have not on their own caused a single democratic transition, but it is safe to conclude that today, no democratic transition is possible without information technologies.”

My next blog post on Howard’s book will summarize Chapter 1: Evolution and Revolution, Transition and Entrenchment.

Access to Mobile Phones Increases Protests Against Repressive Regimes

I recently shared a draft of my first dissertation chapter which consists of a comprehensive literature review on the impact of Information and Communication Technologies (ICTs) on Democracy, Activism and Dictatorship. Thanks very much to everyone who provided feedback, I really appreciate it. I will try to incorporate as much of the feedback as possible in the final version and will also update that chapter in the coming months given the developments in Tunisia and Egypt.

The second chapter of my dissertation comprises a large-N econometric study on the impact of ICT access on anti-government protests in countries under repressive rule between 1990 and 2007. A 32-page draft of this chapter is available here as a PDF. I use negative binomial regression analysis to test whether the diffusion of ICTs is a statistically significant predictor of protest events and, if so, whether that relationship is positive or negative. The dependent variable, protests, is the number of protests per country-year. The ICT variables used in the model are: Internet users, mobile phone subscribers and number of telephone landlines per country-year. The control variables, identified in the literature review, are percentage change in GDP, unemployment rate, the degree of autocracy per country-year, internal war and elections.

A total of 38 countries were included in the study: Algeria, Armenia, Azerbaijan, Bahrain, Belarus, Burkina Faso, Burma, China, Cote d’Ivoire, Cuba, DRC, Egypt, Gabon, Guinea, India, Iran, Iraq, Jordan, Kazakhstan, Kenya, Malaysia, Morocco, Pakistan, Philippines, Russia, Saudi Arabia, Singapore, Sudan, Syria, Tajikistan, Thailand, Tunisia, Turkey, Ukraine, United Arab Emirates, Uzbekistan, Venezuela and Zimbabwe. I clustered these countries into 4 groups, those with relatively (1) high and (2) low levels of ICT access; and those with (3) high and (4) low levels of protests per country-year. The purpose of stratifying the data is to capture underlying effects that may be lost by aggregating all the data. So I ran a total of 5 regressions, one on each of those four country groups and one on all the countries combined.
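For readers curious about what such a model looks like in practice, here is a minimal sketch using Python’s statsmodels library. The file name and column names are hypothetical stand-ins for the country-year panel described above, and this is of course not my actual dissertation code:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical country-year panel (1990-2007); column names are illustrative
# stand-ins for the variables described above.
df = pd.read_csv("protest_panel_1990_2007.csv")

formula = ("protests ~ internet_users + mobile_subscribers + landlines"
           " + gdp_change + unemployment + autocracy + internal_war + election")

# One negative binomial regression on all countries combined, then one per
# cluster (high/low ICT access, high/low protest), for five models in total.
results = {"all countries": smf.negativebinomial(formula, data=df).fit()}
for cluster, group in df.groupby("cluster"):
    results[cluster] = smf.negativebinomial(formula, data=group).fit()

for name, res in results.items():
    print(name)
    print(res.summary())
```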

All five negative binomial regression models on the entire 18-year time panel for the study data were significant. Of note, however, is the non-significance of the Internet variable in all models analyzed. Mobile phones were only significant in the regression models for the “Low Protest” and “High Mobile Phone Use” clusters. However, the relationship was negative in the former case and positive in the latter. In other words, an increase in mobile phone users in countries with relatively high ICT access is associated with an increase in the number of protests against repressive regimes. This may imply that social unrest is facilitated by the use of mobile communication in countries with widespread access to mobile phones, keeping other factors constant.

These findings require some important qualifications. First, as discussed in the data section, the protest data may suffer from media bias. Second, the protest data does not provide any information on the actual magnitude of the protests. Third, economic data on countries under repressive rule need to be treated with suspicion since some of this data is self-reported. For example, authoritarian regimes are unlikely to report the true magnitude of unemployment in their country. ICT data is also self-reported. Fourth, the data is aggregated to the country-year level, which means potentially important sub-national and sub-annual variations are lost. Fifth and finally, the regression results may be capturing other dynamics that are not immediately apparent given the limits of quantitative analysis.

Qualitative comparative analysis is therefore needed to test and potentially validate the results derived from this quantitative study. Indeed, “perhaps the best reason to proceed in a qualitative and comparative way is that the categories of ‘democracy’ and ‘technology diffusion’ are themselves aggregates and proxies for other measurable phenomena” (Howard 2011). Unpacking and then tracing the underlying causal connections between ICT use and protests requires qualitative methodologies such as process-tracing and semi-structured interviews. The conceptual framework developed in Chapter 2 serves as an ideal basis to inform both the process-tracing and the interviews. The next chapter of my dissertation will thus introduce two qualitative case studies to critically assess the impact of ICTs on state-society relations in countries under repressive rule. In the meantime, I very much welcome feedback on this second chapter from iRevolution readers.