Category Archives: Crowdsourcing

Twitter, Crises and Early Detection: Why “Small Data” Still Matters

My colleagues John Brownstein and Rumi Chunara at Harvard University’s HealthMap project are continuing to break new ground in the field of Digital Disease Detection. Using data obtained from tweets and online news, the team was able to identify a cholera outbreak in Haiti weeks before health officials acknowledged the problem publicly. Meanwhile, my colleagues from UN Global Pulse partnered with Crimson Hexagon to forecast food prices in Indonesia by carrying out sentiment analysis of tweets. I had actually written this blog post on Crimson Hexagon four years ago to explore how the platform could be used for early warning purposes, so I’m thrilled to see this potential realized.

There is a lot that intrigues me about the work that HealthMap and Global Pulse are doing. But one point that really struck me vis-a-vis the former is just how little data was necessary to identify the outbreak. To be sure, not many Haitians are on Twitter and my impression is that most humanitarians have not really taken to Twitter either (I’m not sure about the Haitian Diaspora). This would suggest that accurate, early detection is possible even without Big Data; even with “Small Data” that is neither representative nor verified. (Interestingly, Rumi notes that the Haiti dataset is actually larger than datasets typically used for this kind of study).

In related news, a recent peer-reviewed study by the European Commission found that the spatial distribution of crowdsourced text messages (SMS) following the earthquake in Haiti was strongly correlated with building damage. Again, the dataset of text messages was relatively small. And again, this data was neither collected using random sampling (i.e., it was crowdsourced) nor was it verified for accuracy. Yet the analysis of this small dataset still yielded some particularly interesting findings that have important implications for rapid damage detection in post-emergency contexts.

While I’m no expert in econometrics, what these studies suggest to me is that detecting change over time is ultimately more critical than having a large-N dataset, let alone one that is obtained via random sampling or even vetted for quality control purposes. That doesn’t mean that the latter factors are not important; it simply means that the outcome of the analysis is relatively less sensitive to these specific variables. Changes in the baseline volume/location of tweets on a given topic appear to be strongly correlated with offline dynamics.
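To make the change-detection idea concrete, here is a minimal sketch (with made-up daily counts, not data from any of the studies above) that flags days whose message volume deviates sharply from a trailing baseline:

```python
from statistics import mean, stdev

def detect_anomalies(daily_counts, window=7, threshold=2.0):
    """Flag days whose volume deviates sharply from the trailing baseline.

    daily_counts: list of message counts per day (hypothetical data).
    Returns indices of days whose z-score against the preceding
    `window` days exceeds `threshold`.
    """
    anomalies = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:  # flat baseline, no meaningful z-score
            continue
        z = (daily_counts[i] - mu) / sigma
        if z > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical daily counts of outbreak-related tweets: a stable
# baseline followed by a sudden surge on the final two days.
counts = [4, 5, 3, 6, 4, 5, 4, 5, 4, 6, 5, 4, 30, 42]
print(detect_anomalies(counts))  # → [12, 13]
```

The point is that even a tiny, unrepresentative stream can produce a clear signal once you compare it to its own baseline rather than to the population.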

What are the implications for crowdsourced crisis maps and disaster response? Could similar statistical analyses be carried out on Crowdmap data, for example? How small can a dataset be and still yield actionable findings like those mentioned in this blog post?

#UgandaSpeaks: Al-Jazeera uses Ushahidi to Amplify Local Voices in Response to #Kony2012

[Cross-posted from the Ushahidi blog]

Invisible Children’s #Kony2012 campaign has set off a massive firestorm of criticism with the debate likely to continue raging for many more weeks and months. In the meantime, our colleagues at Al-Jazeera have repurposed our previous #SomaliaSpeaks project to amplify Ugandan voices responding to the Kony campaign: #UgandaSpeaks.

Other than GlobalVoices, this Al-Jazeera initiative is one of the very few seeking to amplify local reactions to the Kony campaign. Over 70 local voices have been shared and mapped on Al-Jazeera’s Ushahidi platform in the first few hours since the launch. The majority of reactions submitted thus far are critical of the campaign but a few are positive.

One person from Kampala asks, “How come the world now knows more about #Kony2012 than about the Nodding Syndrome in Northern Uganda?” Another person in Gulu complains that “there is nothing new they are showing us. Its like a campaign against our country. […] Did they put on consideration how much its costing our country’s image? It shows as if Uganda is finished.” In nearby Lira, one person shares their story about growing up in Northern Uganda and attending “St. Mary’s College Aboke, a school from which Joseph Kony’s rebels abducted 139 girls in ordinary level […]. For the 4 years that I spent in that school (1999-2002), together with other students, I remember praying the Rosary at the School Grotto on daily basis and in the process, reading out the names of the 30 girls who had remained in captivity after Sr. Rachelle an Italian Nun together with a Ugandan teacher John Bosco rescued only 109 of them.”

The Ushahidi platform was first launched in neighboring Kenya to give ordinary Kenyans a voice during the post-election violence in 2007/2008. Indeed, “ushahidi” means witness or testimony in Swahili. So I am pleased to see this free and open source platform from Africa being used to amplify voices next door in Uganda, voices that are not represented in the #Kony2012 campaign.

Some Ugandan activists are asking why they should respond to “some American video release about something that happened 20 years ago by someone who is not in my country?” Indeed, why should anyone? If the #Kony2012 campaign and underlying message doesn’t bother Ugandans and doesn’t paint the country in a bad light, then there’s no need to respond. If the campaign doesn’t divert attention from current issues that are more pressing to Ugandans and does not adversely affect tourism, then again, why should anyone respond? This is, after all, a personal choice; no one is forced to have their voices heard.

At SXSW yesterday, Ugandan activist Teddy Ruge weighed in on the #Kony2012 campaign with the following:

“We [Ugandans] have such a hard time being given the microphone to talk about our issues that sometimes we have to follow on the coat-tails of Western projects like this one and say that we also have a voice in this matter.”

I believe one way to have those local voices heard is to have them echoed using innovative software “Made in Africa” like Ushahidi and then amplified by a non-Western but international news company like Al-Jazeera. Looking at my Twitter stream this morning, it appears that I’m not the only one. The microphone is yours. Over to you.

Truthiness as Probability: Moving Beyond the True or False Dichotomy when Verifying Social Media

I asked the following question at the Berkman Center’s recent Symposium on Truthiness in Digital Media: “Should we think of truthiness in terms of probabilities rather than use a True or False dichotomy?” The wording here is important. The word “truthiness” already suggests a subjective fuzziness around the term. Expressing truthiness as probabilities provides more contextual information than does a binary true or false answer.

When we set out to design the SwiftRiver platform some three years ago, it was already clear to me then that the veracity of crowdsourced information ought to be scored in terms of probabilities. For example, what is the probability that the content of a Tweet referring to the Russian elections is actually true? Why use probabilities? Because it is particularly challenging to instantaneously verify crowdsourced information in the real-time social media world we live in.

There is a common tendency to assume that all unverified information is false until proven otherwise. This is too simplistic, however. We need a fuzzy logic approach to truthiness:

“In contrast with traditional logic theory, where binary sets have two-valued logic: true or false, fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. Fuzzy logic has been extended to handle the concept of partial truth, where the truth value may range between completely true and completely false.”

The majority of user-generated content is unverified at time of birth. (Does said data deserve the “original sin” of being labeled as false, unworthy, until proven otherwise? To digress further, unverified content could be said to have a distinct wave function that enables said data to be both true and false until observed. The act of observation starts the collapse of said wave function. To the astute observer, yes, I’m riffing off Schrödinger’s Cat, and was also pondering how to weave in Heisenberg’s uncertainty principle as an analogy; think of a piece of information characterized by a “probability cloud” of truthiness).

I believe the hard sciences have much to offer in this respect. Why don’t we have error margins for truthiness? Why not take a weather forecast approach to information truthiness in social media? What if we had a truthiness forecast understanding full well that weather forecasts are not always correct? The fact that a 70% chance of rain is forecasted doesn’t prevent us from acting and using that forecast to inform our decision-making. If we applied binary logic to weather forecasts, we’d be left with either a 100% chance of rain or 100% chance of sun. Such weather forecasts would be at best suspect if not wrong rather frequently.

In any case, instead of dismissing content generated in real-time because it is not immediately verifiable, we can draw on Information Forensics to begin assessing the potential validity of said content. Tactics from information forensics can help us create a score card of heuristics to express truthiness in terms of probabilities. (I call this advanced media literacy). There are indeed several factors that one can weigh, e.g., the identity of the messenger relaying the content, the source of the content, the wording of said content, the time of day the information was shared, the geographical proximity of the source to the event being reported, etc.
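As a rough sketch of such a score card — the factor names, values and weights below are illustrative assumptions on my part, not a tested model — one could combine these heuristics into a single probability-like score:

```python
def truthiness_score(report, weights=None):
    """Combine verification heuristics into a probability-like score.

    Each heuristic in `report` is a value in [0, 1]; the weights
    (illustrative, not empirically derived) express how much each
    cue matters. Missing heuristics default to 0.
    """
    weights = weights or {
        "source_reputation": 0.35,
        "geographic_proximity": 0.25,
        "corroborating_reports": 0.25,
        "account_age": 0.15,
    }
    total = sum(weights.values())
    score = sum(weights[k] * report.get(k, 0.0) for k in weights) / total
    return round(score, 3)

report = {
    "source_reputation": 0.8,     # e.g., a known local journalist
    "geographic_proximity": 1.0,  # tweeting from the affected area
    "corroborating_reports": 0.5, # one independent confirmation
    "account_age": 0.9,           # long-established account
}
print(truthiness_score(report))
```

A report would then carry a truthiness of, say, 0.79 rather than a flat “unverified” label, which is exactly the weather-forecast framing above.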

These weights need not be static as they are largely subjective and temporal; after all, truth is socially constructed and dynamic. So while a “wisdom of the crowds” approach alone may not always be well-suited to generating these weights, perhaps integrating the hunch of the expert coupled with machine learning algorithms (based on lessons learned in information forensics) could result in more useful decision-support tools for truthiness forecasting (or rather “backcasting”).

In sum, thinking of truthiness strictly in terms of true and false prevents us from “complexifying” a scalar variable into a vector (a wave function), which in turn limits our ability to develop new intervention strategies. We need new conceptual frameworks to reflect the complexity and ambiguity of user-generated content.


Innovation and Counter-Innovation: Digital Resistance in Russia

Want to know what the future of digital activism looks like? Then follow the developments in Russia. I argued a few years back that the fields of digital activism and civil resistance were converging to a point I referred to as “digital resistance.” The pace of tactical innovation and counter-innovation in Russia’s digital battlefield is stunning and rapidly converging to this notion of digital resistance.

“Crisis can be a fruitful time for innovation,” writes Gregory Asmolov. Contested elections are also ripe for innovation, which is why my dissertation case studies focused on elections. “In most cases,” says Asmolov, “innovations are created by the oppressed (the opposition, in Russia’s case), who try to challenge the existing balance of power by using new tools and technologies. But the state can also adapt and adopt some of these technologies to protect the status quo.” These innovations stem not only from the new technologies themselves but are embodied in the creative ways they are used. In other words, tactical innovation (and counter-innovation) is taking place alongside technological innovation. Indeed, “innovation can be seen not only in the new tools, but also in the new forms of protest enabled by the technology.”

Some of my favorite tactics from Russia include the YouTube video of Vladimir Putin arrested for fraud and corruption. The video was made to look like a real “breaking news” announcement on Russian television and got millions of views in just a few days. Another tactic is the use of DIY drones, mobile phone live-streaming and/or 360-degree 3D photo installations to more accurately relay the size of protests. A third tactic entails the use of a Twitter username that resembles that of a well-known individual. Michael McFaul, the US Ambassador to Russia, has the Twitter handle @McFaul. Activists set up the Twitter handle @McFauI that appears identical but actually uses a capital “i” instead of a lower case “L” for the last letter in McFaul.

Asmolov lists a number of additional innovations in the Russian context in this excellent write-up. From coordination tools such as the “League of Voters” website, the “Street Art” group on Facebook and the car-based flashmob protests which attracted more than one thousand cars in one case, to the crowdsourced violations map “Karta Narusheniy“, the “SMS Golos” and “Svodny Protocol” platforms used to collect, analyze and/or map reports from trusted election observers (using bounded crowdsourcing).

One of my favorite tactics is the “solo protest.” According to Russian law, “a protest by one person does not require special permission.” So activist Olesya Shmagun stood in front of Putin’s office with a poster that read “Putin, go and take part in public debates!” While she was questioned by the police and security service, she was not detained since one-person protests are not illegal. Even though she only caught the attention of several dozen people walking by at the time, she published the story of her protest and a few photos on her LiveJournal blog, which drew considerable attention after being shared on many blogs and media outlets. As Asmolov writes, “this story shows the power of what is known as Manuel Castells’ ‘mass self-communication’. Thanks to the presence of one camera, an offline one-person protest found a way to a [much wider] audience online.”

This innovative tactic led to another challenge: how to turn a one-person protest into a massive number of one-person protests? So on top of the original innovation came yet another, the Big White Circle action. The dedicated online tool Feb26.ru was developed specifically to coordinate many simultaneous one-person protests. The platform,

“[…] allowed people to check in at locations of their choice on the map of the Garden Ring circle, and showed what locations were already occupied. Unlike other protests, the Big White Circle did not have any organizational committee or a particular leader. The role of the leader was played by a website. The website suffered from DDoS attacks; as a result, it was closed and deleted by the provider; a day later, it was restored.  The practice of creating special dedicated websites for specific protest events is one of the most interesting innovations of the Russian protests. The initial idea belongs to Ilya Klishin, who launched the dec24.ru website (which doesn’t exist anymore) for the big opposition rally that took place in Moscow on December 24, 2011.”

The reason I like this tactic is that it takes a perfectly legal action and simply multiplies it, thus forcing the regime to potentially come up with a new set of laws that will clearly appear absurd and be ridiculed by a larger segment of the population.

Citizen-based journalism played a pivotal role by “increasing transparency of the coverage of pro-government rallies.” As Asmolov notes, “Internet users were able to provide much content, including high quality YouTube reports that showed that many of those who took a part in these rallies had been forced or paid to participate, without really having any political stance.” This relates to my earlier blog post, “Wag the Dog, or Why Falsifying Crowdsourced Information Can be a Pain.”

Of course, there is plenty of “counter-innovation” coming from the Kremlin and friends. Take this case of pro-Kremlin activists producing an instructional YouTube video on how to manipulate a crowdsourced election-monitoring platform. In addition, Putin loyalists have adapted some of the same tactics as opposition activists, such as the car-based flash-mob protest. The Russian government also decided to create an online system of their own for election monitoring:

“Following an order from Putin, the state communication company Rostelecom developed a website webvybory2012.ru, which allowed people to follow the majority of the Russian polling stations (some 95,000) online on the day of the March 4 presidential election.  Every polling station was equipped with two cameras: one has to be focused on the ballot box and the other has to give the general picture of the polling station. Once the voting was over, one of the cameras broadcasted the counting of the votes. The cost of this project is at least 13 billion rubles (around $500 million). Many bloggers have criticized this system, claiming that it creates an imitation of transparency, when actually the most common election violations cannot be monitored through webcameras (more detailed analysis can be found here). Despite this, the cameras allowed to spot numerous violations (1, 2).”

From the perspective of digital resistance strategies, this is exactly the kind of reaction you want to provoke from a repressive regime. Force them to decentralize, spend hundreds of millions of dollars and hundreds of labor-hours to adopt similar “technologies of liberation” and in the process document voting irregularities on their own websites. In other words, leverage and integrate the regime’s technologies within the election-monitoring ecosystem being created, as this will spawn additional innovation. For example, one Russian activist proposed that this webcam network be complemented by a network of citizen mobile phones. In fact, a group of activists developed a smartphone app that could do just this. “The application Webnablyudatel has a classification of all the violations and makes it possible to instantly share video, photos and reports of violations.”

Putin supporters also made an innovative use of crowdsourcing during the recent elections. “What Putin has done is based on a map of Russia where anyone can submit information about Putin’s good deeds.” Just like pro-Kremlin activists can game pro-democracy crowdsourcing platforms, so can supporters of the opposition game a platform like this Putin map. In addition, activists could have easily created a Crowdmap and called it “What Putin Has Not Done” and crowdsource that map, which no doubt would be far more populated than the original good deed map.

One question that comes to mind is how the regime will deal with disinformation on crowdsourcing platforms they set up. Will they need to hire more supporters to vet the information submitted to said platform? Or will they close up the reporting and use “bounded crowdsourcing” instead? If so, will they have a communications challenge on their hands in trying to convince the public that their trusted reporters are indeed legitimate? Another question has to do with collective action. Pro-Kremlin activists are already innovating on their own, but will this create a collective-action challenge for the Russian government? Take the example of the pro-regime “Putin Alarm Clock” (Budilnikputina.ru) tactic, which backfired and even prompted Putin’s chief of elections staff to dismiss the initiative as “a provocation organized by the protestors.”

There has always been an interesting asymmetric dynamic in digital activism, with activists as first-movers innovating under oppression and regimes counter-innovating. How will this asymmetry change as digital activism and civil resistance tactics and strategies increasingly converge? Will repressive regimes be pushed to decentralize their digital resistance innovations in order to keep pace with the distributed pro-democracy innovations springing up? Does innovation require less coordination than counter-innovation? And as Gregory Asmolov concludes in his post-script, how will the future ubiquity of crowd-funding platforms and tools for micro-donations/payments online change digital resistance?

Trails of Trustworthiness in Real-Time Streams

Real-time information channels like Twitter, Facebook and Google have created cascades of information that are becoming increasingly challenging to navigate. “Smart-filters” alone are not the solution since they won’t necessarily help us determine the quality and trustworthiness of the information we receive. I’ve been studying this challenge ever since the idea behind SwiftRiver first emerged several years ago now.

I was thus thrilled to come across a short paper on “Trails of Trustworthiness in Real-Time Streams” which describes a start-up project that aims to provide users with a “system that can maintain trails of trustworthiness propagated through real-time information channels,” which will “enable its educated users to evaluate its provenance, its credibility and the independence of the multiple sources that may provide this information.” The authors, Panagiotis Metaxas and Eni Mustafaraj, kindly cite my paper on “Information Forensics” and also reference SwiftRiver in their conclusion.

The paper argues that studying the tactics that propagandists employ in real life can provide insights and even predict the tricks employed by Web spammers.

“To prove the strength of this relationship between propagandistic and spamming techniques, […] we show that one can, in fact, use anti-propagandistic techniques to discover Web spamming networks. In particular, we demonstrate that when starting from an initial untrustworthy site, backwards propagation of distrust (looking at the graph defined by links pointing to an untrustworthy site) is a successful approach to finding clusters of spamming, untrustworthy sites. This approach was inspired by the social behavior associated with distrust: in society, recognition of an untrustworthy entity (person, institution, idea, etc) is reason to question the trustworthiness of those who recommend it. Other entities that are found to strongly support untrustworthy entities become less trustworthy themselves. As in society, distrust is also propagated backwards on the Web graph.”
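A naive sketch of this backwards propagation might look as follows. The site names are hypothetical, and the authors’ actual method additionally weighs how strongly each linking site supports the untrustworthy one; this version simply walks the reverse link graph:

```python
from collections import deque

# Hypothetical reverse-link graph: site -> sites that link TO it.
linked_by = {
    "spam-seed.example": ["farm-a.example", "farm-b.example"],
    "farm-a.example":    ["farm-c.example"],
    "farm-b.example":    ["farm-a.example"],
    "farm-c.example":    [],
}

def propagate_distrust(seed, linked_by, max_depth=2):
    """Breadth-first backwards propagation of distrust from a seed site.

    Starting from a known untrustworthy seed, follow the links that
    point AT each distrusted site; the sites doing the linking inherit
    distrust, up to max_depth hops away.
    """
    distrusted = {seed}
    queue = deque([(seed, 0)])
    while queue:
        site, depth = queue.popleft()
        if depth == max_depth:
            continue
        for linker in linked_by.get(site, []):
            if linker not in distrusted:
                distrusted.add(linker)
                queue.append((linker, depth + 1))
    return distrusted

print(sorted(propagate_distrust("spam-seed.example", linked_by)))
```

In practice a single link to a bad site is weak evidence, which is why the paper’s “strongly support” qualifier matters; a real implementation would threshold on link counts or anchor-text endorsement before propagating.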

The authors document that today’s Web spammers are using increasingly sophisticated tricks.

“In cases where there are high stakes, Web spammers’ influence may have important consequences for a whole country. For example, in the 2006 Congressional elections, activists using Google bombs orchestrated an effort to game search engines so that they present information in the search results that was unfavorable to 50 targeted candidates. While this was an operation conducted in the open, spammers prefer to work in secrecy so that their actions are not revealed. So, [we] revealed and documented the first Twitter bomb, which tried to influence the Massachusetts special elections, showing how an Iowa-based political group, hiding its affiliation and profile, was able to serve misinformation a day before the election to more than 60,000 Twitter users that were following the elections. Very recently we saw an increase in political cybersquatting, a phenomenon we reported in [28]. And even more recently, […] we discovered the existence of Pre-fabricated Twitter factories, an effort to provide collaborators pre-compiled tweets that will attack members of the Media while avoiding detection of automatic spam algorithms from Twitter.”

The theoretical foundations for a trustworthiness system:

“Our concept of trustworthiness comes from the epistemology of knowledge. When we believe that some piece of information is trustworthy (e.g., true, or mostly true), we do so for intrinsic and/or extrinsic reasons. Intrinsic reasons are those that we acknowledge because they agree with our own prior experience or belief. Extrinsic reasons are those that we accept because we trust the conveyor of the information. If we have limited information about the conveyor of information, we look for a combination of independent sources that may support the information we receive (e.g., we employ “triangulation” of the information paths). In the design of our system we aim to automatize as much as possible the process of determining the reasons that support the information we receive.”

“We define as trustworthy, information that is deemed reliable enough (i.e., with some probability) to justify action by the receiver in the future. In other words, trustworthiness is observable through actions.”

“The overall trustworthiness of the information we receive is determined by a linear combination of (a) the reputation RZ of the original sender Z, (b) the credibility we associate with the contents of the message itself C(m), and (c) characteristics of the path that the message used to reach us.”
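In code, that linear combination is straightforward. The coefficients below are placeholders of my own choosing, since the paper does not publish values for them:

```python
def trustworthiness(reputation, credibility, path_score,
                    coeffs=(0.4, 0.4, 0.2)):
    """Linear combination from the paper: sender reputation R_Z,
    message-content credibility C(m), and a score for the path the
    message travelled to reach us. Coefficients are illustrative
    placeholders, not values from the paper.
    """
    a, b, c = coeffs
    return a * reputation + b * credibility + c * path_score

# A message from a reputable sender, with moderately credible content,
# relayed through a well-trusted retweet chain (hypothetical scores):
print(trustworthiness(0.9, 0.6, 0.8))
```

The interesting research question is of course how to learn those coefficients per user, which is precisely the personalization the authors propose.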

“To compute the trustworthiness of each message from scratch is clearly a huge task. But the research that has been done so far justifies optimism in creating a semi-automatic, personalized tool that will help its users make sense of the information they receive. Clearly, no such system exists right now, but components of our system do exist in some of the popular [real-time information channels]. For a testing and evaluation of our system we plan to use primarily Twitter, but also real-time Google results and Facebook.”

In order to provide trails of trustworthiness in real-time streams, the authors plan to address the following challenges:

•  “Establishment of new metrics that will help evaluate the trustworthiness of information people receive, especially from real-time sources, which may demand immediate attention and action. […] we show that coverage of a wider range of opinions, along with independence of results’ provenance, can enhance the quality of organic search results. We plan to extend this work in the area of real-time information so that it does not rely on post-processing procedures that evaluate quality, but on real-time algorithms that maintain a trail of trustworthiness for every piece of information the user receives.”

• “Monitor the evolving ways in which information reaches users, in particular citizens near election time.”

•  “Establish a personalizable model that captures the parameters involved in the determination of trustworthiness of information in real-time information channels, such as Twitter, extending the work of measuring quality in more static information channels, and by applying machine learning and data mining algorithms. To implement this task, we will design online algorithms that support the determination of quality via the maintenance of trails of trustworthiness that each piece of information carries with it, either explicitly or implicitly. Of particular importance, is that these algorithms should help maintain privacy for the user’s trusting network.”

• “Design algorithms that can detect attacks on [real-time information channels]. For example we can automatically detect bursts of activity related to a subject, source, or non-independent sources. We have already made progress in this area. Recently, we advised and provided data to a group of researchers at Indiana University to help them implement “truthy”, a site that monitors bursty activity on Twitter. We plan to advance, fine-tune and automate this process. In particular, we will develop algorithms that calculate the trust in an information trail based on a score that is affected by the influence and trustworthiness of the informants.”

In conclusion, the authors “mention that in a month from this writing, Ushahidi […] plans to release SwiftRiver, a platform that ‘enables the filtering and verification of real-time data from channels like Twitter, SMS, Email and RSS feeds’. Several of the features of Swift River seem similar to what we propose, though a major difference appears to be that our design is personalization at the individual user level.”

Indeed, having been involved in SwiftRiver research since early 2009 and currently testing the private beta, I can confirm there are important similarities and some differences. Personalization, however, is not one of the differences: Swift allows full personalization at the individual user level.

Another is that we’re hoping to go beyond just text-based information with Swift, i.e., we hope to pull in pictures and video footage (in addition to Tweets, RSS feeds, email, SMS, etc) in order to cross-validate information across media, which we expect will make the falsification of crowdsourced information more challenging, as I argue here. In any case, I very much hope that the system being developed by the authors will be free and open source so that integration might be possible.

A copy of the paper is available here (PDF). I hope to meet the authors at the Berkman Center’s “Truth in Digital Media Symposium” and highly recommend the wiki they’ve put together with additional resources. I’ve added the majority of my research on verification of crowdsourced information to that wiki, such as my 20-page study on “Information Forensics: Five Case Studies on How to Verify Crowdsourced Information from Social Media.”

Imagery and Humanitarian Assistance: Gems, Errors and Omissions

The Center for Technology and National Security Policy based at National Defense University’s Institute for National Strategic Studies just published an 88-page report entitled “Constructive Convergence: Imagery and Humanitarian Assistance.” As noted by the author, “the goal of this paper is to illustrate to the technical community and interested humanitarian users the breadth of the tools and techniques now available for imagery collection, analysis, and distribution, and to provide brief recommendations with suggestions for next steps.” In addition, the report “presents a brief overview of the growing power of imagery, especially from volunteers and victims in disasters, and its place in emergency response. It also highlights an increasing technical convergence between professional and volunteer responders—and its limits.”

The study contains a number of really interesting gems, just a few errors and some surprising omissions. The point of this blog post is not to criticize but rather to provide constructive-and-hopefully-useful feedback should the report be updated in the future.

Let’s begin with the important gems, excerpted below.

“The most serious issues overlooked involve liability protections by both the publishers and sources of imagery and its data. As far as our research shows there is no universally adopted Good Samaritan law that can protect volunteers who translate emergency help messages, map them, and distribute that map to response teams in the field.”

Whether a Good Samaritan law could ever realistically be universally adopted remains to be seen, but the point is that all of the official humanitarian data protection standards that I’ve reviewed thus far simply don’t take into account the rise of new digitally-empowered global volunteer networks (let alone the existence of social media). The good news is that some colleagues and I are working with the International Committee of the Red Cross (ICRC) and a consortium of major humanitarian organizations to update existing data protection protocols to take some of these new factors into account. This new document will hopefully be made publicly available in October 2012.

“Mobile devices such as tablets and mobile phones are now the primary mode for both collecting and sharing information in a response effort. A January 2011 report published by the Mobile Computing Promotion Consortium of Japan surveyed users of smart phones. Of those who had smart phones, 55 percent used a map application, the third most common application after Web browsing and email.”

I find this absolutely fascinating and thus read the January 2011 report, which is where I found the graphic below.

“The rapid deployment of Cellular on Wheels [COW] is dramatically improving. The Alcatel-Lucent Light Radio is 300 grams (about 10 ounces) and stackable. It also consumes very little power, eliminating large generation and storage requirements. It is capable of operating by solar, wind and/or battery power. Each cube fits into the size of a human hand and is fully integrated with radio processing, antenna, transmission, and software management of frequency. The device can operate on multiple frequencies simultaneously and work with existing infrastructure.”

“In Haiti, USSOUTHCOM found imagery, digital open source maps, and websites that hosted them (such as Ushahidi and OpenStreetMap) to occasionally be of greater value than their own assets.”

“It is recommended that clearly defined and restricted use of specialized #hashtags be implemented using a common crisis taxonomy. For example:

#country + location + emergency code + supplemental data

The above example, if located in Washington, DC, U.S.A., would be published as:

#USAWashingtonDC911Trapped

The specialized use of #hashtags could be implemented in the same cultural manner as 911, 999, and other emergency phone number systems. Metadata using these tags would also be given priority when sent over the Internet through communication networks (landline, broadband Internet, or mobile text or data). Abuse of ratified emergency #hashtags would be a prosecutable offense. Implementing such a system could reduce the amount of data that crisis mappers and other response organizations need to monitor and improve the quality of data to be filtered. Other forms of #Hashtags syllabus can also be implemented such as:

#country + location + information code (411) + supplemental data
#country + location + water (H20) + supplemental data
#country + location + Fire (FD) + supplemental data”
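As a rough illustration of how machine-parseable the proposed taxonomy would (or wouldn’t) be, here is a sketch in Python. The codes (911, 411, H20, FD) come from the report’s own examples; the function names and the country list are my illustrative assumptions, and a real scheme would likely need delimiters or ISO country codes to avoid ambiguity.

```python
# Sketch of the report's proposed crisis-hashtag taxonomy:
# #country + location + emergency code + supplemental data
# Codes are from the report; everything else here is illustrative.

EMERGENCY_CODES = {"911", "411", "H20", "FD"}

def build_crisis_tag(country, location, code, supplement=""):
    """Concatenate the fields into the report's proposed format."""
    if code not in EMERGENCY_CODES:
        raise ValueError(f"unknown emergency code: {code}")
    return f"#{country}{location}{code}{supplement}"

def parse_crisis_tag(tag, known_countries=("USA", "HTI", "NZL")):
    """Split a tag back into its fields. Because the format has no
    delimiters, parsing must rely on known country and code tokens."""
    body = tag.lstrip("#")
    country = next((c for c in known_countries if body.startswith(c)), None)
    if country is None:
        return None
    rest = body[len(country):]
    for code in sorted(EMERGENCY_CODES, key=len, reverse=True):
        idx = rest.find(code)
        if idx != -1:
            return {"country": country, "location": rest[:idx],
                    "code": code, "supplement": rest[idx + len(code):]}
    return None
```

Even this toy parser shows why a ratified taxonomy would matter: without an agreed country/code vocabulary, the concatenated format cannot be unambiguously decoded.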

I found this very interesting and relevant to this earlier blog post: “Calling 911: What Humanitarians Can Learn from 50 Years of Crowdsourcing.” Perhaps a reference to Tweak the Tweet would have been worthwhile.

I also had not come across some of the platforms used in response to the 2011 earthquake in New Zealand. But the report did an excellent job sharing these.

EQviewer.co.nz

Some errors that need correcting:

“Open source mapping tools such as Google Earth use imagery as a foundation for layering field data.”

Google Earth is not an open source tool.

“CrisisMappers.net, mentioned earlier, is a group of more than 1,600 volunteers that have been brought together by Patrick Meier and Jen Ziemke. It is the core of collaboration efforts that can be deployed anywhere in the world. CrisisMappers has established workshops and steering committees to set guidelines and standardize functions and capabilities for sites that deliver imagery and layered datasets. This group, which today consists of diverse and talented volunteers from all walks of life, might soon evolve into a professional volunteer organization of trusted capabilities and skill sets and they are worth watching.”

CrisisMappers is not a volunteer network or an organization that deploys in any formal sense of the word. The CrisisMappers website explains what the mission and purpose of this informal network is. The initiative has some 3,500 members.

“Figure 16. How Ushahidi’s Volunteer Standby Task Force was Structured for Libya. Ushahidi’s platform success stems from its use by organized volunteers, each with skill sets that extract data from multiple sources for publication.”

The Standby Volunteer Task Force (SBTF) does not belong to Ushahidi, nor is the SBTF an Ushahidi project. A link to the SBTF website would have been appropriate. Also, the majority of applications of the Ushahidi platform have nothing to do with crises, or the SBTF, or any other large volunteer networks. The SBTF’s original success stems from organized volunteers who were well versed in the Ushahidi platform.

“Ushahidi accepts KML and KMZ if there is an agreement and technical assistance resources are available. An end user cannot on their own manipulate a Ushahidi portal as an individual, nor can external third party groups unless that group has an arrangement with the principal operators of the site. This offers new collaboration going forward. The majority of Ushahidi disaster portals are operated by volunteer organizations and not government agencies.”

The first sentence is unclear. If someone sets up an Ushahidi platform and they have KML/KMZ files that they want to upload, they can go ahead and do so. An end-user can do some manipulation of an Ushahidi portal and can also pull the Ushahidi data into their own platform (via the GeoRSS feed, for example). Thanks to the ESRI-Ushahidi plugin, they can then perform a range of more advanced GIS analysis. In terms of volunteers vs government agencies, indeed, it appears the former is leading the way vis-a-vis innovation.
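To make the GeoRSS point concrete, here is a minimal sketch of pulling geo-tagged reports out of a feed. Since feed URLs vary per deployment, the example parses a sample document rather than fetching one; `georss:point` is part of the GeoRSS-Simple encoding, while the exact element layout of any particular Ushahidi feed may differ.

```python
# Hedged sketch: extracting (title, lat, lon) tuples from a GeoRSS feed,
# the kind of feed an Ushahidi deployment can expose for third parties.
import xml.etree.ElementTree as ET

NS = {"georss": "http://www.georss.org/georss"}

def parse_georss(xml_text):
    """Return (title, lat, lon) for every item carrying a georss:point."""
    root = ET.fromstring(xml_text)
    reports = []
    for item in root.iter("item"):
        point = item.find("georss:point", NS)
        if point is None or not point.text:
            continue
        lat, lon = (float(x) for x in point.text.split())
        reports.append((item.findtext("title", default=""), lat, lon))
    return reports

sample = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel><item>
    <title>Water needed</title>
    <georss:point>18.5392 -72.3364</georss:point>
  </item></channel>
</rss>"""

print(parse_georss(sample))  # [('Water needed', 18.5392, -72.3364)]
```

This is the kind of lightweight pull that lets an end-user re-use Ushahidi data in their own platform without any arrangement with the site operators.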

Finally, below are some omissions and areas that I would have been very interested to learn more about. For some reason, the section on the Ushahidi deployment in New Zealand makes no reference to Ushahidi.

Staying on the topic of the earthquake in Christchurch, I was surprised to see no reference to the Tomnod deployment.

I had also hoped to read more about the use of drones (UAVs) in disaster response since these were used both in Haiti and Japan. What about the rise of DIY drones and balloon mapping? Finally, the report’s reference to Broadband Global Area Network (BGAN) doesn’t provide information on the range of costs associated with using BGANs in disasters.

In conclusion, the report is definitely an important contribution to the field of crisis mapping and should be required reading.

Mobile Technologies, Crisis Mapping & Disaster Response: My Talk at #MWC12

Many thanks to GSMA for their kind invitation to speak at the 2012 World Mobile Congress (MWC12) in Barcelona, Spain. GSMA is formally launching its Disaster Response Program at MWC12 with an inaugural working group. “The Disaster Response programme seeks to understand how mobile operators can most effectively support each other and improve resilience among networks in disaster scenarios, and identify how the mobile industry can best help citizens and humanitarian organisations on the ground following a crisis.” Below is the presentation I plan to give.

When disaster strikes, access to information is as important as access to food and water. This link between information, disaster response and aid was officially recognized by the Secretary General of the International Federation of the Red Cross and Red Crescent Societies in the 2005 World Disasters Report. Since then, disaster-affected populations have become increasingly digital thanks to the widespread adoption of mobile technologies. Indeed, as a result of these mobile technologies, affected populations are increasingly able to source, share and generate a vast amount of information, which is completely transforming disaster response.

Take the case of Haiti, for example. Within 48 hours of the devastating earthquake that struck Port-au-Prince in 2010, a dedicated SMS short code was set up to crowdsource information on the urgent needs of the disaster-affected population. This would not have been possible without the partnership with Digicel Haiti since they’re the ones who provided the free SMS short code that enabled anyone in Haiti to text in their most urgent needs and location.

This graphic depicts the words that appeared most frequently in the text messages that were received during the first two weeks after the earthquake. Obviously, the original text messages were in Haitian Creole, so volunteers from the Diaspora translated some 80,000 SMSs during the entire 3-month operation. From these, the most urgent life-and-death text messages were identified and geo-located as quickly as possible.

The result was a live crisis map of Haiti, which became the most comprehensive and up-to-date information available to the humanitarian community. In fact, one first-responder noted that the live map helped them save hundreds of lives during their search and rescue operations.

Live crisis maps are critical for disaster response because they can provide real-time situational awareness, like this official UN Crisis Map of Libya. Because the UN Office for the Coordination of Humanitarian Affairs (UN OCHA) did not have information management officers in-country when the crisis began to escalate, they turned to the crisis-affected population for real-time information on the rapidly changing situation. Indeed, a lot of local and relevant user-generated content was already being shared via Twitter, Flickr and YouTube.

The result was this crowdsourced social media map which was used not only by OCHA but also by the World Food Program and other humanitarian organizations. Needless to say, the majority of the rich, multi-media content that populated this map was generated thanks to mobile technology.

Humanitarian organizations are not the only groups using mobile technologies and crisis mapping platforms. Indeed, the mainstream media plays an instrumental role following a disaster. Their ability to widely and rapidly disseminate information to disaster affected populations is absolutely critical for disaster response. And they too are turning to live crisis maps to do this. Just a few weeks ago, Al-jazeera launched this live map to document the impact of the snowstorm emergency in the Balkans.

The map became the most viewed page on the Al-jazeera Balkans website for several weeks running, a clear testament to the demand for this type of information and medium. This is actually the third time that Al-jazeera has leveraged mobile technologies for crisis mapping. Just two short months ago, we partnered with Al-jazeera to run a similar project in Somalia using an SMS short code.

There is no doubt that access to information is as important as access to food and water. In fact, sometimes information is the only help that can be made available, especially when isolated populations are cut off and beyond the reach of traditional aid. So while we talk of humanitarian aid and food relief, we also need to talk about “information aid” and “information relief”. Indeed, we have a “World Food Program” but we don’t have a “World Information Program” for communicating with disaster-affected populations.

This explains why I very much welcome and applaud the GSMA for launching their Disaster Response Program. It is perfectly clear that telecommunications companies are pivotal to the efforts just described. I thus look forward to collaborating with this new working group and hope that we’ll begin our conversations by addressing the pressing need and challenge to provide disaster-affected populations with free “information rations” (i.e., limited but free voice calls and SMS) in the immediate aftermath of a major disaster.

Stranger than Fiction: A Few Words About An Ethical Compass for Crisis Mapping

The good people at the Sudan Sentinel Project (SSP), housed at my former “alma mater,” the Harvard Humanitarian Initiative (HHI), have recently written this curious piece on crisis mapping and the need for an “ethical compass” in this new field. They made absolutely sure that I’d read the piece by directly messaging me via the @CrisisMappers Twitter feed. Not to worry, good people, I read your masterpiece. Interestingly enough, it was published the day after my blog post reviewing IOM’s data protection standards.

To be honest, I was actually not going to spend any time writing up a response because the piece says absolutely nothing new and is hardly proactive. Now, before anyone spins and twists my words: the issues they raise are of paramount importance. But if the authors had actually taken the time to speak with their fellow colleagues at HHI, they would know that several of us participated in a brilliant workshop last year which addressed these very issues. Organized by World Vision, the workshop included representatives from the International Committee of the Red Cross (ICRC), Care International, Oxfam GB, UN OCHA, UN Foundation, Standby Volunteer Task Force (SBTF), Ushahidi, the Harvard Humanitarian Initiative (HHI) and obviously World Vision. There were several data protection experts at this workshop, which made the event one of the most important workshops I attended in all of 2011. So a big thanks again to Phoebe Wynn-Pope at World Vision for organizing.

We discussed in-depth issues surrounding Do No Harm, Informed Consent, Verification, Risk Mitigation, Ownership, Ethics and Communication, Impartiality, etc. As expected, the outcome of the workshop was the clear need for data protection standards that are applicable for the new digital context we operate in, i.e., a world of social media, crowdsourcing and volunteer geographical information. Our colleagues at the ICRC have since taken the lead on drafting protocols relevant to a data 2.0 world in which volunteer networks and disaster-affected communities are increasingly digital. We expect to review this latest draft in the coming weeks (after Oxfam GB has added their comments to the document). Incidentally, the summary report of the workshop organized by World Vision is available here (PDF) and highly recommended. It was also shared on the Crisis Mappers Google Group. By the way, my conversations with Phoebe about these and related issues began at this conference in November 2010, just a month after the SBTF launched.

I should confess the following: one of my personal pet peeves has to do with people stating the totally obvious and calling for action but actually doing absolutely nothing else. Talk for talk’s sake just makes it seem like the authors of the article are simply looking for attention. Meanwhile, many of us are working on these new data protection challenges in our own time, as volunteers. And by the way, the SSP project is first and foremost focused on satellite imagery analysis and the Sudan, not on crowdsourcing or on social media. So they’re writing their piece as outsiders and, well, are less informed as a result, particularly since they didn’t do their homework.

Their limited knowledge of crisis mapping is blatantly obvious throughout the article. Not only do the authors not reference the World Vision workshop, which HHI itself attended, they also seem rather confused about the term “crisis mappers” which they keep using. This is somewhat unfortunate since the Crisis Mappers Network is an offshoot of HHI. Moreover, SSP participated and spoke at last year’s Crisis Mappers Conference—just a few months ago, in fact. One outcome of this conference was the launch of a dedicated Working Group on Security and Privacy, which will now become two groups, one addressing security issues and the other data protection. This information was shared on the Crisis Mappers Google Group and one of the authors is actually part of the Security Working Group.

To this end, one would have hoped, and indeed expected, that the authors would write a somewhat more informed piece about these issues. At the very least, they really ought to have documented some of the efforts to date in this innovative space. But they didn’t, and unfortunately several statements they make in their article are, well… completely false and rather revealing at the same time. (Incidentally, the good people at SSP did their best to dissuade the SBTF from launching a Satellite Team on the premise that only experts are qualified to tag satellite imagery; it seems they’re not interested in citizen science even though some experts I’ve spoken to have referred to SSP as citizen science).

In any case, the authors keep on referring to “crisis mappers this” and “crisis mappers that” throughout their article. But who exactly are they referring to? Who knows. On the one hand, there is the International Network of Crisis Mappers, which is a loose, decentralized, and informal network of some 3,500 members and 1,500 organizations spanning 150+ countries. Then there’s the Standby Volunteer Task Force (SBTF), a distributed, global network of 750+ volunteers who partner with established organizations to support live mapping efforts. And then, easily the largest and most decentralized “group” of all, are all those “anonymous” individuals around the world who launch their own maps using whatever technologies they wish and for whatever purposes they want. By the way, to define crisis mapping as mapping highly volatile and dangerous conflict situations is really far from being accurate either. Also, “equating” crisis mapping with crowdsourcing, which the authors seem to do, is further evidence that they are writing about a subject that they have very little understanding of. Crisis mapping is possible without crowdsourcing or social media. Who knew?

Clearly, the authors are confused. They appear to refer to “crisis mappers” as if the group were a legal entity, with funding, staff, administrative support and brick-and-mortar offices. Furthermore, what the authors don’t seem to realize is that much of what they write is actually true of the formal professional humanitarian sector vis-a-vis the need for new data protection standards. But the authors have obviously not done their homework, and again, this shows. They are also confused about the term “crisis mapping” when they refer to “crisis mapping data”, which is actually nothing other than geo-referenced data. Finally, a number of paragraphs in the article have absolutely nothing to do with crisis mapping even though the authors seem to insinuate otherwise. Also, some of the sensationalism that permeates the article is simply unnecessary and in poor taste.

The fact of the matter is that the field of crisis mapping is maturing. When Dr. Jennifer Leaning and I co-founded and co-directed HHI’s Program on Crisis Mapping and Early Warning from 2007-2009, the project was very much an exploratory, applied-research program. When Dr. Jen Ziemke and I launched the Crisis Mappers Network in 2009, we were just at the beginning of a new experiment. The field has come a long way since and one of the consequences of rapid innovation is obviously the lack of any how-to-guide or manual. These certainly need to be written and are being written.

So, instead of stating the obvious, repeating the obvious, calling for the obvious and making embarrassing factual errors in a public article (which, by the way, is also quite revealing of the underlying motives), perhaps the authors could actually have done some research and emailed the Crisis Mappers Google Group. Two of the authors also have my email address; one even has my private phone number; oh, and they could also have DM’d me on Twitter like they just did.

On Crowdsourcing, Crisis Mapping and Data Protection Standards

The International Organization for Migration (IOM) just published their official Data Protection Manual. This report is hugely informative and should be required reading. At the same time, the 150-page report does not mention social media even once. This is perfectly understandable given IOM’s work, but there is no denying that disaster-affected communities are becoming more digitally-enabled—and thus increasingly the source of important, user-generated information. Moreover, it is difficult to ascertain exactly how to apply all of IOM’s Data Protection Principles to this new digital context and the work of the Standby Volunteer Task Force (SBTF).

The IOM Manual recommends that a risk-benefit assessment be conducted prior to data collection. This means weighing the probability of harm against the anticipated benefits and ensuring that the latter significantly outweigh the potential risks. But IOM explains that “the risk–benefit assessment is not a technical evaluation that is valid under all circumstances. Rather, it is a value judgement that often depends on various factors, including, inter alia, the prevailing social, cultural and religious attitudes of the target population group or individual data subject.”

The Manual also states that data collectors should always put themselves in the shoes of the data subject and consider: “How would a reasonable person, in the position of data subject, react to the data collection and data processing practices?” Again, this is a value judgment rather than a technical evaluation. Applying this consistently across IOM will no doubt be a challenge.

The IOM Principles, which form the core of the manual, are as follows (keep in mind that they are obviously written with IOM’s mandate explicitly in mind):

1. Lawful & Fair Collection
2. Specified and Legitimate Purpose
3. Data quality
4. Consent
5. Transfer to Third Parties
6. Confidentiality
7. Access and Transparency
8. Data Security
9. Retention of Personal Data
10. Application of the Principles
11. Ownership of Personal Data
12. Oversight, Compliance & Internal Remedies
13. Exceptions

Take the first principle, which states that “Personal data must be obtained by lawful and fair means with the knowledge or consent of the data subject.” What does this mean when the data is self-generated and voluntarily placed in the public domain? This question also applies to a number of other principles including “Consent” and “Confidentiality”. In the section on “Consent”, the manual lists various ways that consent can be acquired. Perhaps the most apropos to our discussion is “Implicit Consent: no oral declaration or written statement is obtained, but the action or inaction of the data subjects unequivocally indicates voluntary participation in the IOM project.”

Indeed, during the Ushahidi-Haiti Crisis Mapping Project (UHP), a renowned professor and lawyer at The Fletcher School of Law and Diplomacy was consulted to determine whether or not text messages from the disaster-affected community could be added to a public map. This professor stated there was “Implicit Consent” to map these text messages. (Incidentally, experts at Harvard’s Berkman Center were also consulted on this question at the time).

The first IOM principle further stipulates that “communication with data subjects should be encouraged at all stages of the data collection process.” But what if this communication poses a danger to the data subject? The manual further states that “Personal data should be collected in a safe and secure environment and data controllers should take all necessary steps to ensure that individual vulnerabilities and potential risks are not enhanced.” What if data subjects are not in a safe and secure environment but nevertheless voluntarily share potentially important information on social media channels?

Perhaps the only guidance provided by IOM on this question is as follows: “Data controllers should choose the most appropriate method of data collection that will enhance efficiency and protect the confidentiality of the personal data collected.” But again, what if the data subject has already volunteered information containing their personal data and placed it in the public domain?

The third principle, “Data Quality” is obviously key but the steps provided to ensure accuracy are difficult to translate within the context of crowdsourced information from the social media space. The same is true of several IOM Data Protection Principles. But some are certainly applicable with modification. Take the seventh principle on “Access and Transparency” which recommends that complaint procedures should be relatively straightforward so that data subjects can easily request to rectify or delete content previously collected from them.

“Data Security”, the eighth principle, is also directly applicable. For example, data from social media could be classified according to the appropriate level of sensitivity and treated accordingly. During the response to the Haiti earthquake, for example, we kept new information on the location of orphans confidential, sharing this only with trusted colleagues in the humanitarian community. “Separating personal data from non-personal data” is another procedure that can be (and has been) used in crisis mapping projects. This is for me an absolutely crucial point. Depending on the situation, we need to separate information management systems that contain data with personal identifiers from crisis mapping platforms. Obviously, the former thus need to be more secure. Encryption is also proposed for data security and applicable to crisis mapping.
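A minimal sketch of the “separating personal data from non-personal data” procedure might look as follows. The field names are illustrative, not drawn from any real SBTF or Ushahidi schema: the public map only ever sees the stripped record, while identifiers go into a second, more tightly secured store linked by an opaque report id.

```python
# Illustrative split of an incoming report into a publishable record
# and a restricted record holding personal identifiers.
PERSONAL_FIELDS = {"name", "phone", "exact_address"}

def split_record(report):
    """Return (map_record, pii_record) sharing only the report id."""
    map_record = {k: v for k, v in report.items() if k not in PERSONAL_FIELDS}
    pii_record = {k: v for k, v in report.items() if k in PERSONAL_FIELDS}
    pii_record["report_id"] = report["report_id"]
    return map_record, pii_record

report = {"report_id": "r-1042", "category": "medical",
          "district": "Carrefour", "name": "J. Doe", "phone": "+509..."}
public, restricted = split_record(report)
```

The point of the design is that a compromise of the public crisis map exposes nothing that the data subject did not already intend for the map.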

The tenth IOM principle, i.e., “The Application of the Principles”, provides additional guidance on how to implement data protection and security. For example, the manual describes three appropriate methods for depersonalizing data: data-coding; pseudonymization; and anonymization. Each of these could be applied to crisis mapping projects.
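For instance, pseudonymization can be sketched with a keyed hash, so that linking a pseudonym back to an identity requires the secret key. The key and the choice of hashed field are illustrative assumptions on my part, not mechanics prescribed by the IOM manual.

```python
# Sketch of pseudonymization via a keyed hash (HMAC-SHA256): the same
# identifier always maps to the same token, so records remain linkable
# across datasets without exposing the underlying identity.
import hashlib
import hmac

SECRET_KEY = b"rotate-and-store-me-securely"  # placeholder key

def pseudonymize(identifier):
    """Replace a direct identifier with a stable, non-reversible token."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()[:16]
```

Data-coding and anonymization differ mainly in whether a re-identification table is kept (coding) or all identifying linkage is destroyed (anonymization).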

To conclude, the IOM Data Protection Manual is an important contribution and some of the principles described therein can be applied to crowdsourcing and crisis mapping. I look forward to folding these into the workflows and standard operating procedures of the SBTF (with guidance from the SBTF’s Advisory Board and other experts). There still remains a gap, however, vis-a-vis those IOM principles that are not easily customizable for the context in which the SBTF operates. There is also an issue vis-a-vis the Terms of Service of many social media platforms with respect to privacy and data protection standards.

This explains why I am actively collaborating with a major humanitarian organization to explore the development of appropriate data protection standards for crowdsourcing crisis information in the context of social media. Many humanitarian organizations are struggling with these exact same issues. Yes, these organizations have long had data privacy and protection protocols in place, but these were designed for a world devoid of social media. One major social media company is also looking to revisit its terms of service agreements given the increasing relevance of their platform in humanitarian response. The challenge, for all, will be to strike the right balance between innovation and regulation.

Some Thoughts on Real-Time Awareness for Tech@State

I’ve been invited to present at Tech@State in Washington DC to share some thoughts on the future of real-time awareness. So I thought I’d use my blog to brainstorm and invite feedback from iRevolution readers. The organizers of the event have shared the following questions with me as a way to guide the conversation: Where is all of this headed? What will social media look like in five to ten years and what will we do with all of the data? Knowing that the data stream can only increase in size, what can we do now to prepare and prevent being overwhelmed by the sheer volume of data?

These are big, open-ended questions, and I will only have 5 minutes to share some preliminary thoughts. I shall thus focus on how time-critical crowdsourcing can yield real-time awareness and expand from there.

Two years ago, my good friend and colleague Riley Crane won DARPA’s $40,000 Red Balloon Competition. His team at MIT found the location of 10 weather balloons hidden across the continental US in under 9 hours. The US covers more than 3.7 million square miles and the balloons were barely 8 feet wide. This was truly a needle-in-the-haystack kind of challenge. So how did they do it? They used crowdsourcing and leveraged social media—Twitter in particular—by using a “recursive incentive mechanism” to recruit thousands of volunteers to the cause. This mechanism would basically reward individual participants financially based on how important their contributions were to the location of one or more balloons. The result? Real-time, networked awareness.
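The payout rule the MIT team used can be sketched in a few lines: the finder of a balloon received $2,000, the person who recruited the finder $1,000, their recruiter $500, and so on, halving up the recruitment chain. The function below is my own illustrative rendering of that rule, not the team's actual code.

```python
# Sketch of the MIT team's recursive incentive mechanism for the
# DARPA Red Balloon Challenge: rewards halve up the recruitment chain,
# which makes recruiting others individually rational.
def payouts(recruitment_chain, finder_reward=2000.0):
    """recruitment_chain[0] is the finder; each later entry recruited
    the person before them. Returns {person: payout}."""
    rewards = {}
    amount = finder_reward
    for person in recruitment_chain:
        rewards[person] = amount
        amount /= 2
    return rewards

print(payouts(["finder", "recruiter", "grand_recruiter"]))
# {'finder': 2000.0, 'recruiter': 1000.0, 'grand_recruiter': 500.0}
```

Because the geometric series sums to less than twice the finder's reward, the total payout per balloon is bounded no matter how long the chain grows.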

Around the same time that Riley and his team celebrated their victory at MIT, another novel crowdsourcing initiative was taking place just a few miles away at The Fletcher School. Hundreds of students were busy combing through social and mainstream media channels for actionable and mappable information on Haiti following the devastating earthquake that had struck Port-au-Prince. This content was then mapped on the Ushahidi-Haiti Crisis Map, providing real-time situational awareness to first responders like the US Coast Guard and US Marine Corps. At the same time, hundreds of volunteers from the Haitian Diaspora were busy translating and geo-coding tens of thousands of text messages from disaster-affected communities in Haiti who were texting in their location & most urgent needs to a dedicated SMS short code. Fletcher School students filtered and mapped the most urgent and actionable of these text messages as well.

One year after Haiti, the United Nations’ Office for the Coordination of Humanitarian Affairs (OCHA) asked the Standby Volunteer Task Force (SBTF), a global network of 700+ volunteers, for a real-time map of crowdsourced social media information on Libya in order to improve their own situational awareness. Thus was born the Libya Crisis Map.

The result? The Head of OCHA’s Information Services Section at the time sent an email to SBTF volunteers to commend them for their novel efforts. In this email, he wrote:

“Your efforts at tackling a difficult problem have definitely reduced the information overload; sorting through the multitude of signals on the crisis is no easy task. The Task Force has given us an output that is manageable and digestible, which in turn contributes to better situational awareness and decision making.”

These three examples from the US, Haiti and Libya demonstrate what is already possible with time-critical crowdsourcing and social media. So where is all this headed? You may have noted from each of these examples that their success relied on the individual actions of hundreds and sometimes thousands of volunteers. This is primarily because automated solutions to filter and curate the data stream are not yet available (or rather accessible) to the wider public. Indeed, these solutions tend to be proprietary, expensive and/or classified. I thus expect to see free and open source solutions crop up in the near future; solutions that will radically democratize the tools needed to gain shared, real-time awareness.

But automated natural language processing (NLP) and machine learning alone are not likely to succeed, in my opinion. The data stream is actually not a stream; it is a massive torrent of non-indexed information, a 24-hour global firehose of real-time, distributed multi-media data that continues to outpace our ability to produce actionable intelligence from this torrential downpour of 0’s and 1’s. To turn this data tsunami into real-time shared awareness will require that our filtering and curation platforms become more automated and collaborative. I believe the key is thus to combine automated solutions with real-time collaborative crowdsourcing tools—that is, platforms that enable crowds to collaboratively filter and curate real-time information, in real-time.

Right now, when we comb through Twitter, for example, we do so on our own, sitting behind our laptop, isolated from others who may be seeking to filter the exact same type of content. We need to develop free and open source platforms that allow for the distributed-but-networked, crowdsourced filtering and curation of information in order to democratize the sense-making of the firehose. Only then will the wider public be able to win the equivalent of Red Balloon competitions without needing $40,000 or a degree from MIT.

I’d love to get feedback from readers about what other compelling cases or arguments I should bring up in my presentation tomorrow. So feel free to post some suggestions in the comments section below. Thank you!