
Analyzing Tweets Posted During Mumbai Terrorist Attacks

Over 1 million unique users posted more than 2.7 million tweets in just 3 days following the triple bomb blasts that struck Mumbai on July 13, 2011. Out of these, over 68,000 tweets were “original tweets” (in contrast to retweets) and related to the bombings. An analysis of these tweets yielded some interesting patterns. (Note that the Ushahidi Map of the bombings captured ~150 reports; more here).

One unique aspect of this study (PDF) is the methodology used to assess the quality of the Twitter dataset. The number of tweets per user was graphed in order to test for a power law distribution. The graph below shows the log distribution of the number of tweets per user. The straight line suggests power law behavior. This finding is in line with previous research done on Twitter. So the authors conclude that the quality of the dataset is comparable to the quality of Twitter datasets used in other peer-reviewed studies.
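To make the tweets-per-user check concrete, here is a minimal sketch in Python of how one might build that distribution and estimate the slope of the line on a log-log scale. The function names and the synthetic numbers are hypothetical and are not taken from the study. A least-squares fit on log-log axes is only a rough visual diagnostic, so a straight line here is suggestive rather than proof of power law behavior.

```python
import math
from collections import Counter


def tweets_per_user_distribution(user_ids):
    """Map each tweet count k to the number of users who posted exactly k tweets.

    `user_ids` holds one entry per tweet: the id of the user who posted it.
    """
    per_user = Counter(user_ids)        # user -> number of tweets
    return Counter(per_user.values())   # k -> number of users with k tweets


def loglog_slope(distribution):
    """Least-squares slope of log(number of users) against log(tweet count)."""
    xs = [math.log(k) for k in distribution]
    ys = [math.log(n) for n in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical, synthetic distribution in which the number of users
# with k tweets falls off as k^-2 (not real Twitter data).
synthetic = {k: round(10000 / k ** 2) for k in (1, 2, 4, 5, 10)}

if __name__ == "__main__":
    print(loglog_slope(synthetic))
```

A slope that stays roughly constant across the range is what "a straight line on the log plot" amounts to. Whether that slope says anything about data quality is a separate question.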

I find this approach intriguing because Professor Michael Spagat, Dr. Ryan Woodard and I carried out related research on conflict data back in 2006. One fascinating research question that emerges from all this, and which could be applied to Twitter datasets, is whether the slope of the power law says anything about the type of conflict/disaster being tweeted about, the expected number of casualties or even the propagation of rumors. If you’re interested in pursuing this research question (and have worked with power laws before), please do get in touch. In the meantime, I challenge the authors’ suggestion that a power law distribution necessarily says anything about the quality or reliability of the underlying data. Using the casualty data from SyriaTracker (which is also used by USAID in their official crisis maps), my colleague Dr. Ryan Woodard showed that this dataset does not follow a power law distribution, even though it is one of the most reliable on Syria.


Moving on to the content analysis of the Mumbai blast tweets: “The number of URLs and @-mentions in tweets increase during the time of the crisis in comparison to what researchers have exhibited for normal circumstances.” The table below lists the top 10 URLs shared on Twitter. Interestingly, the link to a Google Spreadsheet was among the most shared resources. Created by Twitter user Nitin Sagar, the spreadsheet was used to “coordinate relief operation among people. Within hours hundreds of people registered on the sheet via Twitter. People asked for or offered help on that spreadsheet for many hours.”

The analysis also reveals that “the number of tweets or updates by authority users (those with large number of followers) are very less, i.e., majority of content generated on Twitter during the crisis comes from non authority users.” In addition, tweets generated by authority users have a high level of retweets. The results also indicate that “the number of tweets generated by people with large follower base (who are generally like government owned accounts, celebrities, media companies) were very few. Thus, the majority of content generated at the time of crisis was from unknown users. It was also observed that, though the number of posts were less by users with large number of followers, these posts registered high numbers of retweets.”

Rumors related to the blasts also spread through Twitter. For example, rumors began to circulate about a fourth bomb going off. “Some tweets even specified locations of 4th blast as Lemington street, Colaba and Charni. Around 500+ tweets and retweets were posted about this.” False rumors about hospital blood banks needing donations were also propagated via Twitter. “They were initiated by a user, @KapoorChetan and around 2,000 tweets and retweets were made regarding this by Twitter users.” The authors of the study believe that such false rumors can be prevented if credible sources like the mainstream media companies and the government post updates on social media more frequently.

I did a bit of research on this and found that NDTV did use their Twitter feed (which has over half-a-million followers) to counter these rumors. For example, “RT @ndtv: Mumbai police: Don’t believe rumours of more bombs. False rumours being spread deliberately.” Journalist Sonal Kalra also acted to counter rumors: “RT @sonalkalra: BBMs about bombs found in Delhi are FALSE. Pls pls don’t spread rumours. #mumbaiblasts.”

In conclusion, the study considers the “privacy threats during the Twitter activity after the blasts. People openly tweeted their phone numbers on social media websites like Twitter, since at such moment of crisis people wished to reach out to help others. But, long after the crisis was over, such posts still remained publicly available on the Internet.” In addition, “people also openly posted their blood group, home address, etc. on Twitter to offer help to victims of the blasts.” The Ushahidi Map also includes personal information. These data privacy and security issues continue to pose major challenges vis-a-vis the use of social media for crisis response.

Bio

See also: Did Terrorists Use Twitter to Increase Situational Awareness? [Link]

Keynote: Next Generation Humanitarian Technology

I’m excited to be giving the Keynote address at the Social Media and Response Management Interface Event (SMARMIE 2013) in New York this morning. A big thank you to the principal driver behind this important event, Chuck Frank, for kindly inviting me to speak. This is my first major keynote since joining QCRI, so I’m thrilled to share what I’ve learned during this time and my vision for the future of humanitarian technology. But I’m even more excited by the selection of speakers and caliber of participants. I’m eager to learn about their latest projects, gain new insights and hopefully create pro-active partnerships moving forward.

You can follow this event via live stream (and @smarmieNYC & #smarmie). I plan to live-tweet the event at @patrickmeier. My slides are available for download here (125MB). Each slide includes speaking notes, which may be of interest to folks who are unable to follow via live stream. Feel free to use my slides but strictly for non-commercial purposes and only with direct attribution. I’ll be sure to post the video of my talk on iRevolution when it becomes available. In the meantime, these videos and publications may be of interest. Also, I’ve curated the table of contents below with 60+ links to every project and/or concept referred to in my keynote and slides (in chronological order) so participants and others can revisit these after the conference and, more importantly, keep our conversations going via Twitter and the comments section of the blog posts. I plan to hire a Research Assistant in the near future to turn these (and other posts) into a series of up-to-date e-books in which I’ll cite and fully credit the most interesting and insightful comments posted on iRevolution.

Social Media Pulse of Planet

http://iRevolution.net/2013/02/02/pulse-of-the-planet
http://iRevolution.net/2013/02/06/the-world-at-night
http://iRevolution.net/2011/04/20/network-witness

Big Crisis Data and Added Value

http://iRevolution.net/2011/06/22/no-data-bad-data

http://iRevolution.net/2012/02/26/mobile-technologies-crisis-mapping-disaster-response

http://iRevolution.net/2012/12/17/debating-tweets-disaster

http://iRevolution.net/2012/07/18/disaster-tweets-for-situational-awareness

http://iRevolution.net/2013/01/11/disaster-resilience-2-0

Standby Task Force (SBTF)

http://blog.standbytaskforce.com

http://iRevolution.net/2010/09/26/crisis-mappers-task-force

Libya Crisis Map

http://blog.standbytaskforce.com/libya-crisis-map-report

http://irevolution.net/2011/03/04/crisis-mapping-libya

http://iRevolution.net/2011/03/08/volunteers-behind-libya-crisis-map

http://iRevolution.net/2011/06/12/im-not-gaddafi-test

Philippines Crisis Map

http://iRevolution.net/2012/12/05/digital-response-to-typhoon-philippines

http://iRevolution.net/2012/12/08/digital-response-typhoon-pablo

http://iRevolution.net/2012/12/06/digital-disaster-response-typhoon

http://iRevolution.net/2012/06/03/geofeedia-for-crisis-mapping

http://iRevolution.net/2013/02/26/crowdflower-for-disaster-response

Digital Humanitarians 

http://www.digitalhumanitarians.com

Human Computation

http://iRevolution.net/2013/01/20/digital-humanitarian-micro-tasking

Human Computation for Disaster Response (submitted for publication)

Syria Crisis Map

http://iRevolution.net/2012/03/25/crisis-mapping-syria

http://iRevolution.net/2012/11/27/usaid-crisis-map-syria

http://iRevolution.net/2012/07/30/collaborative-social-media-analysis

http://iRevolution.net/2012/05/29/state-of-the-art-digital-disease-detection

Hybrid Systems for Disaster Response

http://iRevolution.net/2012/10/21/crowdsourcing-and-advanced-computing

http://iRevolution.net/2012/07/30/twitter-for-humanitarian-cluster

http://iRevolution.net/2013/02/11/update-twitter-dashboard

Credibility of Social Media: Compare to What?

http://iRevolution.net/2013/01/08/disaster-tweets-versus-911-calls

http://iRevolution.net/2010/09/22/911-system

Human Computed Credibility

http://iRevolution.net/2012/07/26/truth-and-social-media

http://iRevolution.net/2011/11/29/information-forensics-five-case-studies

http://iRevolution.net/2010/06/30/crowdsourcing-detective

http://iRevolution.net/2012/11/20/verifying-source-credibility

http://iRevolution.net/2012/09/16/accelerating-verification

http://iRevolution.net/2010/09/19/veracity-of-tweets-during-a-major-crisis

http://iRevolution.net/2011/03/26/technology-to-counter-rumors

http://iRevolution.net/2012/03/10/truthiness-as-probability

http://iRevolution.net/2013/01/27/mythbuster-tweets

http://iRevolution.net/2012/10/31/hurricane-sandy

http://iRevolution.net/2012/07/16/crowdsourcing-for-human-rights-monitoring-challenges-and-opportunities-for-information-collection-verification

Verily: Crowdsourced Verification

http://iRevolution.net/2013/02/19/verily-crowdsourcing-evidence

http://iRevolution.net/2011/11/06/time-critical-crowdsourcing

http://iRevolution.net/2012/09/18/six-degrees-verification

http://iRevolution.net/2011/09/26/augmented-reality-crisis-mapping

AI Computed Credibility

http://iRevolution.net/2012/12/03/predicting-credibility

http://iRevolution.net/2012/12/10/ranking-credibility-of-tweets

Future of Humanitarian Tech

http://iRevolution.net/2012/04/17/red-cross-digital-ops

http://iRevolution.net/2012/11/15/live-global-twitter-map

http://iRevolution.net/2013/02/16/crisis-mapping-minority-report

http://iRevolution.net/2012/04/09/humanitarian-future

http://iRevolution.net/2011/08/22/khan-borneo-galaxies

http://iRevolution.net/2010/03/24/games-to-turksource

http://iRevolution.net/2010/07/08/cognitive-surplus

http://iRevolution.net/2010/08/14/crowd-is-always-there

http://iRevolution.net/2011/09/14/crowdsource-crisis-response

http://iRevolution.net/2012/07/04/match-com-for-economic-resilience

http://iRevolution.net/2013/02/27/matchapp-disaster-response-app

http://iRevolution.net/2013/01/07/what-waze-can-teach-us

Policy

http://iRevolution.net/2012/12/04/catch-22

http://iRevolution.net/2012/02/05/iom-data-protection

http://iRevolution.net/2013/01/23/perils-of-crisis-mapping

http://iRevolution.net/2013/02/25/launching-sms-code-of-conduct

http://iRevolution.net/2013/02/26/haiti-lies

http://iRevolution.net/2012/06/04/big-data-philanthropy-for-humanitarian-response

http://iRevolution.net/2012/07/25/become-a-data-donor


ps. Please let me know if you find any broken links so I can fix them, thank you!

Did Terrorists Use Twitter to Increase Situational Awareness?

Those who are still skeptical about the value of Twitter for real-time situational awareness during a crisis ought to ask why terrorists likely think otherwise. In 2008, terrorists carried out multiple attacks on Mumbai in what many refer to as the worst terrorist incident in Indian history. This study, summarized below, explains how the terrorists in question could have used social media for coordination and decision-making purposes.

The study argues that “the situational information which was broadcast through live media and Twitter contributed to the terrorists’ decision making process and, as a result, it enhanced the effectiveness of hand-held weapons to accomplish their terrorist goal.” To be sure, the “sharing of real time situational information on the move can enable the ‘sophisticated usage of the most primitive weapons.'” In sum, “unregulated real time Twitter postings can contribute to increase the level of situation awareness for terrorist groups to make their attack decision.”

According to the study, “an analysis of satellite phone conversations between terrorist commandos in Mumbai and remote handlers in Pakistan shows that the remote handlers in Pakistan were monitoring the situation in Mumbai through live media, and delivered specific and situational attack commands through satellite phones to field terrorists in Mumbai.” These conversations provide “evidence that the Mumbai terrorist groups understood the value of up-to-date situation information during the terrorist operation. […] They understood that the loss of information superiority can compromise their operational goal.”

Handler: See, the media is saying that you guys are now in room no. 360 or 361. How did they come to know the room you guys are in?…Is there a camera installed there? Switch off all the lights…If you spot a camera, fire on it…see, they should not know at any cost how many of you are in the hotel, what condition you are in, where you are, things like that… these will compromise your security and also our operation […]

Terrorist: I don’t know how it happened…I can’t see a camera anywhere.

A subsequent phone conversation reveals that “the terrorists group used the web search engine to increase their decision making quality by employing the search engine as a complement to live TV which does not provide detailed information of specific hostages. For instance, to make a decision if they need to kill a hostage who was residing in the Taj hotel, a field attacker reported the identity of a hostage to the remote controller, and a remote controller used a search engine to obtain the detailed information about him.”

Terrorist: He is saying his full name is K.R.Ramamoorthy.

Handler: K.R. Ramamoorthy. Who is he? … A designer … A professor … Yes, yes, I got it …[The caller was doing an internet search on the name, and a results showed up a picture of Ramamoorthy] … Okay, is he wearing glasses? [The caller wanted to match the image on his computer with the man before the terrorists.]

Terrorist: He is not wearing glasses. Hey, … where are your glasses?

Handler: … Is he bald from the front?

Terrorist: Yes, he is bald from the front …

The terrorist group had three specific political agendas: “(1) an anti-India agenda, (2) an anti-Israel and anti-Jewish agenda, and (3) an anti-US and anti-Nato agenda.” A content analysis of 900+ tweets posted during the attacks reveals whether said tweets may have provided situational awareness information in support of these three political goals. The results: 18% of tweets contained “situational information which can be helpful for Mumbai terrorist groups to make an operational decision of achieving their Anti-India political agenda. Also, 11.34% and 4.6% of posts contained operationally sensitive information which may help terrorist groups to make an operational decision of achieving their political goals of Anti-Israel/Anti-Jewish and Anti-US/Anti-Nato respectively.”

In addition, the content analysis found that “Twitter site played a significant role in relaying situational information to the mainstream media, which was monitored by Mumbai terrorists. Therefore, we conclude that the Mumbai Twitter page indirectly contributed to enhancing the situational awareness level of Mumbai terrorists, although we cannot exclude the possibility of its direct contribution as well.”

In conclusion, the study stresses the importance of analyzing a terrorist group’s political goals in order to develop an appropriate information control strategy. “Because terrorists’ political goals function as interpretative filters to process situational information, understanding of adversaries’ political goals may reduce costs for security operation teams to monitor and decide which tweets need to be controlled.”


See also: Analyzing Tweets Posted During Mumbai Terrorist Attacks [Link]

Update: Twitter Dashboard for Disaster Response

Project name: Artificial Intelligence for Disaster Response (AIDR). For a more recent update, please click here.

My Crisis Computing Team and I at QCRI have been working hard on the Twitter Dashboard for Disaster Response. We first announced the project on iRevolution last year. The experimental research we’ve carried out since has been particularly insightful vis-a-vis the opportunities and challenges of building such a Dashboard. We’re now using the findings from our empirical research to inform the next phase of the project—namely building the prototype for our humanitarian colleagues to experiment with so we can iterate and improve the platform as we move forward.


Manually processing disaster tweets is becoming increasingly difficult and unrealistic. Over 20 million tweets were posted during Hurricane Sandy, for example. This is the main problem that our Twitter Dashboard aims to solve. There are two ways to manage this challenge of Big (Crisis) Data: Advanced Computing and Human Computation. The former entails the use of machine learning algorithms to automatically tag tweets while the latter involves the use of microtasking, which I often refer to as Smart Crowdsourcing. Our Twitter Dashboard seeks to combine the best of both methodologies.

On the Advanced Computing side, we’ve developed a number of classifiers that automatically identify tweets that:

  • Contain informative content (in contrast to personal messages or information unhelpful for disaster response);
  • Are posted by eye-witnesses (as opposed to 2nd-hand reporting);
  • Include pictures, video footage, mentions from TV/radio;
  • Report casualties and infrastructure damage;
  • Relate to people missing, seen and/or found;
  • Communicate caution and advice;
  • Call for help and important needs;
  • Offer help and support.

These classifiers are developed using state-of-the-art machine learning techniques. This simply means that we take a Twitter dataset of a disaster, say Hurricane Sandy, and develop clear definitions for “Informative Content,” “Eye-witness accounts,” etc. We use this classification system to tag a random sample of tweets from the dataset (usually 100+ tweets). We then “teach” algorithms to find these different topics in the rest of the dataset. We tweak said algorithms to make them as accurate as possible; much like teaching a dog new tricks like go-fetch (wink).
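As a rough illustration of the tag-a-sample-then-teach workflow described above, here is a minimal sketch in Python. The specific algorithms are not named here, so the naive Bayes model below is an assumption chosen for brevity, and the example tweets, labels and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled_tweets):
    """Count words per label from a hand-tagged sample of (text, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()             # label -> number of tagged tweets
    vocabulary = set()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Return the most probable label under naive Bayes with add-one smoothing."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-tagged sample; a real one would hold 100+ tweets.
tagged_sample = [
    ("bridge collapsed and power lines are down", "damage"),
    ("roof damage and flooded streets downtown", "damage"),
    ("thinking of everyone affected stay safe", "other"),
    ("watching the news about the storm tonight", "other"),
]
model = train(tagged_sample)

if __name__ == "__main__":
    print(classify(model, "flooded streets and power lines down"))
```

The "tweaking" step then amounts to changing the features, the smoothing or the model itself and re-checking accuracy against tweets that were tagged by hand but held out of training.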


We’ve found from this research that the classifiers are quite accurate but sensitive to the type of disaster being analyzed and also the country in which said disaster occurs. For example, a set of classifiers developed from tweets posted during Hurricane Sandy tends to be less accurate when applied to tweets posted for New Zealand’s earthquake. Each classifier is developed based on tweets posted during a specific disaster. In other words, while the classifiers can be highly accurate (i.e., tweets are correctly tagged as being damage-related, for example), they only tend to be accurate for the type of disaster they’ve been trained for, e.g., weather-related disasters (tornadoes), earth-related (earthquakes) and water-related (floods).

So we’ve been busy trying to collect as many Twitter datasets of different disasters as possible, which has been particularly challenging and seriously time-consuming given Twitter’s highly restrictive Terms of Service, which prevent the direct sharing of Twitter datasets, even for humanitarian purposes. This means we’ve had to spend a considerable amount of time re-creating Twitter datasets for past disasters; datasets that other research groups and academics have already crawled and collected. Thank you, Twitter. Clearly, we can’t collect every single tweet for every disaster that has occurred over the past five years or we’ll never get to actually developing the Dashboard.

That said, some of the most interesting Twitter disaster datasets are of recent (and indeed future) disasters. Truth be told, tweets were still largely US-centric before 2010. But the international coverage has since increased, along with the number of new Twitter users, which almost doubled in 2012 alone (more neat stats here). This in part explains why more and more Twitter users actively tweet during disasters. There is also a demonstration effect. That is, the international media coverage of social media use during Hurricane Sandy, for example, is likely to prompt citizens in other countries to replicate this kind of pro-active social media use when disaster knocks on their doors.

So where does this leave us vis-a-vis the Twitter Dashboard for Disaster Response? Simply that a hybrid approach is necessary (see TEDx talk above). That is, the Dashboard we’re developing will have a number of pre-developed classifiers based on as many datasets as we can get our hands on (categorized by disaster type). In addition to that, the dashboard will also allow users to create their own classifiers on the fly by leveraging human computation. They’ll also be able to microtask the creation of new classifiers.

In other words, what they’ll do is this:

  • Enter a search query on the dashboard, e.g., #Sandy.
  • Click on “Create Classifier” for #Sandy.
  • Create a label for the new classifier, e.g., “Animal Rescue”.
  • Tag 50+ #Sandy tweets that convey content about animal rescue.
  • Click “Run Animal Rescue Classifier” on new incoming tweets.

The new classifier will then automatically tag incoming tweets. Of course, the classifier won’t get it completely right. But the beauty here is that the user can “teach” the classifier not to make the same mistakes, which means the classifier continues to learn and improve over time. On the geo-location side of things, it is indeed true that only ~3% of all tweets are geotagged by users. But this figure can be boosted to 30% using full-text geo-coding (as was done in the TwitterBeat project). Some believe this figure can be more than doubled (towards 75%) by applying Google Translate to the full-text geo-coding. The remaining users can be queried via Twitter for their location and that of the events they are reporting.
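The idea of “teaching” a classifier not to repeat its mistakes can be sketched as a simple online learner that only changes when a user flags a wrong tag. The perceptron-style update below is an assumption made for illustration, not a description of the Dashboard’s internals, and the class name, method names and example tweets are hypothetical.

```python
class CorrectableClassifier:
    """Binary tagger (relevant / not relevant) that learns from user corrections."""

    def __init__(self):
        self.weights = {}   # word -> weight
        self.bias = 0.0

    def _features(self, text):
        # Bag of distinct lowercase words; a real system would use richer features.
        return set(text.lower().split())

    def predict(self, text):
        """Return True when the tweet is tagged as relevant to the label."""
        score = self.bias + sum(self.weights.get(w, 0.0) for w in self._features(text))
        return score > 0

    def correct(self, text, is_relevant):
        """Apply a user's verdict; weights change only if the prediction was wrong."""
        if self.predict(text) == is_relevant:
            return False            # prediction was already right, nothing to learn
        step = 1.0 if is_relevant else -1.0
        self.bias += step
        for word in self._features(text):
            self.weights[word] = self.weights.get(word, 0.0) + step
        return True


if __name__ == "__main__":
    tagger = CorrectableClassifier()
    # The user tags a missed "Animal Rescue" tweet, then flags a false positive.
    tagger.correct("dog rescued from flooded house", True)
    tagger.correct("flooded subway station", False)
    print(tagger.predict("dog rescued from flooded house"))
    print(tagger.predict("flooded subway station"))
```

Because updates happen only on mistakes, each correction nudges the tagger away from the specific error the user pointed out while leaving correct behavior untouched.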

So that’s where we’re at with the project. Ultimately, we envision these classifiers to be like individual apps that can be used/created, dragged and dropped on an intuitive widget-like dashboard with various data visualization options. As noted in my previous post, everything we’re building will be freely accessible and open source. And of course we hope to include classifiers for other languages beyond English, such as Arabic, Spanish and French. Again, however, this is purely experimental research for the time being; we want to be crystal clear about this in order to manage expectations. There is still much work to be done.

In the meantime, please feel free to get in touch if you have disaster datasets you can contribute to these efforts (we promise not to tell Twitter). If you’ve developed classifiers that you think could be used for disaster response and you’re willing to share them, please also get in touch. If you’d like to join this project and have the required skill sets, then get in touch; we may be able to hire you! Finally, if you’re an interested end-user or want to share some thoughts and suggestions as we embark on this next phase of the project, please do also get in touch. Thank you!


Big Data for Development: From Information to Knowledge Societies?

Unlike analog information, “digital information inherently leaves a trace that can be analyzed (in real-time or later on).” But the “crux of the ‘Big Data’ paradigm is actually not the increasingly large amount of data itself, but its analysis for intelligent decision-making (in this sense, the term ‘Big Data Analysis’ would actually be more fitting than the term ‘Big Data’ by itself).” Martin Hilbert describes this as the “natural next step in the evolution from the ‘Information Age’ & ‘Information Societies’ to ‘Knowledge Societies’ […].”

Hilbert has just published this study on the prospects of Big Data for international development. “From a macro-perspective, it is expected that Big Data informed decision-making will have a similar positive effect on efficiency and productivity as ICT have had during the recent decade.” Hilbert references a 2011 study that concluded the following: “firms that adopted Big Data Analysis have output and productivity that is 5–6 % higher than what would be expected given their other investments and information technology usage.” Can these efficiency gains be brought to the unruly world of international development?

To answer this question, Hilbert introduces the above conceptual framework to “systematically review literature and empirical evidence related to the pre-requisites, opportunities and threats of Big Data Analysis for international development.” Words, Locations, Nature and Behavior are types of data that are becoming increasingly available in large volumes.

“Analyzing comments, searches or online posts [i.e., Words] can produce nearly the same results for statistical inference as household surveys and polls.” For example, “the simple number of Google searches for the word ‘unemployment’ in the U.S. correlates very closely with actual unemployment data from the Bureau of Labor Statistics.” Hilbert argues that the tremendous volume of free textual data makes “the work and time-intensive need for statistical sampling seem almost obsolete.” But while the “large amount of data makes the sampling error irrelevant, this does not automatically make the sample representative.” 
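The distinction between sampling error and representativeness can be illustrated with a small simulation. This is a sketch under invented assumptions: the population size, the share of people who are online and the rates at which each group has the trait being measured are all hypothetical numbers chosen to make the effect visible.

```python
import random


def simulate(seed=0, population_size=200_000):
    """Compare a huge but biased sample with a small random one.

    Returns (true_rate, big_biased_estimate, small_random_estimate).
    """
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        online = rng.random() < 0.5
        # Assumed rates: 10% of online people and 50% of offline people
        # have the trait (for example, being unemployed).
        has_trait = rng.random() < (0.10 if online else 0.50)
        population.append((online, has_trait))

    true_rate = sum(trait for _, trait in population) / population_size

    # "Big Data" sample: everyone who is online, so very large but not representative.
    online_traits = [trait for online, trait in population if online]
    big_biased_estimate = sum(online_traits) / len(online_traits)

    # Classic survey: only 1,000 people, but drawn at random from everyone.
    survey = rng.sample(population, 1000)
    small_random_estimate = sum(trait for _, trait in survey) / len(survey)

    return true_rate, big_biased_estimate, small_random_estimate


if __name__ == "__main__":
    print(simulate())
```

In this setup the online-only estimate is extremely precise, yet it sits far from the population rate, while the small random survey is noisier but centered on it. More data shrinks the sampling error without touching the bias.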

The increasing availability of Location data (via GPS-enabled mobile phones or RFIDs) needs no further explanation. Nature refers to data on natural processes such as temperature and rainfall. Behavior denotes activities that can be captured through digital means, such as user-behavior in multiplayer online games or economic affairs, for example. But “studying digital traces might not automatically give us insights into offline dynamics. Besides these biases in the source, the data-cleaning process of unstructured Big Data frequently introduces additional subjectivity.”

The availability and analysis of Big Data is obviously limited in areas with scant access to tangible hardware infrastructure. This corresponds to the “Infrastructure” variable in Hilbert’s framework. “Generic Services” refers to the production, adoption and adaptation of software products, since these are a “key ingredient for a thriving Big Data environment.” In addition, the exploitation of Big Data also requires “data-savvy managers and analysts and deep analytical talent, as well as capabilities in machine learning and computer science.” This corresponds to “Capacities and Knowledge Skills” in the framework.

The third and final side of the framework represents the types of policies that are necessary to actualize the potential of Big Data for international development. These policies are divided into those that elicit Positive Feedback Loops, such as financial incentives, and those that create regulations such as interoperability, that is, Negative Feedback Loops.

The added value of Big Data Analytics is also dependent on the availability of publicly accessible data, i.e., Open Data. Hilbert estimates that a quarter of US government data could be used for Big Data Analysis if it were made available to the public. There is a clear return on investment in opening up this data. On average, governments with “more than 500 publicly available databases on their open data online portals have 2.5 times the per capita income, and 1.5 times more perceived transparency than their counterparts with less than 500 public databases.” The direction of “causality” here is questionable, however.

Hilbert concludes with a warning. The Big Data paradigm “inevitably creates a new dimension of the digital divide: a divide in the capacity to place the analytic treatment of data at the forefront of informed decision-making. This divide does not only refer to the availability of information, but to intelligent decision-making and therefore to a divide in (data-based) knowledge.” While the advent of Big Data Analysis is certainly not a panacea, “in a world where we desperately need further insights into development dynamics, Big Data Analysis can be an important tool to contribute to our understanding of and improve our contributions to manifold development challenges.”

I am troubled by the study’s assumption that we live in a Newtonian world of decision-making in which for every action there is an automatic equal and opposite reaction. The fact of the matter is that the vast majority of development policies and decisions are not based on empirical evidence. Indeed, rigorous evidence-based policy-making and interventions are still very much the exception rather than the rule in international development. Why? “Accountability is often the unhappy byproduct rather than desirable outcome of innovative analytics. Greater accountability makes people nervous” (Harvard 2013). Moreover, response is always political. But Big Data Analysis runs the risk of de-politicizing a problem. As Alex de Waal noted over 15 years ago, “one universal tendency stands out: technical solutions are promoted at the expense of political ones.” I hinted at this concern when I first blogged about the UN Global Pulse back in 2009.

In sum, James Scott (one of my heroes) puts it best in his latest book:

“Applying scientific laws and quantitative measurement to most social problems would, modernists believed, eliminate the sterile debates once the ‘facts’ were known. […] There are, on this account, facts (usually numerical) that require no interpretation. Reliance on such facts should reduce the destructive play of narratives, sentiment, prejudices, habits, hyperbole and emotion generally in public life. […] Both the passions and the interests would be replaced by neutral, technical judgment. […] This aspiration was seen as a new ‘civilizing project.’ The reformist, cerebral Progressives in early twentieth-century America and, oddly enough, Lenin as well believed that objective scientific knowledge would allow the ‘administration of things’ to largely replace politics. Their gospel of efficiency, technical training and engineering solutions implied a world directed by a trained, rational, and professional managerial elite. […].”

“Beneath this appearance, of course, cost-benefit analysis is deeply political. Its politics are buried deep in the techniques […] how to measure it, in what scale to use, […] in how observations are translated into numerical values, and in how these numerical values are used in decision making. While fending off charges of bias or favoritism, such techniques […] succeed brilliantly in entrenching a political agenda at the level of procedures and conventions of calculation that is doubly opaque and inaccessible. […] Charged with bias, the official can claim, with some truth, that ‘I am just cranking the handle’ of a nonpolitical decision-making machine.”

See also:

  • Big Data for Development: Challenges and Opportunities [Link]
  • Beware the Big Errors of Big Data (by Nassim Taleb) [Link]
  • How to Build Resilience Through Big Data [Link]

Using #Mythbuster Tweets to Tackle Rumors During Disasters

The massive floods that swept through Queensland, Australia in 2010/2011 put an area almost twice the size of the United Kingdom under water. And now, a year later, Queensland braces itself for even worse flooding:

Screen Shot 2013-01-26 at 11.38.38 PM

More than 35,000 tweets with the hashtag #qldfloods were posted during the height of the flooding (January 10-16, 2011). One of the most active Twitter accounts belonged to the Queensland Police Service Media Unit: @QPSMedia. Tweets from (and to) the Unit were “overwhelmingly focussed on providing situational information and advice” (1). Moreover, tweets between @QPSMedia and followers were “topical and to the point, significantly involving directly affected local residents” (2). @QPSMedia also “introduced innovations such as the #Mythbuster series of tweets, which aimed to intervene in the spread of rumor and disinformation” (3).

rockhampton floods 2011

On the evening of January 11, @QPSMedia began to post a series of tweets with #Mythbuster in direct response to rumors and misinformation circulating on Twitter. Along with official notices to evacuate, these #Mythbuster tweets were the most widely retweeted @QPSMedia messages. They were especially successful. Here is a sample: “#mythbuster: Wivenhoe Dam is NOT about to collapse! #qldfloods”; “#mythbuster: There is currently NO fuel shortage in Brisbane. #qldfloods.”

Screen Shot 2013-01-27 at 12.19.03 AM

This kind of pro-active intervention reminds me of the #fakesandy hashtag and FEMA’s rumor control initiative, both used during Hurricane Sandy. I expect to see greater use of this approach by professional emergency responders in future disasters. There’s no doubt that @QPSMedia will provide this service again with the coming floods and it appears that @QLDonline is already doing so (above tweet). Brisbane’s City Council has also launched this Crowdmap marking the latest road closures, flood areas and sandbag locations. Hoping everyone in Queensland stays safe!

In the meantime, here are some relevant statistics on the crisis tweets posted during the 2010/2011 floods in Queensland:

  • 50-60% of #qldfloods messages were retweets (passing along existing messages, and thereby making them more visible); 30-40% of messages contained links to further information elsewhere on the Web.
  • During the crisis, a number of Twitter users dedicated themselves almost exclusively to retweeting #qldfloods messages, acting as amplifiers of emergency information and thereby increasing its reach.
  • #qldfloods tweets largely managed to stay on topic and focussed predominantly on sharing directly relevant situational information, advice, news media and multimedia reports.
  • Emergency services and media organisations were amongst the most visible participants in #qldfloods, especially also because of the widespread retweeting of their messages.
  • More than one in every five shared links in the #qldfloods dataset was to an image hosted on one of several image-sharing services; and users overwhelmingly depended on Twitpic and other Twitter-centric image-sharing services to upload and distribute the photographs taken on their smartphones and digital cameras.
  • The tenor of tweets during the latter days of the immediate crisis shifted more strongly towards organising volunteering and fundraising efforts: tweets containing situational information and advice, and news media and multimedia links were retweeted disproportionately often.
  • Less topical tweets were far less likely to be retweeted.

Social Network Analysis for Digital Humanitarian Response

Monitoring social media for digital humanitarian response can be a massive undertaking. The sheer volume and velocity of tweets generated during a disaster make real-time social media monitoring particularly challenging, if not nearly impossible. However, two new studies argue that there is “a better way to track the spread of information on Twitter that is much more powerful.”

Manuel Garcia-Herranz and his team at the Autonomous University of Madrid in Spain use small groups of “highly connected Twitter users as ‘sensors’ to detect the emergence of new ideas. They point out that this works because highly connected individuals are more likely to receive new ideas before ordinary users.” To test their hypothesis, the team studied 40 million Twitter users who “together totted up 1.5 billion follows and sent nearly half a billion tweets, including 67 million containing hashtags.”

They found that small groups of highly connected Twitter users detect “new hashtags about seven days earlier than the control group. In fact, the lead time varied between nothing at all and as much as 20 days.” Manuel and his team thus argue that “there’s no point in crunching these huge data sets. You’re far better off picking a decent sensor group and watching them instead.” In other words, “your friends could act as an early warning system, not just for gossip, but for civil unrest and even outbreaks of disease.”
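To make the sensor-group idea concrete, here is a minimal sketch in Python. It is not the Madrid team's method: it simply assumes that "highly connected" means "most followers" and measures lead time as the difference between the median first-use day in the control group and in the sensor group. The account names, follower counts and adoption days are made-up toy data, and the function names are hypothetical.

```python
from statistics import median

def pick_sensors(follower_counts, k):
    """Return the k accounts with the most followers (the assumed 'sensor' group)."""
    ranked = sorted(follower_counts, key=lambda user: (-follower_counts[user], user))
    return set(ranked[:k])

def median_first_use(adoptions, group):
    """Median day on which members of `group` first used the hashtag (None if nobody did)."""
    days = [adoptions[user] for user in group if user in adoptions]
    return median(days) if days else None

def lead_time(adoptions, sensors, control):
    """Days by which the sensor group adopted the hashtag ahead of the control group."""
    sensor_day = median_first_use(adoptions, sensors)
    control_day = median_first_use(adoptions, control)
    if sensor_day is None or control_day is None:
        return None
    return control_day - sensor_day

# Toy data (invented for illustration): followers per account, and the day
# each account first used some hashtag. Account "f" never used it.
follower_counts = {"a": 9000, "b": 7000, "c": 120, "d": 80, "e": 45, "f": 30}
adoptions = {"a": 1, "b": 3, "c": 8, "d": 10, "e": 9}

sensors = pick_sensors(follower_counts, k=2)
control = set(follower_counts) - sensors
lead = lead_time(adoptions, sensors, control)
print(sorted(sensors), lead)
```

The appeal of this design is that only the small sensor group needs to be watched in real time; the control group is only needed once, to estimate how much warning the sensors give.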

The second study, “Identifying and Characterizing User Communities on Twitter during Crisis Events,” (PDF) is authored by Aditi Gupta et al. Aditi and her colleagues analyzed three major crisis events (Hurricane Irene, Riots in England and Earthquake in Virginia) to “identify the different user communities, and characterize them by the top central users.” Their findings are in line with those shared by the team in Madrid. “[T]he top users represent the topics and opinions of all the users in the community with 81% accuracy on an average.” In sum, “to understand a community, we need to monitor and analyze only these top users rather than all the users in a community.”

How could these findings be used to prioritize the monitoring of social media during disasters? See this blog post for more on the use of social network analysis (SNA) for humanitarian response.

The Problem with Crisis Informatics Research

My colleague ChaTo at QCRI recently shared some interesting thoughts on the challenges of crisis informatics research vis-a-vis Twitter as a source of real-time data. The way he drew out the issue was clear, concise and informative. So I’ve replicated his diagram below.

ChaTo Diagram

What Emergency Managers Need: Those actionable tweets that provide situational awareness relevant to decision-making. What People Tweet: Those tweets posted during a crisis which are freely available via Twitter’s API (which is a very small fraction of the Twitter Firehose). What Computers Can Do: The computational ability of today’s algorithms to parse and analyze natural language at a large scale.

A: The small fraction of tweets containing valuable information for emergency responders that computer systems are able to extract automatically.
B: Tweets that are relevant to disaster response but cannot be analyzed in real-time by existing algorithms due to computational challenges (e.g. data processing is too intensive, or requires artificial intelligence systems that do not exist yet).
C: Tweets that can be analyzed by current computing systems, but do not meet the needs of emergency managers.
D: Tweets that, if they existed, could be analyzed by current computing systems, and would be very valuable for emergency responders—but people do not write such tweets.
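The four areas can be read as plain set operations on the three circles. The sketch below is only an illustration of that reading; the tweet identifiers are invented placeholders and the function name is hypothetical.

```python
def diagram_regions(needed, tweeted, computable):
    """Split three overlapping sets into the areas A, B, C and D described above."""
    return {
        "A": needed & tweeted & computable,    # needed, tweeted and machine-readable
        "B": (needed & tweeted) - computable,  # needed and tweeted, but not yet computable
        "C": (tweeted & computable) - needed,  # computable tweets nobody in response needs
        "D": (needed & computable) - tweeted,  # would be useful, but nobody tweets it
    }

# Placeholder identifiers standing in for (kinds of) tweets.
needed = {"t1", "t2", "t3", "t7"}        # What Emergency Managers Need
tweeted = {"t1", "t2", "t3", "t4", "t5"} # What People Tweet
computable = {"t1", "t4", "t6", "t7"}    # What Computers Can Do

regions = diagram_regions(needed, tweeted, computable)
for name, members in regions.items():
    print(name, sorted(members))
```

Seen this way, "expanding A" means either moving items from B into A (better algorithms) or from D into A (getting people to tweet what responders need).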

These limitations are not just academic. They make it more challenging to develop next-generation humanitarian technologies. So one question that naturally arises is this: How can we expand the size of A? One way is for governments to implement policies that expand access to mobile phones and the Internet, for example.

Area C is where the vast majority of social media companies operate today, collecting business intelligence and carrying out sentiment analysis for private sector companies by combining natural language processing and machine learning methodologies. But this analysis rarely focuses on tweets posted during a major humanitarian crisis. Reaching out to these companies to let them know they could make a difference during disasters would help to expand the size of A + C.

Finally, Area D is composed of information that would be very valuable for emergency responders and that could be automatically extracted from tweets, but that Twitter users are simply not posting during emergencies (for now). Here, government and humanitarian organizations can develop policies to incentivise disaster-affected communities to tweet about the impact of a hazard and resulting needs in a way that is actionable, for example. This is what the Philippine Government did during Typhoon Pablo.

Now recall that the circle “What People Tweet” is actually a very small fraction of all posted tweets. The advantage of this small sample of tweets is that they are freely available via Twitter’s API. But said API limits the number of downloadable tweets to just a few thousand per day. (For comparative purposes, there were over 20 million tweets posted during Hurricane Sandy). Hence the need for data philanthropy for humanitarian response.

I would be grateful for your feedback on these ideas and the conceptual framework proposed by ChaTo. The point to remember, as noted in this earlier post, is that today’s challenges are not static; they can be addressed and overcome to various degrees. In other words, the sizes of the circles can and will change.

How to Create Resilience Through Big Data

Revised! I have edited this article several dozen times since posting the initial draft. I have also made a number of substantial changes to the flow of the article after discovering new connections, synergies and insights. In addition, I have greatly benefited from reader feedback as well as the very rich conversations that took place during the PopTech & Rockefeller workshop. A warm thank you to all participants for their important questions and feedback!

Introduction

I’ve been invited by PopTech and the Rockefeller Foundation to give the opening remarks at an upcoming event on interdisciplinary dimensions of resilience, which is being hosted at Georgetown University. This event is connected to their new program focus on “Creating Resilience Through Big Data.” I’m absolutely delighted to be involved and am very much looking forward to the conversations. The purpose of this blog post is to summarize the presentation I intend to give and to solicit feedback from readers. So please feel free to use the comments section below to share your thoughts. My focus is primarily on disaster resilience. Why? Because understanding how to bolster resilience to extreme events will provide insights on how to also manage less extreme events, while the converse may not be true.

Big Data Resilience

Terminology

One of the guiding questions for the meeting is this: “How do you understand resilience conceptually at present?” First, discourse matters. The term resilience is important because it focuses not on us, the development and disaster response community, but rather on local at-risk communities. While “vulnerability” and “fragility” were used in past discourse, these terms focus on the negative and seem to invoke the need for external protection, overlooking the fact that many local coping mechanisms do exist. From the perspective of this top-down approach, international organizations are the rescuers and aid does not arrive until these institutions mobilize.

In contrast, the term resilience suggests radical self-sufficiency, and self-sufficiency implies a degree of autonomy; self-dependence rather than dependence on an external entity that may or may not arrive, that may or may not be effective, and that may or may not stay the course. The term “antifragile” recently introduced by Nassim Taleb also appeals to me. Antifragile systems thrive on disruption. But let’s stick with the term resilience, as antifragility will be the subject of a future blog post, i.e., I first need to finish reading Nassim’s book! I personally subscribe to the following definition of resilience: the capacity for self-organization. I shall expand on this shortly.

(See the Epilogue at the end of this blog post on political versus technical definitions of resilience and the role of the so-called “expert”. And keep in mind that poverty, cancer, terrorism etc., are also resilient systems. Hint: we have much to learn from pernicious resilience and the organizational & collective action models that render those systems so resilient. In their book on resilience, Andrew Zolli and Ann Marie Healy note the strong similarities between Al-Qaeda & tuberculosis, one of which is the two systems’ ability to regulate their metabolism).

Hazards vs Disasters

I first began to study the notion of resilience in the context of complex systems and in particular the field of ecology, which defines resilience as “the capacity of an ecosystem to respond to a perturbation or disturbance by resisting damage and recovering quickly.” Now let’s unpack this notion of perturbation. There is a subtle but fundamental difference between disasters (processes) and hazards (events), a distinction that Jean-Jacques Rousseau first articulated in 1755 when Portugal was shaken by an earthquake. In a letter to Voltaire one year later, Rousseau notes that, “nature had not built [process] the houses which collapsed and suggested that Lisbon’s high population density [process] contributed to the toll” (1). In other words, natural events are hazards and exogenous while disasters are the result of endogenous social processes. As Rousseau added in his note to Voltaire, “an earthquake occurring in wilderness would not be important to society” (2). That is, a hazard need not turn into a disaster since the latter is strictly a product or calculus of social processes (structural violence).

And so, while disasters were traditionally perceived as “sudden and short lived events, there is now a tendency to look upon disasters in African countries in particular, as continuous processes of gradual deterioration and growing vulnerability,” which has important “implications on the way the response to disasters ought to be made” (3). (Strictly speaking, the technical difference between events and processes is one of scale, both temporal and spatial, but that need not distract us here). This shift towards disasters as processes is particularly profound for the creation of resilience, not least through Big Data. To understand why requires a basic introduction to complex systems.

Complex Systems

All complex systems tend to veer towards critical change. This is explained by the process of Self-Organized Criticality (SOC). Over time, non-equilibrium systems with extended degrees of freedom and a high level of nonlinearity become increasingly vulnerable to collapse. Social, economic and political systems certainly qualify as complex systems. As my “alma mater” the Santa Fe Institute (SFI) notes, “The archetype of a self-organized critical system is a sand pile. Sand is slowly dropped onto a surface, forming a pile. As the pile grows, avalanches occur which carry sand from the top to the bottom of the pile” (4). That is, the sand pile becomes increasingly unstable over time.

Consider an hourglass or sand clock as an illustration of self-organized criticality. Grains of sand sifting through the narrowest point of the hourglass represent individual events or natural hazards. Over time a sand pile starts to form. How this process unfolds depends on how society chooses to manage risk. A laissez-faire attitude will result in a steeper pile. And a grain of sand falling on an increasingly steep pile will eventually trigger an avalanche. Disaster ensues.

Why does the avalanche occur? One might ascribe the cause of the avalanche to that one grain of sand, i.e., a single event. On the other hand, a complex systems approach to resilience would associate the avalanche with the pile’s increasing slope, a historical process which renders the structure increasingly vulnerable to falling grains. From this perspective, “all disasters are slow onset when realistically and locally related to conditions of susceptibility”. A hazard event might be rapid-onset, but the disaster, requiring much more than a hazard, is a long-term process, not a one-off event. The resilience of a given system is therefore not simply dependent on the outcome of future events. Resilience is the complex product of past social, political, economic and even cultural processes.
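For readers who want to play with the sand pile themselves, here is a minimal sketch of the classic grid version of the model (usually credited to Bak, Tang and Wiesenfeld). The grid size, the toppling threshold of 4 and the random drop locations are modelling assumptions, not anything specific to disasters: a cell holding 4 or more grains topples, passing one grain to each neighbour, and grains pushed past the edge are lost.

```python
import random

def drop_grain(grid, row, col, threshold=4):
    """Add one grain at (row, col), relax the pile, and return the number of topplings."""
    size = len(grid)
    grid[row][col] += 1
    unstable = [(row, col)]
    topplings = 0
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < threshold:
            continue  # already relaxed by an earlier toppling
        grid[r][c] -= threshold
        topplings += 1
        if grid[r][c] >= threshold:
            unstable.append((r, c))
        # Pass one grain to each of the four neighbours; grains that would
        # land outside the grid simply fall off the pile.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                grid[nr][nc] += 1
                if grid[nr][nc] >= threshold:
                    unstable.append((nr, nc))
    return topplings

def simulate(size=11, grains=2000, seed=0):
    """Drop grains at random cells and record the avalanche size caused by each one."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    sizes = [drop_grain(grid, rng.randrange(size), rng.randrange(size))
             for _ in range(grains)]
    return grid, sizes

grid, sizes = simulate()
quiet = sum(1 for s in sizes if s == 0)
print("grains that caused no avalanche:", quiet, "of", len(sizes))
print("largest avalanche (topplings):", max(sizes))
```

Every grain is identical, yet avalanche sizes differ widely from one drop to the next. What differs is the state of the pile when the grain lands, which is the point made above: the history of the pile, not the individual grain, determines the size of the avalanche.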

Dealing with Avalanches

Scholars like Thomas Homer-Dixon argue that we are becoming increasingly prone to domino effects or cascading changes across systems, thus increasing the likelihood of total synchronous failure. “A long view of human history reveals not regular change but spasmodic, catastrophic disruptions followed by long periods of reinvention and development.” We must therefore “reduce as much as we can the force of the underlying tectonic stresses in order to lower the risk of synchronous failure—that is, of catastrophic collapse that cascades across boundaries between technological, social and ecological systems” (5).

Unlike the clock’s lifeless grains of sand, human beings can adapt and maximize their resilience to exogenous shocks through disaster preparedness, mitigation and adaptation, all of which require political will. As a colleague of mine recently noted, “I wish it were widely spread amongst society how important being a grain of sand can be.” Individuals can “flatten” the structure of the sand pile into a less hierarchical but more resilient system, thereby distributing and diffusing the risk and size of an avalanche. Call it distributed adaptation.

Operationalizing Resilience

As already noted, the field of ecology defines resilience as “the capacity of an ecosystem to respond to a perturbation or disturbance by resisting damage and recovering quickly.” Using this understanding of resilience, there are at least two ways to create more resilient “social ecosystems”:

  1. Resist damage by absorbing and dampening the perturbation.
  2. Recover quickly by bouncing back or rather forward.

Resisting Damage

So how does a society resist damage from a disaster? As hinted earlier, there is no such thing as a “natural” disaster. There are natural hazards and there are social systems. If social systems are not sufficiently resilient to absorb the impact of a natural hazard such as an earthquake, then disaster unfolds. In other words, hazards are exogenous while disasters are the result of endogenous political, economic, social and cultural processes. Indeed, “it is generally accepted among environmental geographers that there is no such thing as a natural disaster. In every phase and aspect of a disaster—causes, vulnerability, preparedness, results and response, and reconstruction—the contours of disaster and the difference between who lives and dies is to a greater or lesser extent a social calculus” (6).

So how do we apply this understanding of disasters and build more resilient communities? Focusing on people-centered early warning systems is one way to do this. In 2006, the UN’s International Strategy for Disaster Reduction (ISDR) recognized that top-down early warning systems for disaster response were increasingly ineffective. They thus called for a more bottom-up approach in the form of people-centered early warning systems. The UN ISDR’s Global Survey of Early Warning Systems (PDF), defines the purpose of people-centered early warning systems as follows:

“… to empower individuals and communities threatened by hazards to act in sufficient time and in an appropriate manner so as to reduce the possibility of personal injury, loss of life, damage to property and the environment, and loss of livelihoods.”

Information plays a central role here. Acting in sufficient time requires having timely information about (1) the hazard/s, (2) our resilience and (3) how to respond. This is where information and communication technologies (ICTs), social media and Big Data play an important role. Take the latter, for example. One reason for the considerable interest in Big Data is prediction and anomaly detection. Weather and climatic sensors provide meteorologists with the copious amounts of data necessary for the timely prediction of weather patterns and early detection of atmospheric hazards. In other words, Big Data Analytics can be used to anticipate the falling grains of sand.

Now, predictions are often not correct. But the analysis of Big Data can also help us characterize the sand pile itself, i.e., our resilience, along with the associated trends towards self-organized criticality. Recall that complex systems tend towards instability over time (think of the hourglass above). Thanks to ICTs, social media and Big Data, we now have the opportunity to better characterize in real-time the social, economic and political processes driving our sand pile. Now, this doesn’t mean that we have a perfect picture of the road to collapse; simply that our picture is clearer than ever before in human history. In other words, we can better measure our own resilience. Think of it as the Quantified Self movement applied to an entirely different scale, that of societies and cities. The point is that Big Data can provide us with more real-time feedback loops than ever before. And as scholars of complex systems know, feedback loops are critical for adaptation and change. Thanks to social media, these loops also include peer-to-peer feedback loops.

An example of monitoring resilience in real-time (and potentially anticipating future changes in resilience) is the UN Global Pulse’s project on food security in Indonesia. They partnered with Crimson Hexagon to forecast food prices in Indonesia by analyzing tweets referring to the price of rice. They found an interesting relationship between said tweets and government statistics on food price inflation. Some have described the rise of social media as a new nervous system for the planet, capturing the pulse of our social systems. My colleagues and I at QCRI are therefore in the process of applying this approach to the study of the Arabic Twittersphere. Incidentally, this is yet another critical reason why Open Data is so important (check out the work of OpenDRI, the Open Data for Resilience Initiative; see also this post on Democratizing ICT for Development with DIY Innovation and Open Data). More on open data and data philanthropy in the conclusion.

Finally, new technologies can also provide guidance on how to respond. Think of Foursquare but applied to disaster response. Instead of “Break Glass in Case of Emergency,” how about “Check-In in Case of Emergency”? Numerous smartphone apps such as Waze already provide this kind of at-a-glance, real-time situational awareness. It is only a matter of time until humanitarian organizations develop disaster response apps that will enable disaster-affected communities to check-in for real-time guidance on what to do given their current location and level of resilience. Several disaster preparedness apps already exist. Social computing and Big Data Analytics can power these apps in real-time.

Quick Recovery

As already noted, there are at least two ways to create more resilient “social ecosystems”. We just discussed the first: resisting damage by absorbing and dampening the perturbation. The second way to grow more resilient societies is by enabling them to rapidly recover following a disaster.

As Manyena writes, “increasing attention is now paid to the capacity of disaster-affected communities to ‘bounce back’ or to recover with little or no external assistance following a disaster.” So what factors accelerate recovery in ecosystems in general? In ecological terms, how quickly the damaged part of an ecosystem can repair itself depends on how many feedback loops it has to the non- (or less-) damaged parts of the ecosystem(s). These feedback loops are what enable adaptation and recovery. In social ecosystems, these feedback loops can consist of information in addition to the transfer of tangible resources. As some scholars have argued, a disaster is first of all “a crisis in communicating within a community—that is, a difficulty for someone to get informed and to inform other people” (7).

Improving ways for local communities to communicate internally and externally is thus an important part of building more resilient societies. Indeed, as Homer-Dixon notes, “the part of the system that has been damaged recovers by drawing resources and information from undamaged parts.” Identifying needs following a disaster and matching them to available resources is an important part of the process. Indeed, accelerating the rate of (1) identification, (2) matching and (3) allocation is an important way to speed up overall recovery.

This explains why ICTs, social media and Big Data are central to growing more resilient societies. They can accelerate impact evaluations and needs assessments at the local level. Population displacement following disasters poses a serious public health risk. So rapidly identifying these risks can help affected populations recover more quickly. Take the work carried out by my colleagues at Flowminder, for example. They empirically demonstrated that mobile phone data (Big Data!) can be used to predict population displacement after major disasters. Take also this study which analyzed call dynamics to demonstrate that telecommunications data could be used to rapidly assess the impact of earthquakes. A related study showed similar results when analyzing SMS messages and building damage in Haiti after the 2010 earthquake.

Resilience as Self-Organization and Emergence

Connection technologies such as mobile phones allow individual “grains of sand” in our societal “sand pile” to make necessary connections and decisions to self-organize and rapidly recover from disasters. With appropriate incentives, preparedness measures and policies, these local decisions can render a complex system more resilient. At the core here is behavior change and thus the importance of understanding behavior change models. Recall also Thomas Schelling’s observation that micro-motives can lead to macro-behavior. To be sure, as Thomas Homer-Dixon rightly notes, “Resilience is an emergent property of a system—it’s not a result of any one of the system’s parts but of the synergy between all of its parts. So as a rough and ready rule, boosting the ability of each part to take care of itself in a crisis boosts overall resilience.” (For complexity science readers, the notion of transformation through phase transitions is relevant to this discussion).

In other words, “Resilience is the capacity of the affected community to self-organize, learn from and vigorously recover from adverse situations stronger than it was before” (8). This link between resilience and capacity for self-organization is very important, which explains why a recent and major evaluation of the 2010 Haiti Earthquake disaster response promotes the “attainment of self-sufficiency, rather than the ongoing dependency on standard humanitarian assistance.” Indeed, “focus groups indicated that solutions to help people help themselves were desired.”

The fact of the matter is that we are not all affected in the same way during a disaster. (Recall the distinction between hazards and disasters discussed earlier). Those of us who are less affected almost always want to help those in need. Herein lies the critical role of peer-to-peer feedback loops. To be sure, the speed at which the damaged part of an ecosystem can repair itself depends on how many feedback loops it has to the non- (or less-) damaged parts of the ecosystem(s). These feedback loops are what enable adaptation and recovery.

Lastly, disaster response professionals cannot be everywhere at the same time. But the crowd is always there. Moreover, the vast majority of lives saved following major disasters cannot be attributed to external aid. One study estimates that at most 10% of external aid contributes to saving lives. Why? Because the real first responders are the disaster-affected communities themselves, the local population. That is, the real first feedback loops are always local. This dynamic of mutual-aid facilitated by social media is certainly not new, however. My colleagues in Russia did this back in 2010 during the major forest fires that ravaged their country.

While I do have a bias towards people-centered interventions, this does not mean that I discount the importance of feedback loops to external actors such as traditional institutions and humanitarian organizations. I also don’t mean to romanticize the notion of “indigenous technical knowledge” or local coping mechanisms. Some violate my own definition of human rights, for example. However, my bias stems from the fact that I am particularly interested in disaster resilience within the context of areas of limited statehood where said institutions and organizations are either absent or ineffective. But I certainly recognize the importance of scale jumping, particularly within the context of social capital and social media.

Resilience Through Social Capital

Information-based feedback loops generate social capital, and the latter has been shown to improve disaster resilience and recovery. In his recent book entitled “Building Resilience: Social Capital in Post-Disaster Recovery,” Daniel Aldrich draws on both qualitative and quantitative evidence to demonstrate that “social resources, at least as much as material ones, prove to be the foundation for resilience and recovery.” His case studies suggest that social capital is more important for disaster resilience than physical and financial capital, and more important than conventional explanations. So the question that naturally follows given our interest in resilience & technology is this: can social media (which is not restricted by geography) influence social capital?

Social Capital

Building on Daniel’s research and my own direct experience in digital humanitarian response, I argue that social media does indeed nurture social capital during disasters. “By providing norms, information, and trust, denser social networks can implement a faster recovery.” Such norms also evolve on Twitter, as does information sharing and trust building. Indeed, “social ties can serve as informal insurance, providing victims with information, financial help and physical assistance.” This informal insurance, “or mutual assistance involves friends and neighbors providing each other with information, tools, living space, and other help.” Again, this bonding is not limited to offline dynamics but occurs also within and across online social networks. Recall the sand pile analogy. Social capital facilitates the transformation of the sand pile away (temporarily) from self-organized criticality. On a related note vis-a-vis open source software, “the least important part of open source software is the code.” Indeed, more important than the code is the fact that open source fosters social ties, networks, communities and thus social capital.

(Incidentally, social capital generated during disasters is social capital that can subsequently be used to facilitate self-organization for non-violent civil resistance and vice versa).

Resilience Through Big Data

My empirical research on tweets posted during disasters clearly shows that while many use Twitter (and social media more generally) to post needs during a crisis, those who are less affected in the social ecosystem will often post offers to help. So where does Big Data fit into this particular equation? When disaster strikes, access to information is as important as access to food and water. This link between information, disaster response and aid was officially recognized by the Secretary General of the International Federation of Red Cross & Red Crescent Societies in the World Disasters Report published in 2005. Since then, disaster-affected populations have become increasingly digital thanks to the very rapid and widespread adoption of mobile technologies. Indeed, as a result of these mobile technologies, affected populations are increasingly able to source, share and generate a vast amount of information, which is completely transforming disaster response.

In other words, disaster-affected communities are increasingly becoming the source of Big (Crisis) Data during and following major disasters. There were over 20 million tweets posted during Hurricane Sandy. And when the major earthquake and tsunami hit Japan in early 2011, over 5,000 tweets were being posted every second. That is 1.5 million tweets every 5 minutes. So how can Big Data Analytics create more resilience in this respect? More specifically, how can Big Data Analytics accelerate disaster recovery? Manually monitoring millions of tweets per minute is hardly feasible. This explains why I often “joke” that we need a local Match.com for rapid disaster recovery. Thanks to social computing, artificial intelligence, machine learning and Big Data Analytics, we can absolutely develop a “Match.com” for rapid recovery. In fact, I’m working on just such a project with my colleagues at QCRI. We are also developing algorithms to automatically identify informative and actionable information shared on Twitter, for example. (Incidentally, a by-product of developing a robust Match.com for disaster response could very well be an increase in social capital).
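The rate conversion above is easy to check. Here is a minimal Python sketch; the function name is purely illustrative, the 5,000 tweets-per-second figure is the one quoted above, and a constant rate over the window is an assumption made only for the arithmetic.

```python
def tweets_in_window(tweets_per_second: int, window_minutes: int) -> int:
    """Return how many tweets are posted in a window at a constant rate."""
    return tweets_per_second * window_minutes * 60


# Rate quoted above for the 2011 earthquake and tsunami in Japan.
peak_rate = 5_000

# Assumption: the rate holds steady for the whole window.
per_minute = tweets_in_window(peak_rate, 1)
per_five_minutes = tweets_in_window(peak_rate, 5)

print(f"{per_minute:,} tweets per minute")
print(f"{per_five_minutes:,} tweets every 5 minutes")
```

At that rate, a human monitor would have to read hundreds of thousands of tweets per minute, which is the practical case for automated filtering.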

There are several other ways that advanced computing can create disaster resilience using Big Data. One major challenge in digital humanitarian response is the verification of crowdsourced, user-generated content. Indeed, misinformation and rumors can be highly damaging. If access to information is tantamount to food access as noted by the Red Cross, then misinformation is like poisoned food. But Big Data Analytics has already shed some light on how to develop potential solutions. As it turns out, non-credible disaster information shared on Twitter propagates differently than credible information, which means that the credibility of tweets could be predicted automatically.

Conclusion

In sum, “resilience is the critical link between disaster and development; monitoring it [in real-time] will ensure that relief efforts are supporting, and not eroding […] community capabilities” (9). While the focus of this blog post has been on disaster resilience, I believe the insights provided are equally informative for less extreme events. So I’d like to end on two major points. The first has to do with data philanthropy while the second emphasizes the critical importance of failing gracefully.

Big Data is Closed and Centralized

A considerable amount of “Big Data” is Big Closed and Centralized Data. Flowminder’s study mentioned above draws on highly proprietary telecommunications data. Facebook data, which has immense potential for humanitarian response, is also closed. The same is true of Twitter data, unless you have millions of dollars to pay for access to the full Firehose, or even Decahose. While access to the Twitter API is free, the number of tweets that can be downloaded and analyzed is limited to several thousand a day. Contrast this with the 5,000 tweets per second posted after the earthquake and tsunami in Japan. We therefore need some serious political will from the corporate sector to engage in “data philanthropy”. Data philanthropy involves companies sharing proprietary datasets for social good. Call it Corporate Social Responsibility (CSR) for digital humanitarian response. More here on how this would work.
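To put the gap between free access and the full stream in perspective, here is a rough Python sketch. The daily allowance of 5,000 tweets is a hypothetical stand-in for “several thousand a day” and is not an official API limit; the function name is illustrative as well.

```python
def seconds_of_stream_covered(daily_allowance: int, tweets_per_second: int) -> float:
    """Return how many seconds of the stream a daily download allowance covers."""
    return daily_allowance / tweets_per_second


# Hypothetical value standing in for "several thousand a day".
assumed_daily_allowance = 5_000

# Rate quoted above for the earthquake and tsunami in Japan.
peak_rate = 5_000

covered = seconds_of_stream_covered(assumed_daily_allowance, peak_rate)
print(f"A full day's allowance covers about {covered:.1f} second(s) of tweets")
```

Under that assumption, an entire day of free downloads amounts to roughly one second of what was being posted at the quoted rate.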

Failing Gracefully

Lastly, on failure. As noted, complex systems tend towards instability, i.e., self-organized criticality, which is why Homer-Dixon introduces the notion of failing gracefully. “Somehow we have to find the middle ground between dangerous rigidity and catastrophic collapse.” He adds that:

“In our organizations, social and political systems, and individual lives, we need to create the possibility for what computer programmers and disaster planners call ‘graceful’ failure. When a system fails gracefully, damage is limited, and options for recovery are preserved. Also, the part of the system that has been damaged recovers by drawing resources and information from undamaged parts.” Homer-Dixon explains that “breakdown is something that human social systems must go through to adapt successfully to changing conditions over the long term. But if we want to have any control over our direction in breakdown’s aftermath, we must keep breakdown constrained. Reducing as much as we can the force of underlying tectonic stresses helps, as does making our societies more resilient. We have to do other things too, and advance planning for breakdown is undoubtedly the most important.”

As Louis Pasteur famously noted, “Chance favors the prepared mind.” Preparing for breakdown is not defeatist or passive. Quite the contrary, it is wise and pro-active. Our hubris, including our current infatuation with Big Data, all too often clouds our better judgment. Like Macbeth, rarely do we seriously ask ourselves what we would do “if we should fail.” The answer “then we fail” is an option. But are we truly prepared to live with the devastating consequences of total synchronous failure?

In closing, some lingering (less rhetorical) questions:

  • How can resilience be measured? Is there a lowest common denominator? What is the “atom” of resilience?
  • What are the triggers of resilience, creative capacity, local improvisation, regenerative capacity? Can these be monitored?
  • Where do the concepts of “lived reality” and “positive deviance” enter the conversation on resilience?
  • Is resiliency a right? Do we bear a responsibility to render systems more resilient? If so, recalling that resilience is the capacity to self-organize, do local communities have the right to self-organize? And how does this differ from democratic ideals and freedoms?
  • Recent research in social psychology has demonstrated that mindfulness is an amplifier of resilience for individuals. How can this be scaled up? Do cultures and religions play a role here?
  • Collective memory influences resilience. How can this be leveraged to catalyze more regenerative social systems?

Epilogue: Some colleagues have rightfully pointed out that resilience is ultimately political. I certainly share that view, which is why this point came up in recent conversations with my PopTech colleagues Andrew Zolli & Leetha Filderman. Readers of my post will also have noted my emphasis on distinguishing between hazards and disasters; that the latter are the product of social, economic and political processes. As noted in my blog post, there are no natural disasters. To this end, some academics rightly warn that “Resilience is a very technical, neutral, apolitical term. It was initially designed to characterize systems, and it doesn’t address power, equity or agency… Also, strengthening resilience is not free—you can have some winners and some losers.”

As it turns out, I have a lot to say about the political versus technical argument. First of all, this is hardly a new or original argument but nevertheless an important one. Amartya Sen discussed this issue within the context of famines decades ago, noting that famines do not take place in democracies. In 1997, Alex de Waal published his seminal book, “Famine Crimes: Politics and the Disaster Relief Industry in Africa.” As he rightly notes, “Fighting famine is both a technical and political challenge.” Unfortunately, “one universal tendency stands out: technical solutions are promoted at the expense of political ones.” There is also a tendency to overlook the politics of technical actions, muddle or cover political actions with technical ones, or worse, to use technical measures as an excuse not to undertake needed political action.

De Waal argues that the use of the term “governance” was “an attempt to avoid making the political critique too explicit, and to enable a focus on specific technical aspects of government.” In some evaluations of development and humanitarian projects, “a caveat is sometimes inserted stating that politics lies beyond the scope of this study.” To this end, “there is often a weak call for ‘political will’ to bridge the gap between knowledge of technical measures and action to implement them.” As de Waal rightly notes, “the problem is not a ‘missing link’ but rather an entire political tradition, one manifestation of which is contemporary international humanitarianism.” In sum, “technical ‘solutions’ must be seen in the political context, and politics itself in the light of the dominance of a technocratic approach to problems such as famine.”

From a paper I presented back in 2007: “the technological approach almost always serves those who seek control from a distance.” As a result of this technological drive for pole position, a related “concern exists due to the separation of risk evaluation and risk reduction between science and political decision” so that which is inherently politically complex becomes depoliticized and mechanized. In Toward a Rational Society (1970), the German philosopher Jürgen Habermas describes “the colonization of the public sphere through the use of instrumental technical rationality. In this sphere, complex social problems are reduced to technical questions, effectively removing the plurality of contending perspectives.”

To be sure, Western science tends to pose the question “How?” as opposed to “Why?” What happens then is that “early warning systems tend to be largely conceived as hazard-focused, linear, top-down, expert driven systems, with little or no engagement of end-users or their representatives.” As de Waal rightly notes, “the technical sophistication of early warning systems is offset by a major flaw: response cannot be enforced by the populace. The early warning information is not normally made public.” In other words, disaster prevention requires “not merely identifying causes and testing policy instruments but building a [social and] political movement” since “the framework for response is inherently political, and the task of advocacy for such response cannot be separated from the analytical tasks of warning.”

Recall my emphasis on people-centered early warning above and the definition of resilience as capacity for self-organization. Self-organization is political. Hence my efforts to promote greater linkages between the fields of nonviolent action and early warning years ago. I have a paper (dated 2008) specifically on this topic should anyone care to read it. Anyone who has read my doctoral dissertation will also know that I have long been interested in the impact of technology on the balance of power in political contexts. A relevant summary is available here. Now, why did I not include all this in the main body of my blog post? Because this updated section already runs over 1,000 words.

In closing, I disagree with the over-used criticism that resilience is reactive and about returning to initial conditions. Why would we want to be reactive or return to initial conditions if the latter state contributed to the subsequent disaster we are recovering from? When my colleague Andrew Zolli talks about resilience, he talks about “bouncing forward”, not bouncing back. This is also true of Nassim Taleb’s term antifragility, the ability to thrive on disruption. As Homer-Dixon also notes, preparing to fail gracefully is hardly reactive either.

Tweeting is Believing? Analyzing Perceptions of Credibility on Twitter

What factors influence whether or not a tweet is perceived as credible? According to this recent study, users have “difficulty discerning truthfulness based on content alone, with message topic, user name, and user image all impacting judgments of tweets and authors to varying degrees regardless of the actual truthfulness of the item.”

For example, “Features associated with low credibility perceptions were the use of non-standard grammar and punctuation, not replacing the default account image, or using a cartoon or avatar as an account image. Following a large number of users was also associated with lower author credibility, especially when unbalanced in comparison to follower count […].” As for features enhancing a tweet’s credibility, these included “author influence (as measured by follower, retweet, and mention counts), topical expertise (as established through a Twitter homepage bio, history of on-topic tweeting, pages outside of Twitter, or having a location relevant to the topic of the tweet), and reputation (whether an author is someone a user follows, has heard of, or who has an official Twitter account verification seal). Content related features viewed as credibility-enhancing were containing a URL leading to a high-quality site, and the existence of other tweets conveying similar information.”

In general, users’ ability to “judge credibility in practice is largely limited to those features visible at-a-glance in current UIs (user picture, user name, and tweet content). Conversely, features that often are obscured in the user interface, such as the bio of a user, receive little attention despite their ability to impact credibility judgments.” The table below compares a feature’s perceived credibility impact with the attention actually allotted to assessing that feature.

“Message topic influenced perceptions of tweet credibility, with science tweets receiving a higher mean tweet credibility rating than those about either politics or entertainment. Message topic had no statistically significant impact on perceptions of author credibility.” In terms of usernames, “Authors with topical names were considered more credible than those with traditional user names, who were in turn considered more credible than those with internet name styles.” In a follow up experiment, the study analyzed perceptions of credibility vis-a-vis a user’s image, i.e., the profile picture associated with a given Twitter account. “Use of the default Twitter icon significantly lowers ratings of content and marginally lowers ratings of authors […]” in comparison to generic, topical, female and male images.

Obviously, “many of these metrics can be faked to varying extents. Selecting a topical username is trivial for a spam account. Manufacturing a high follower to following ratio or a high number of retweets is more difficult but not impossible. User interface changes that highlight harder to fake factors, such as showing any available relationship between a user’s network and the content in question, should help.” Overall, these results “indicate a discrepancy between features people rate as relevant to determining credibility and those that mainstream social search engines make available.” The authors of the study conclude by suggesting changes in interface design that will enhance a user’s ability to make credibility judgements.

“Firstly, author credentials should be accessible at a glance, since these add value and users rarely take the time to click through to them. Ideally this will include metrics that convey consistency (number of tweets on topic) and legitimization by other users (number of mentions or retweets), as well as details from the author’s Twitter page (bio, location, follower/following counts). Second, for content assessment, metrics on number of retweets or number of times a link has been shared, along with who is retweeting and sharing, will provide consumers with context for assessing credibility. […] seeing clusters of tweets that conveyed similar messages was reassuring to users; displaying such similar clusters runs counter to the current tendency for search engines to strive for high recall by showing a diverse array of retrieved items rather than many similar ones–exploring how to resolve this tension is an interesting area for future work.”
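To make the first recommendation concrete, here is a minimal Python sketch of what an “at a glance” author summary might hold. It is not the study’s implementation and does not use any real Twitter API; the class, its field names and the sample values are all hypothetical, chosen only to mirror the metrics listed in the quote above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuthorSummary:
    """Hypothetical at-a-glance credentials for the author of a tweet."""

    username: str
    bio: str
    location: str
    followers: int
    following: int
    on_topic_tweets: int  # consistency: number of tweets on the topic
    mentions: int         # legitimization by other users
    retweets: int         # legitimization by other users

    def follower_ratio(self) -> Optional[float]:
        """Followers per account followed, or None if the author follows no one."""
        if self.following == 0:
            return None
        return self.followers / self.following

    def at_a_glance(self) -> str:
        """Render the summary as a short block of text for display next to a tweet."""
        ratio = self.follower_ratio()
        ratio_text = "n/a" if ratio is None else f"{ratio:.1f}"
        return (
            f"@{self.username} | {self.location} | {self.bio}\n"
            f"on-topic tweets: {self.on_topic_tweets}, "
            f"mentions: {self.mentions}, retweets: {self.retweets}\n"
            f"followers/following: {self.followers}/{self.following} "
            f"(ratio {ratio_text})"
        )


# Made-up sample values, for illustration only.
author = AuthorSummary(
    username="example_author",
    bio="Seismologist",
    location="Tokyo",
    followers=1200,
    following=300,
    on_topic_tweets=42,
    mentions=15,
    retweets=80,
)
print(author.at_a_glance())
```

The sketch only displays the raw counts and a ratio rather than collapsing them into a single score, since the study itself notes that several of these metrics can be faked to varying extents.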

In sum, the above findings and recommendations explain why platforms such as Rapportive, Seriously Rapid Source Review (SRSR) and CrisisTracker add so much value to the process of assessing the credibility of tweets in near real-time. For related research, see “Predicting the Credibility of Disaster Tweets Automatically” and “Automatically Ranking the Credibility of Tweets During Major Events.”