Category Archives: Big Data

Analyzing Tweets Posted During Mumbai Terrorist Attacks

Over 1 million unique users posted more than 2.7 million tweets in just 3 days following the triple bomb blasts that struck Mumbai on July 13, 2011. Out of these, over 68,000 tweets were “original tweets” (in contrast to retweets) and related to the bombings. An analysis of these tweets yielded some interesting patterns. (Note that the Ushahidi Map of the bombings captured ~150 reports; more here).

One unique aspect of this study (PDF) is the methodology used to assess the quality of the Twitter dataset. The number of tweets per user was graphed in order to test for a power law distribution. The graph below shows the log distribution of the number of tweets per user. The straight line suggests power law behavior. This finding is in line with previous research done on Twitter, so the authors conclude that the quality of the dataset is comparable to the quality of Twitter datasets used in other peer-reviewed studies.

I find this approach intriguing because Professor Michael Spagat, Dr. Ryan Woodard and I carried out related research on conflict data back in 2006. One fascinating research question that emerges from all this, and which could be applied to Twitter datasets, is whether the slope of the power law says anything about the type of conflict/disaster being tweeted about, the expected number of casualties or even the propagation of rumors. If you’re interested in pursuing this research question (and have worked with power laws before), please do get in touch. In the meantime, I challenge the authors’ suggestion that a power law distribution necessarily says anything about the quality or reliability of the underlying data. Using the casualty data from SyriaTracker (which is also used by USAID in their official crisis maps), my colleague Dr. Ryan Woodard showed that this dataset does not follow a power law distribution—even though it is one of the most reliable datasets on Syria.

[Figure: distribution of SyriaTracker casualty data]
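For readers who want to try this kind of check themselves, here is a minimal sketch (not the authors’ code) of the power-law diagnostic described above: plot the complementary cumulative distribution of tweets per user on log-log axes and fit a straight line. The input file name is a hypothetical stand-in.

```python
# Minimal sketch of the power-law check: a roughly straight line on log-log
# axes is consistent with power-law behavior. Not the authors' code; the input
# file "tweets_per_user.txt" (one count per user) is a hypothetical stand-in.
import numpy as np
import matplotlib.pyplot as plt

tweets_per_user = np.loadtxt("tweets_per_user.txt")

# Empirical complementary CDF: P(X >= x)
counts = np.sort(tweets_per_user)
ccdf = 1.0 - np.arange(len(counts)) / len(counts)

# The slope of a log-log fit approximates the (negative) scaling exponent
slope, _ = np.polyfit(np.log10(counts), np.log10(ccdf), 1)
print(f"Estimated scaling exponent: {-slope:.2f}")

plt.loglog(counts, ccdf, marker=".", linestyle="none")
plt.xlabel("Tweets per user")
plt.ylabel("P(X >= x)")
plt.show()
```

Note that a least-squares fit in log-log space is only a rough diagnostic; dedicated tools such as the powerlaw Python package compare the power-law hypothesis against alternatives like the lognormal, which speaks to the SyriaTracker caveat above.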

Moving on to the content analysis of the Mumbai blast tweets: “The number of URLs and @-mentions in tweets increase during the time of the crisis in comparison to what researchers have exhibited for normal circumstances.” The table below lists the top 10 URLs shared on Twitter. Interestingly, the link to a Google Spreadsheet was amongst the most shared resources. Created by Twitter user Nitin Sagar, the spreadsheet was used to “coordinate relief operation among people. Within hours hundreds of people registered on the sheet via Twitter. People asked for or offered help on that spreadsheet for many hours.”

The analysis also reveals that “the number of tweets or updates by authority users (those with large number of followers) are very less, i.e., majority of content generated on Twitter during the crisis comes from non authority users.” In addition, tweets generated by authority users have a high level of retweets. The results also indicate that “the number of tweets generated by people with large follower base (who are generally like government owned accounts, celebrities, media companies) were very few. Thus, the majority of content generated at the time of crisis was from unknown users. It was also observed that, though the number of posts were less by users with large number of followers, these posts registered high numbers of retweets.”

Rumors related to the blasts also spread through Twitter. For example, rumors began to circulate about a fourth bomb going off. “Some tweets even specified locations of 4th blast as Lemington street, Colaba and Charni. Around 500+ tweets and retweets were posted about this.” False rumors about hospital blood banks needing donations were also propagated via Twitter. “They were initiated by a user, @KapoorChetan and around 2,000 tweets and retweets were made regarding this by Twitter users.” The authors of the study believe that such false rumors can be prevented if credible sources like the mainstream media companies and the government post updates on social media more frequently.

I did a bit of research on this and found that NDTV did use their Twitter feed (which has over half-a-million followers) to counter these rumors. For example, “RT @ndtv: Mumbai police: Don’t believe rumours of more bombs. False rumours being spread deliberately.” Journalist Sonal Kalra also acted to counter rumors: “RT @sonalkalra: BBMs about bombs found in Delhi are FALSE. Pls pls don’t spread rumours. #mumbaiblasts.”

In conclusion, the study considers the “privacy threats during the Twitter activity after the blasts. People openly tweeted their phone numbers on social media websites like Twitter, since at such moment of crisis people wished to reach out to help others. But, long after the crisis was over, such posts still remained publicly available on the Internet.” In addition, “people also openly posted their blood group, home address, etc. on Twitter to offer help to victims of the blasts.” The Ushahidi Map also includes personal information. These data privacy and security issues continue to pose major challenges vis-a-vis the use of social media for crisis response.


See also: Did Terrorists Use Twitter to Increase Situational Awareness? [Link]

Keynote: Next Generation Humanitarian Technology

I’m excited to be giving the Keynote address at the Social Media and Response Management Interface Event (SMARMIE 2013) in New York this morning. A big thank you to the principal driver behind this important event, Chuck Frank, for kindly inviting me to speak. This is my first major keynote since joining QCRI, so I’m thrilled to share what I’ve learned during this time and my vision for the future of humanitarian technology. But I’m even more excited by the selection of speakers and caliber of participants. I’m eager to learn about their latest projects, gain new insights and hopefully create proactive partnerships moving forward.

You can follow this event via live stream (@smarmieNYC & #smarmie). I plan to live tweet the event at @patrickmeier. My slides are available for download here (125MB). Each slide includes speaking notes, which may be of interest to folks who are unable to follow via live stream. Feel free to use my slides, but strictly for non-commercial purposes and only with direct attribution. I’ll be sure to post the video of my talk on iRevolution when it becomes available. In the meantime, these videos and publications may be of interest. Also, I’ve curated the table of contents below with 60+ links to every project and/or concept referred to in my keynote and slides (in chronological order) so participants and others can revisit these after the conference—and more importantly keep our conversations going via Twitter and the comments section of the blog posts. I plan to hire a Research Assistant in the near future to turn these (and other posts) into a series of up-to-date e-books in which I’ll cite and fully credit the most interesting and insightful comments posted on iRevolution.

Social Media Pulse of the Planet

http://iRevolution.net/2013/02/02/pulse-of-the-planet
http://iRevolution.net/2013/02/06/the-world-at-night
http://iRevolution.net/2011/04/20/network-witness

Big Crisis Data and Added Value

http://iRevolution.net/2011/06/22/no-data-bad-data

http://iRevolution.net/2012/02/26/mobile-technologies-crisis-mapping-disaster-response

http://iRevolution.net/2012/12/17/debating-tweets-disaster

http://iRevolution.net/2012/07/18/disaster-tweets-for-situational-awareness

http://iRevolution.net/2013/01/11/disaster-resilience-2-0

Standby Task Force (SBTF)

http://blog.standbytaskforce.com

http://iRevolution.net/2010/09/26/crisis-mappers-task-force

Libya Crisis Map

http://blog.standbytaskforce.com/libya-crisis-map-report

http://irevolution.net/2011/03/04/crisis-mapping-libya

http://iRevolution.net/2011/03/08/volunteers-behind-libya-crisis-map

http://iRevolution.net/2011/06/12/im-not-gaddafi-test

Philippines Crisis Map

http://iRevolution.net/2012/12/05/digital-response-to-typhoon-philippines

http://iRevolution.net/2012/12/08/digital-response-typhoon-pablo

http://iRevolution.net/2012/12/06/digital-disaster-response-typhoon

http://iRevolution.net/2012/06/03/geofeedia-for-crisis-mapping

http://iRevolution.net/2013/02/26/crowdflower-for-disaster-response

Digital Humanitarians 

http://www.digitalhumanitarians.com

Human Computation

http://iRevolution.net/2013/01/20/digital-humanitarian-micro-tasking

Human Computation for Disaster Response (submitted for publication)

Syria Crisis Map

http://iRevolution.net/2012/03/25/crisis-mapping-syria

http://iRevolution.net/2012/11/27/usaid-crisis-map-syria

http://iRevolution.net/2012/07/30/collaborative-social-media-analysis

http://iRevolution.net/2012/05/29/state-of-the-art-digital-disease-detection

Hybrid Systems for Disaster Response

http://iRevolution.net/2012/10/21/crowdsourcing-and-advanced-computing

http://iRevolution.net/2012/07/30/twitter-for-humanitarian-cluster

http://iRevolution.net/2013/02/11/update-twitter-dashboard

Credibility of Social Media: Compare to What?

http://iRevolution.net/2013/01/08/disaster-tweets-versus-911-calls

http://iRevolution.net/2010/09/22/911-system

Human Computed Credibility

http://iRevolution.net/2012/07/26/truth-and-social-media

http://iRevolution.net/2011/11/29/information-forensics-five-case-studies

http://iRevolution.net/2010/06/30/crowdsourcing-detective

http://iRevolution.net/2012/11/20/verifying-source-credibility

http://iRevolution.net/2012/09/16/accelerating-verification

http://iRevolution.net/2010/09/19/veracity-of-tweets-during-a-major-crisis

http://iRevolution.net/2011/03/26/technology-to-counter-rumors

http://iRevolution.net/2012/03/10/truthiness-as-probability

http://iRevolution.net/2013/01/27/mythbuster-tweets

http://iRevolution.net/2012/10/31/hurricane-sandy

http://iRevolution.net/2012/07/16/crowdsourcing-for-human-rights-monitoring-challenges-and-opportunities-for-information-collection-verification

Verily: Crowdsourced Verification

http://iRevolution.net/2013/02/19/verily-crowdsourcing-evidence

http://iRevolution.net/2011/11/06/time-critical-crowdsourcing

http://iRevolution.net/2012/09/18/six-degrees-verification

http://iRevolution.net/2011/09/26/augmented-reality-crisis-mapping

AI Computed Credibility

http://iRevolution.net/2012/12/03/predicting-credibility

http://iRevolution.net/2012/12/10/ranking-credibility-of-tweets

Future of Humanitarian Tech

http://iRevolution.net/2012/04/17/red-cross-digital-ops

http://iRevolution.net/2012/11/15/live-global-twitter-map

http://iRevolution.net/2013/02/16/crisis-mapping-minority-report

http://iRevolution.net/2012/04/09/humanitarian-future

http://iRevolution.net/2011/08/22/khan-borneo-galaxies

http://iRevolution.net/2010/03/24/games-to-turksource

http://iRevolution.net/2010/07/08/cognitive-surplus

http://iRevolution.net/2010/08/14/crowd-is-always-there

http://iRevolution.net/2011/09/14/crowdsource-crisis-response

http://iRevolution.net/2012/07/04/match-com-for-economic-resilience

http://iRevolution.net/2013/02/27/matchapp-disaster-response-app

http://iRevolution.net/2013/01/07/what-waze-can-teach-us

Policy

http://iRevolution.net/2012/12/04/catch-22

http://iRevolution.net/2012/02/05/iom-data-protection

http://iRevolution.net/2013/01/23/perils-of-crisis-mapping

http://iRevolution.net/2013/02/25/launching-sms-code-of-conduct

http://iRevolution.net/2013/02/26/haiti-lies

http://iRevolution.net/2012/06/04/big-data-philanthropy-for-humanitarian-response

http://iRevolution.net/2012/07/25/become-a-data-donor


PS: Please let me know if you find any broken links so I can fix them. Thank you!

Social Media as Passive Polling: Prospects for Development & Disaster Response

My Harvard/MIT colleague Todd Mostak wrote his award-winning Master’s Thesis on “Social Media as Passive Polling: Using Twitter and Online Forums to Map Islamism in Egypt.” For this research, Todd evaluated the “potential of Twitter as a source of time-stamped, geocoded public opinion data in the context of the recent popular uprisings in the Middle East.” More specifically, “he explored three ways of measuring a Twitter user’s degree of political Islamism.” Why? Because he wanted to test the long-standing debate on whether Islamism is associated with poverty.


So Todd collected millions of geo-tagged tweets from Egypt over a six month period, which he then aggregated by census district in order to regress proxies for poverty against measures of Islamism derived from the tweets and the users’ social graphs. His findings reveal that “Islamist sentiment seems to be positively correlated with male unemployment, illiteracy, and percentage of land used in agriculture and negatively correlated with percentage of men in their youth aged 15-25. Note that female variables for unemployment and age were statistically insignificant.” As with all research, there are caveats such as the weighting scale used for the variables and questions over the reliability of census variables.
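As a rough illustration of the kind of analysis described above (not Todd’s actual pipeline), one could aggregate a per-tweet Islamism score by census district and regress it on census-based poverty proxies. All file names and column names below are hypothetical.

```python
# A minimal sketch: aggregate a per-tweet score by district, join it to census
# data, and run an OLS regression. File and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm

tweets = pd.read_csv("district_tweets.csv")    # one row per geotagged tweet
census = pd.read_csv("census_districts.csv")   # one row per census district

# Mean Islamism score per district (however that score is derived)
district_scores = tweets.groupby("district_id")["islamism_score"].mean()

df = census.join(district_scores, on="district_id").dropna()

# Regress the tweet-derived measure on poverty-related census variables
X = sm.add_constant(df[["male_unemployment", "illiteracy",
                        "pct_agricultural_land", "pct_males_15_25"]])
model = sm.OLS(df["islamism_score"], X).fit()
print(model.summary())
```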


To carry out his graduate research, Todd built a web-enabled database (MapD) powered by Graphics Processing Units (GPUs) to perform real-time querying and visualization of big datasets. He is now working with Harvard’s Center for Geographic Analysis (CGA) to make this available via a public web interface called Tweetmap. This Big Data streaming and exploration tool presently displays 119 million tweets from 12/10/2012 to 12/31/2012. He is adding 6-7 million new georeferenced tweets per day (but these are not yet publicly available on Tweetmap). According to Todd, the time delay from live tweet to display on the map is about 1 second. Thanks to this GPU-powered approach, he expects that billions of tweets could be displayed in real-time.


As always with impressive projects, no one single person was behind the entire effort. Ben Lewis, who heads the WorldMap initiative at CGA, deserves a lot of credit for making Tweetmap a reality. Indeed, Todd collaborated directly with Ben Lewis throughout this project and benefited extensively from his expertise. Matt Bertrand (lead developer for CGA) did the WorldMap-side integration of MapD to create the TweetMap interface.

Todd and I recently spoke about integrating his outstanding work on automated live mapping with QCRI’s Twitter Dashboard for Disaster Response. Exciting times. In the meantime, Todd has kindly shared his dataset of 700+ million geotagged tweets for my team and me to analyze. The reason I’m excited about this approach is best explained with this heatmap of the recent snowstorm in the northeastern US. Todd is already using Tweetmap for live crisis mapping. While this system filters by keyword, our Dashboard will use machine learning to provide more specific streams of relevant tweets, some of which could be automatically mapped on Tweetmap. See Todd’s Flickr page for more Tweetmap visuals.


I’m also excited by Todd’s GPU-powered approach for a project I’m exploring with UN and World Bank colleagues. The purpose of that research project is to determine whether socio-economic trends such as poverty and unemployment can be captured via Twitter. Our first case study is Egypt. Depending on the results, we may be able to take it one step further by applying sentiment analysis to real-time, georeferenced tweets to visualize Twitter users’ perception vis-a-vis government services—a point of interest for my UN colleagues in Cairo.


Verily: Crowdsourced Verification for Disaster Response

Social media is increasingly used for communicating during crises. This rise in Big (Crisis) Data means that finding the proverbial needle in the growing haystack of information is becoming a major challenge. Social media use during Hurricane Sandy produced a “haystack” of half-a-million Instagram photos and 20 million tweets. But which of these were actually relevant for disaster response and could they have been detected in near real-time? The purpose of QCRI’s experimental Twitter Dashboard for Disaster Response project is to answer this question. But what about the credibility of the needles in the info-stack?

[Image: DARPA Red Balloon Challenge]

To answer this question, our Crisis Computing Team at QCRI has partnered with the Social Computing & Artificial Intelligence Lab at the Masdar Institute of Science and Technology. This applied research project began with a series of conversations in mid-2012 about DARPA’s Red Balloon Challenge. This challenge, posed in 2009, offered $40K to the individual or team that could find the correct locations of 10 red weather balloons discreetly placed across the continental United States, an area covering well over 3 million square miles (8 million square kilometers). My friend Riley Crane at MIT spearheaded the team that won the challenge in 8 hours and 52 minutes by using social media.

Riley and I connected right after the Haiti Earthquake to start exploring how we might apply his team’s winning strategy to disaster response. But we were pulled in different directions due to PhD and post-doc obligations and startups. Thankfully, however, Riley’s colleague Iyad Rahwan got in touch with me to continue these conversations when I joined QCRI. Iyad is now at the Masdar Institute. We’re collaborating with him and his students to apply collective intelligence insights from the balloon challenge to address the problem of false or misleading content shared on social media during disasters.


If 10 balloons planted across 3 million square miles can be found in under 9 hours, then surely the answer to the question “Did Hurricane Sandy really flood this McDonald’s in Virginia?” can be found in under 9 minutes, given that Virginia is 98% smaller than the “haystack” of the continental US. Moreover, the location of the restaurant would already be known or easily findable. The picture below, which made the rounds on social media during the hurricane, is in reality part of an art exhibition produced in 2009. One remarkable aspect of the social media response to Hurricane Sandy was how quickly false information got debunked and exposed as false—not only by one good (digital) Samaritan, but by several.

[Image: fake photo circulated during Hurricane Sandy]

Having access to accurate information during a crisis leads to more targeted self-organized efforts at the grassroots level. Accurate information is also important for emergency response professionals. The verification efforts during Sandy were invaluable but disjointed and confined to the efforts of a select few individuals. What if thousands could be connected and mobilized to cross-reference and verify suspicious content shared on social media during a disaster?

Say an earthquake struck Santiago, Chile a few minutes ago and contradictory reports begin to circulate on social media that the bridge below may have been destroyed. Determining whether transportation infrastructure is still usable has important consequences for managing the logistics of a disaster response operation. So what if, instead of crowdsourcing the correct location of balloons across an entire country, one could crowdsource the collection of evidence in just one city struck by a disaster to determine whether said bridge had actually been destroyed in a matter of minutes?

[Image: bridge in Santiago, Chile]

To answer these questions, QCRI and Masdar have launched an experimental platform called Verily. We are applying best practices in time-critical crowdsourcing coupled with gamification and reputation mechanisms to leverage the good will of (hopefully) thousands of digital Samaritans during disasters. This is experimental research, which means it may very well not succeed as envisioned. But that is a luxury we have at QCRI—to innovate next-generation humanitarian technologies via targeted iteration and experimentation. For more on this project, our concept paper is available as a Google Doc here. We invite feedback and welcome collaborators.
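To make the reputation idea concrete, here is a purely illustrative sketch, not Verily’s actual design, of how reports from digital volunteers could be weighted by each contributor’s track record when assessing a claim. All names and numbers are invented.

```python
# Illustrative only: weigh crowdsourced reports about a claim (e.g. "the bridge
# has been destroyed") by each contributor's reputation score.
from dataclasses import dataclass

@dataclass
class Report:
    user: str
    supports_claim: bool   # True = evidence that the claim is correct
    reputation: float      # 0..1, earned from past verified contributions

def weighted_belief(reports):
    """Return a 0..1 score: reputation-weighted share of supporting reports."""
    total = sum(r.reputation for r in reports)
    if total == 0:
        return 0.5  # no trusted evidence either way
    return sum(r.reputation for r in reports if r.supports_claim) / total

reports = [
    Report("ana", True, 0.9),    # long-standing, reliable contributor
    Report("bob", False, 0.2),   # new account, little track record
    Report("eva", True, 0.7),
]
print(f"Belief that the claim is true: {weighted_belief(reports):.2f}")
```

In practice such a score would only be one input alongside the evidence itself (photos, geotags, corroborating sources), but it shows why rewarding past accuracy matters in a time-critical crowdsourcing setting.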

In the meantime, we are exploring the possibility of integrating the InformaCam mobile application as part of Verily. InformaCam adds important metadata to images and videos taken by eyewitnesses. “The metadata includes information like the user’s current GPS coordinates, altitude, compass bearing, light meter readings, the signatures of neighboring devices, cell towers, and wifi networks; and serves to shed light on the exact circumstances and contexts under which the digital image was taken.” We are also talking to our partners at MIT’s Computer Science & Artificial Intelligence Lab in Boston about other mobile solutions that may facilitate the use of Verily.

Again, this is purely experimental and applied research at this point. We hope to have an update on our progress in the coming months.


See also:

  •  Crowdsourcing Critical Thinking to Verify Social Media During Crises [Link]
  •  Using Crowdsourcing to Counter Rumors on Social Media [Link]

Update: Twitter Dashboard for Disaster Response

Project name: Artificial Intelligence for Disaster Response (AIDR). For a more recent update, please click here.

My Crisis Computing Team and I at QCRI have been working hard on the Twitter Dashboard for Disaster Response. We first announced the project on iRevolution last year. The experimental research we’ve carried out since has been particularly insightful vis-a-vis the opportunities and challenges of building such a Dashboard. We’re now using the findings from our empirical research to inform the next phase of the project—namely building the prototype for our humanitarian colleagues to experiment with so we can iterate and improve the platform as we move forward.


Manually processing disaster tweets is becoming increasingly difficult and unrealistic. Over 20 million tweets were posted during Hurricane Sandy, for example. This is the main problem that our Twitter Dashboard aims to solve. There are two ways to manage this challenge of Big (Crisis) Data: Advanced Computing and Human Computation. The former entails the use of machine learning algorithms to automatically tag tweets while the latter involves the use of microtasking, which I often refer to as Smart Crowdsourcing. Our Twitter Dashboard seeks to combine the best of both methodologies.

On the Advanced Computing side, we’ve developed a number of classifiers that automatically identify tweets that:

  • Contain informative content (in contrast to personal messages or information unhelpful for disaster response);
  • Are posted by eye-witnesses (as opposed to 2nd-hand reporting);
  • Include pictures, video footage, or mentions from TV/radio;
  • Report casualties and infrastructure damage;
  • Relate to people missing, seen and/or found;
  • Communicate caution and advice;
  • Call for help and important needs;
  • Offer help and support.

These classifiers are developed using state-of-the-art machine learning techniques. This simply means that we take a Twitter dataset of a disaster, say Hurricane Sandy, and develop clear definitions for “Informative Content,” “Eye-witness accounts,” etc. We use this classification system to tag a random sample of tweets from the dataset (usually 100+ tweets). We then “teach” algorithms to find these different topics in the rest of the dataset. We tweak said algorithms to make them as accurate as possible, much like teaching a dog new tricks like go-fetch (wink).
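For readers curious what training such a classifier looks like in practice, here is a minimal sketch using standard scikit-learn tools. It is not QCRI’s implementation; the file sandy_labelled.csv and its columns are hypothetical stand-ins for the hand-tagged sample described above.

```python
# A minimal supervised tweet classifier: bag-of-words features plus a linear
# model, a common baseline for "informative vs not informative" style tagging.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("sandy_labelled.csv")  # hypothetical columns: text, label
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                    LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))

# Apply the trained classifier to new, unlabelled tweets
print(clf.predict(["Power lines down on 5th Ave, trees blocking the road"]))
```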


We’ve found from this research that the classifiers are quite accurate but sensitive to the type of disaster being analyzed and also the country in which said disaster occurs. For example, a set of classifiers developed from tweets posted during Hurricane Sandy tends to be less accurate when applied to tweets posted during New Zealand’s earthquake. Each classifier is developed based on tweets posted during a specific disaster. In other words, while the classifiers can be highly accurate (tweets are correctly tagged as being damage-related, for example), they only tend to be accurate for the type of disaster they’ve been trained for, e.g., weather-related disasters (tornadoes), earth-related (earthquakes) and water-related (floods).

So we’ve been busy trying to collect as many Twitter datasets of different disasters as possible, which has been particularly challenging and seriously time-consuming given Twitter’s highly restrictive Terms of Service, which prevents the direct sharing of Twitter datasets—even for humanitarian purposes. This means we’ve had to spend a considerable amount of time re-creating Twitter datasets for past disasters; datasets that other research groups and academics have already crawled and collected. Thank you, Twitter. Clearly, we can’t collect every single tweet for every disaster that has occurred over the past five years or we’ll never get to actually developing the Dashboard.

That said, some of the most interesting Twitter disaster datasets are of recent (and indeed future) disasters. Truth be told, tweets were still largely US-centric before 2010. But the international coverage has since increased, along with the number of new Twitter users, which almost doubled in 2012 alone (more neat stats here). This in part explains why more and more Twitter users actively tweet during disasters. There is also a demonstration effect. That is, the international media coverage of social media use during Hurricane Sandy, for example, is likely to prompt citizens in other countries to replicate this kind of pro-active social media use when disaster knocks on their doors.

So where does this leave us vis-a-vis the Twitter Dashboard for Disaster Response? Simply that a hybrid approach is necessary (see TEDx talk above). That is, the Dashboard we’re developing will have a number of pre-developed classifiers based on as many datasets as we can get our hands on (categorized by disaster type). In addition to that, the dashboard will also allow users to create their own classifiers on the fly by leveraging human computation. They’ll also be able to microtask the creation of new classifiers.

In other words, what they’ll do is this:

  • Enter a search query on the dashboard, e.g., #Sandy.
  • Click on “Create Classifier” for #Sandy.
  • Create a label for the new classifier, e.g., “Animal Rescue”.
  • Tag 50+ #Sandy tweets that convey content about animal rescue.
  • Click “Run Animal Rescue Classifier” on new incoming tweets.

The new classifier will then automatically tag incoming tweets. Of course, the classifier won’t get it completely right. But the beauty here is that the user can “teach” the classifier not to make the same mistakes, which means the classifier continues to learn and improve over time. On the geo-location side of things, it is indeed true that only ~3% of all tweets are geotagged by users. But this figure can be boosted to 30% using full-text geo-coding (as was done in the TwitterBeat project). Some believe this figure can be more than doubled (towards 75%) by applying Google Translate to the full-text geo-coding. The remaining users can be queried via Twitter for their location and that of the events they are reporting.
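Here is a rough sketch of the create-and-correct loop described above, using scikit-learn’s incremental learners. This is an illustration rather than the Dashboard’s implementation, and the label names and example tweets are invented.

```python
# Incremental ("online") learning: seed a new classifier with user-tagged
# tweets, run it on incoming tweets, and feed corrections back in.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
clf = SGDClassifier()
classes = ["animal_rescue", "other"]

# Step 1: the user tags ~50 #Sandy tweets to seed the new "Animal Rescue"
# classifier; only two examples are shown here
seed_tweets = ["Dog stranded on a roof near the marina, needs rescue",
               "Gas stations are out of fuel across Brooklyn"]
seed_labels = ["animal_rescue", "other"]
clf.partial_fit(vectorizer.transform(seed_tweets), seed_labels, classes=classes)

# Step 2: run the new classifier on incoming tweets
incoming = ["Two cats trapped in a flooded basement on Staten Island"]
print(clf.predict(vectorizer.transform(incoming)))

# Step 3: when the user corrects a mistake, feed the correction back so the
# classifier does not repeat it
clf.partial_fit(vectorizer.transform(incoming), ["animal_rescue"])
```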

So that’s where we’re at with the project. Ultimately, we envision these classifiers to be like individual apps that can be used/created, dragged and dropped on an intuitive widget-like dashboard with various data visualization options. As noted in my previous post, everything we’re building will be freely accessible and open source. And of course we hope to include classifiers for other languages beyond English, such as Arabic, Spanish and French. Again, however, this is purely experimental research for the time being; we want to be crystal clear about this in order to manage expectations. There is still much work to be done.

In the meantime, please feel free to get in touch if you have disaster datasets you can contribute to these efforts (we promise not to tell Twitter). If you’ve developed classifiers that you think could be used for disaster response and you’re willing to share them, please also get in touch. If you’d like to join this project and have the required skill sets, then get in touch, we may be able to hire you! Finally, if you’re an interested end-user or want to share some thoughts and suggestions as we embark on this next phase of the project, please do also get in touch. Thank you!


Big Data for Development: From Information to Knowledge Societies?

Unlike analog information, “digital information inherently leaves a trace that can be analyzed (in real-time or later on).” But the “crux of the ‘Big Data’ paradigm is actually not the increasingly large amount of data itself, but its analysis for intelligent decision-making (in this sense, the term ‘Big Data Analysis’ would actually be more fitting than the term ‘Big Data’ by itself).” Martin Hilbert describes this as the “natural next step in the evolution from the ‘Information Age’ & ‘Information Societies’ to ‘Knowledge Societies’ […].”

Hilbert has just published this study on the prospects of Big Data for international development. “From a macro-perspective, it is expected that Big Data informed decision-making will have a similar positive effect on efficiency and productivity as ICT have had during the recent decade.” Hilbert references a 2011 study that concluded the following: “firms that adopted Big Data Analysis have output and productivity that is 5–6% higher than what would be expected given their other investments and information technology usage.” Can these efficiency gains be brought to the unruly world of international development?

To answer this question, Hilbert introduces a conceptual framework to “systematically review literature and empirical evidence related to the prerequisites, opportunities and threats of Big Data Analysis for international development.” Words, Locations, Nature and Behavior are types of data that are becoming increasingly available in large volumes.

“Analyzing comments, searches or online posts [i.e., Words] can produce nearly the same results for statistical inference as household surveys and polls.” For example, “the simple number of Google searches for the word ‘unemployment’ in the U.S. correlates very closely with actual unemployment data from the Bureau of Labor Statistics.” Hilbert argues that the tremendous volume of free textual data makes “the work and time-intensive need for statistical sampling seem almost obsolete.” But while the “large amount of data makes the sampling error irrelevant, this does not automatically make the sample representative.” 
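As a toy illustration of the kind of check Hilbert cites, one could line up a monthly series of “unemployment” search counts against the official unemployment rate and compute their correlation. The numbers below are made up.

```python
# Correlate a digital trace (search volume) with an official statistic.
# All values are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "month": pd.date_range("2011-01", periods=6, freq="MS"),
    "search_volume": [72, 75, 81, 90, 88, 85],            # hypothetical index
    "bls_unemployment": [9.1, 9.0, 9.2, 9.6, 9.5, 9.3],   # hypothetical %
})

# Pearson correlation between the two series
print(df["search_volume"].corr(df["bls_unemployment"]))
```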

The increasing availability of Location data (via GPS-enabled mobile phones or RFIDs) needs no further explanation. Nature refers to data on natural processes such as temperature and rainfall. Behavior denotes activities that can be captured through digital means, such as user-behavior in multiplayer online games or economic affairs, for example. But “studying digital traces might not automatically give us insights into offline dynamics. Besides these biases in the source, the data-cleaning process of unstructured Big Data frequently introduces additional subjectivity.”

The availability and analysis of Big Data is obviously limited in areas with scant access to tangible hardware infrastructure. This corresponds to the “Infrastructure” variable in Hilbert’s framework. “Generic Services” refers to the production, adoption and adaptation of software products, since these are a “key ingredient for a thriving Big Data environment.” In addition, the exploitation of Big Data also requires “data-savvy managers and analysts and deep analytical talent, as well as capabilities in machine learning and computer science.” This corresponds to “Capacities and Knowledge Skills” in the framework.

The third and final side of the framework represents the types of policies that are necessary to actualize the potential of Big Data for international development. These policies are divided into those that elicit Positive Feedback Loops, such as financial incentives, and those that create regulations, such as interoperability; that is, Negative Feedback Loops.

The added value of Big Data Analytics is also dependent on the availability of publicly accessible data, i.e., Open Data. Hilbert estimates that a quarter of US government data could be used for Big Data Analysis if it were made available to the public. There is a clear return on investment in opening up this data. On average, governments with “more than 500 publicly available databases on their open data online portals have 2.5 times the per capita income, and 1.5 times more perceived transparency than their counterparts with less than 500 public databases.” The direction of “causality” here is questionable, however.

Hilbert concludes with a warning. The Big Data paradigm “inevitably creates a new dimension of the digital divide: a divide in the capacity to place the analytic treatment of data at the forefront of informed decision-making. This divide does not only refer to the availability of information, but to intelligent decision-making and therefore to a divide in (data-based) knowledge.” While the advent of Big Data Analysis is certainly not a panacea, “in a world where we desperately need further insights into development dynamics, Big Data Analysis can be an important tool to contribute to our understanding of and improve our contributions to manifold development challenges.”

I am troubled by the study’s assumption that we live in a Newtonian world of decision-making in which for every action there is an automatic equal and opposite reaction. The fact of the matter is that the vast majority of development policies and decisions are not based on empirical evidence. Indeed, rigorous evidence-based policy-making and interventions are still very much the exception rather than the rule in international development. Why? “Accountability is often the unhappy byproduct rather than desirable outcome of innovative analytics. Greater accountability makes people nervous” (Harvard 2013). Moreover, response is always political. But Big Data Analysis runs the risk of depoliticizing a problem. As Alex de Waal noted over 15 years ago, “one universal tendency stands out: technical solutions are promoted at the expense of political ones.” I hinted at this concern when I first blogged about the UN Global Pulse back in 2009.

In sum, James Scott (one of my heroes) puts it best in his latest book:

“Applying scientific laws and quantitative measurement to most social problems would, modernists believed, eliminate the sterile debates once the ‘facts’ were known. […] There are, on this account, facts (usually numerical) that require no interpretation. Reliance on such facts should reduce the destructive play of narratives, sentiment, prejudices, habits, hyperbole and emotion generally in public life. […] Both the passions and the interests would be replaced by neutral, technical judgment. […] This aspiration was seen as a new ‘civilizing project.’ The reformist, cerebral Progressives in early twentieth-century American and, oddly enough, Lenin as well believed that objective scientific knowledge would allow the ‘administration of things’ to largely replace politics. Their gospel of efficiency, technical training and engineering solutions implied a world directed by a trained, rational, and professional managerial elite. […].”

“Beneath this appearance, of course, cost-benefit analysis is deeply political. Its politics are buried deep in the techniques […] how to measure it, in what scale to use, […] in how observations are translated into numerical values, and in how these numerical values are used in decision making. While fending off charges of bias or favoritism, such techniques […] succeed brilliantly in entrenching a political agenda at the level of procedures and conventions of calculation that is doubly opaque and inaccessible. […] Charged with bias, the official can claim, with some truth, that ‘I am just cranking the handle’ of a nonpolitical decision-making machine.”

See also:

  • Big Data for Development: Challenges and Opportunities [Link]
  • Beware the Big Errors of Big Data (by Nassim Taleb) [Link]
  • How to Build Resilience Through Big Data [Link]

Social Media: Pulse of the Planet?

In 2010, Hillary Clinton described social media as a new nervous system for our planet (1). So can the pulse of the planet be captured with social media? There are many who are skeptical, not least because of the digital divide. “You mean the pulse of the Data Haves? The pulse of the affluent?” These rhetorical questions are perfectly justified, which is why social media alone should not be the sole source of information that feeds into decision-making for policy purposes. But millions are joining the social media ecosystem every day. So the selection bias is not increasing but decreasing. We may not be able to capture the pulse of the planet comprehensively and at a very high resolution yet, but the pulse of the majority world is certainly growing louder by the day.

[Image: map of the world at night]

This map of the world at night (based on 2011 data) reveals areas powered by electricity. Yes, Africa has far less electricity consumption. This is not misleading; it is an accurate proxy for industrial development (amongst other indexes). Does this data suffer from selection bias? Yes, the data is biased towards larger cities rather than the long tail. Does this render the data and map useless? Hardly. It all depends on what the question is.


What if our world was lit up by information instead of lightbulbs? The map above from TweetPing does just that. The website displays tweets in real-time as they’re posted across the world. Strictly speaking, the platform displays 10% of the ~340 million tweets posted each day (i.e., the “Decahose” rather than the “Firehose”). But the volume and velocity of the pulsing ten percent is already breathtaking.


One may think this picture depicts electricity use in Europe. Instead, this is a map of geo-located tweets (blue dots) and Flickr pictures (red dots). “White dots are locations that have been posted to both” (2). The number of active Twitter users grew an astounding 40% in 2012, making Twitter the fastest growing social network on the planet. Over 20% of the world’s internet population is now on Twitter (3). The Sightsmap below is a heat map based on the number of photographs submitted to Panoramio at different locations.


The map below depicts friendship ties on Facebook. This was generated using data when there were “only” 500 million users compared to today’s 1 billion+.

[Image: Facebook friendship ties map]

The following map does not depict electricity use in the US or the distribution of the population based on the most recent census data. Instead, this is a map of check-ins on Foursquare. What makes this map so powerful is not only that it was generated using 500 million check-ins but that “all those check-ins you see aren’t just single points—they’re links between all the other places people have been.”

[Image: Foursquare check-ins map]

TwitterBeat takes the (emotional) pulse of the planet by visualizing the Twitter Decahose in real-time using sentiment analysis. The crisis map in the YouTube video below comprises all tweets about Hurricane Sandy over time. “[Y]ou can see how the whole country lights up and how tweets don’t just move linearly up the coast as the storm progresses, capturing the advance impact of such a large storm and its peripheral effects across the country” (4).


These social media maps don’t only “work” at the country level or for Western industrialized states. Take the following map of Jakarta made almost exclusively from geo-tagged tweets. You can see the individual roads and arteries (nervous system). Granted, this map works so well because of the horrendous traffic, but nevertheless a pattern emerges, one that is strongly correlated with Jakarta’s road network. And unlike the map of the world at night, we can capture this pulse in real time and at a fraction of the cost.

[Image: map of Jakarta from geo-tagged tweets]

Like any young nervous system, our social media system is still growing and evolving. But it is already adding value. The analysis of tweets predicts the flu better than the crunching of traditional data used by public health institutions, for example. And the analysis of tweets from Indonesia also revealed that Twitter data can be used to monitor food security in real-time.

The main problem I see with all this has much less to do with issues of selection bias and unrepresentative samples, etc. Far more problematic is the centralization of this data and the fact that it is closed data. Yes, the above maps are public, but don’t be fooled, the underlying data is not. In their new study, “The Politics of Twitter Data,” Cornelius Puschmann and Jean Burgess argue that the “owners” of social media data are the platform providers, not the end users. Yes, access to Twitter.com and Twitter’s API is free, but end users are limited to downloading just a few thousand tweets per day. (For comparative purposes, more than 20 million tweets were posted during Hurricane Sandy). Getting access to more data can cost hundreds of thousands of dollars. In other words, as Puschmann and Burgess note, “only corporate actors and regulators—who possess both the intellectual and financial resources to succeed in this race—can afford to participate,” which means “that the emerging data market will be shaped according to their interests.”

“Social Media: Pulse of the Planet?” Getting there, but only a few elite Doctors can take the full pulse in real-time.

Social Network Analysis for Digital Humanitarian Response

Monitoring social media for digital humanitarian response can be a massive undertaking. The sheer volume and velocity of tweets generated during a disaster makes real-time social media monitoring particularly challenging if not near impossible. However, two new studies argue that there is “a better way to track the spread of information on Twitter that is much more powerful.”


Manuel Garcia-Herranz and his team at the Autonomous University of Madrid in Spain use small groups of “highly connected Twitter users as ‘sensors’ to detect the emergence of new ideas. They point out that this works because highly connected individuals are more likely to receive new ideas before ordinary users.” To test their hypothesis, the team studied 40 million Twitter users who “together totted up 1.5 billion ‘follows’ and sent nearly half a billion tweets, including 67 million containing hashtags.”

They found that small groups of highly connected Twitter users detect “new hashtags about seven days earlier than the control group.  In fact, the lead time varied between nothing at all and as much as 20 days.” Manuel and his team thus argue that “there’s no point in crunching these huge data sets. You’re far better off picking a decent sensor group and watching them instead.” In other words, “your friends could act as an early warning system, not just for gossip, but for civil unrest and even outbreaks of disease.”
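A minimal sketch of the sensor-group idea (not the authors’ code): rank accounts by how many followers they have and monitor only the top handful. The follower edges below are invented.

```python
# Pick the most-followed accounts in a follower graph as a "sensor group".
import networkx as nx

G = nx.DiGraph()  # edge (a, b) means "a follows b"
G.add_edges_from([("u1", "news_hub"), ("u2", "news_hub"), ("u3", "news_hub"),
                  ("u1", "u2"), ("u4", "local_ngo"), ("u5", "local_ngo")])

# Rank accounts by in-degree (number of followers) and keep the top few as the
# sensor group whose tweets get monitored in real time.
sensor_group = sorted(G.nodes, key=G.in_degree, reverse=True)[:2]
print(sensor_group)  # ['news_hub', 'local_ngo']
```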

The second study, “Identifying and Characterizing User Communities on Twitter during Crisis Events” (PDF), is authored by Aditi Gupta et al. Aditi and her colleagues analyzed three major crisis events (Hurricane Irene, Riots in England and Earthquake in Virginia) “to identify the different user communities, and characterize them by the top central users.” Their findings are in line with those shared by the team in Madrid. “[T]he top users represent the topics and opinions of all the users in the community with 81% accuracy on an average.” In sum, “to understand a community, we need to monitor and analyze only these top users rather than all the users in a community.”

How could these findings be used to prioritize the monitoring of social media during disasters? See this blog post for more on the use of social network analysis (SNA) for humanitarian response.

How to Create Resilience Through Big Data

Revised! I have edited this article several dozen times since posting the initial draft. I have also made a number of substantial changes to the flow of the article after discovering new connections, synergies and insights. In addition, I have greatly benefited from reader feedback as well as the very rich conversations that took place during the PopTech & Rockefeller workshop—a warm thank you to all participants for their important questions and feedback!

Introduction

I’ve been invited by PopTech and the Rockefeller Foundation to give the opening remarks at an upcoming event on interdisciplinary dimensions of resilience, which is being hosted at Georgetown University. This event is connected to their new program focus on “Creating Resilience Through Big Data.” I’m absolutely delighted to be involved and am very much looking forward to the conversations. The purpose of this blog post is to summarize the presentation I intend to give and to solicit feedback from readers. So please feel free to use the comments section below to share your thoughts. My focus is primarily on disaster resilience. Why? Because understanding how to bolster resilience to extreme events will provide insights on how to also manage less extreme events, while the converse may not be true.

Big Data Resilience

Terminology

One of the guiding questions for the meeting is this: “How do you understand resilience conceptually at present?” First, discourse matters.  The term resilience is important because it focuses not on us, the development and disaster response community, but rather on local at-risk communities. While “vulnerability” and “fragility” were used in past discourse, these terms focus on the negative and seem to invoke the need for external protection, overlooking the fact that many local coping mechanisms do exist. From the perspective of this top-down approach, international organizations are the rescuers and aid does not arrive until these institutions mobilize.

In contrast, the term resilience suggests radical self-sufficiency, and self-sufficiency implies a degree of autonomy; self-dependence rather than dependence on an external entity that may or may not arrive, that may or may not be effective, and that may or may not stay the course. The term “antifragile,” just recently introduced by Nassim Taleb, also appeals to me. Antifragile systems thrive on disruption. But let’s stick with the term resilience, as antifragility will be the subject of a future blog post, i.e., I first need to finish reading Nassim’s book! I personally subscribe to the following definition of resilience: the capacity for self-organization; and shall expand on this shortly.

(See the Epilogue at the end of this blog post on political versus technical definitions of resilience and the role of the so-called “expert”. And keep in mind that poverty, cancer, terrorism etc., are also resilient systems. Hint: we have much to learn from pernicious resilience and the organizational & collective action models that render those systems so resilient. In their book on resilience, Andrew Zolli and Ann Marie Healy note the strong similarities between Al-Qaeda & tuberculosis, one of which is the two systems’ ability to regulate their metabolism).

Hazards vs Disasters

In the meantime, I first began to study the notion of resilience from the context of complex systems and in particular the field of ecology, which defines resilience as “the capacity of an ecosystem to respond to a perturbation or disturbance by resisting damage and recovering quickly.” Now let’s unpack this notion of perturbation. There is a subtle but fundamental difference between disasters (processes) and hazards (events); a distinction that Jean-Jacques Rousseau first articulated in 1755 when Portugal was shaken by an earthquake. In a letter to Voltaire one year later, Rousseau notes that “nature had not built [process] the houses which collapsed and suggested that Lisbon’s high population density [process] contributed to the toll” (1). In other words, natural events are hazards and exogenous, while disasters are the result of endogenous social processes. As Rousseau added in his note to Voltaire, “an earthquake occurring in wilderness would not be important to society” (2). That is, a hazard need not turn to disaster since the latter is strictly a product or calculus of social processes (structural violence).

And so, while disasters were traditionally perceived as “sudden and short lived events, there is now a tendency to look upon disasters in African countries in particular, as continuous processes of gradual deterioration and growing vulnerability,” which has important “implications on the way the response to disasters ought to be made” (3). (Strictly speaking, the technical difference between events and processes is one of scale, both temporal and spatial, but that need not distract us here). This shift towards disasters as processes is particularly profound for the creation of resilience, not least through Big Data. To understand why requires a basic introduction to complex systems.

Complex Systems

All complex systems tend to veer towards critical change. This is explained by the process of Self-Organized Criticality (SOC). Over time, non-equilibrium systems with extended degrees of freedom and a high level of nonlinearity become increasingly vulnerable to collapse. Social, economic and political systems certainly qualify as complex systems. As my “alma mater” the Santa Fe Institute (SFI) notes, “The archetype of a self-organized critical system is a sand pile. Sand is slowly dropped onto a surface, forming a pile. As the pile grows, avalanches occur which carry sand from the top to the bottom of the pile” (4). That is, the sand pile becomes increasingly unstable over time.

Consider an hourglass or sand clock as an illustration of self-organized criticality. Grains of sand sifting through the narrowest point of the hourglass represent individual events or natural hazards. Over time a sand pile starts to form. How this process unfolds depends on how society chooses to manage risk. A laissez-faire attitude will result in a steeper pile. And a grain of sand falling on an increasingly steep pile will eventually trigger an avalanche. Disaster ensues.

Why does the avalanche occur? One might ascribe the cause of the avalanche to that one grain of sand, i.e., a single event. On the other hand, a complex systems approach to resilience would associate the avalanche with the pile’s increasing slope, a historical process which renders the structure increasingly vulnerable to falling grains. From this perspective, “all disasters are slow onset when realistically and locally related to conditions of susceptibility”. A hazard event might be rapid-onset, but the disaster, requiring much more than a hazard, is a long-term process, not a one-off event. The resilience of a given system is therefore not simply dependent on the outcome of future events. Resilience is the complex product of past social, political, economic and even cultural processes.
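For readers who want to see the sand pile metaphor in action, here is a compact simulation of the classic Bak-Tang-Wiesenfeld model behind self-organized criticality: grains are dropped one at a time, any site holding too many grains topples onto its neighbors, and avalanche sizes are recorded. The grid size, threshold and grain count are arbitrary choices for illustration.

```python
# Bak-Tang-Wiesenfeld sandpile: single grains (hazards) occasionally trigger
# system-wide cascades (avalanches), the signature of self-organized criticality.
import numpy as np

rng = np.random.default_rng(0)
N, THRESHOLD, GRAINS = 20, 4, 5000
grid = np.zeros((N, N), dtype=int)
avalanche_sizes = []

for _ in range(GRAINS):
    i, j = rng.integers(0, N, size=2)
    grid[i, j] += 1                          # a single falling grain (the hazard)
    size = 0
    while (grid >= THRESHOLD).any():         # cascading topples (the avalanche)
        for x, y in zip(*np.where(grid >= THRESHOLD)):
            grid[x, y] -= THRESHOLD
            size += 1
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= x + dx < N and 0 <= y + dy < N:  # grains toppled off the edge are lost
                    grid[x + dx, y + dy] += 1
    avalanche_sizes.append(size)

print("largest avalanche:", max(avalanche_sizes))
print("average avalanche:", np.mean(avalanche_sizes))
```

Most drops cause no avalanche at all, while a rare few trigger cascades that sweep across much of the pile; that heavy-tailed pattern is exactly the behavior the sand pile metaphor points to.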

Dealing with Avalanches

Scholars like Thomas Homer-Dixon argue that we are becoming increasingly prone to domino effects or cascading changes across systems, thus increasing the likelihood of total synchronous failure. “A long view of human history reveals not regular change but spasmodic, catastrophic disruptions followed by long periods of reinvention and development.” We must therefore “reduce as much as we can the force of the underlying tectonic stresses in order to lower the risk of synchronous failure—that is, of catastrophic collapse that cascades across boundaries between technological, social and ecological systems” (5).

Unlike the clock’s lifeless grains of sand, human beings can adapt and maximize their resilience to exogenous shocks through disaster preparedness, mitigation and adaptation—which all require political will. As a colleague of mine recently noted, “I wish it were widely spread amongst society how important being a grain of sand can be.” Individuals can “flatten” the structure of the sand pile into a less hierarchical but more resilient system, thereby distributing and diffusing the risk and size of an avalanche. Call it distributed adaptation.

Operationalizing Resilience

As already noted, the field of ecology defines resilience as “the capacity of an ecosystem to respond to a perturbation or disturbance by resisting damage and recovering quickly.” Using this understanding of resilience, there are at least two ways to create more resilient “social ecosystems”:

  1. Resist damage by absorbing and dampening the perturbation.
  2. Recover quickly by bouncing back or rather forward.

Resisting Damage

So how does a society resist damage from a disaster? As hinted earlier, there is no such thing as a “natural” disaster. There are natural hazards and there are social systems. If social systems are not sufficiently resilient to absorb the impact of a natural hazard such as an earthquake, then disaster unfolds. In other words, hazards are exogenous while disasters are the result of endogenous political, economic, social and cultural processes. Indeed, “it is generally accepted among environmental geographers that there is no such thing as a natural disaster. In every phase and aspect of a disaster—causes, vulnerability, preparedness, results and response, and reconstruction—the contours of disaster and the difference between who lives and dies is to a greater or lesser extent a social calculus” (6).

So how do we apply this understanding of disasters and build more resilient communities? Focusing on people-centered early warning systems is one way to do this. In 2006, the UN’s International Strategy for Disaster Reduction (ISDR) recognized that top-down early warning systems for disaster response were increasingly ineffective. They thus called for a more bottom-up approach in the form of people-centered early warning systems. The UN ISDR’s Global Survey of Early Warning Systems (PDF), defines the purpose of people-centered early warning systems as follows:

“… to empower individuals and communities threatened by hazards to act in sufficient time and in an appropriate manner so as to reduce the possibility of personal injury, loss of life, damage to property and the environment, and loss of livelihoods.”

Information plays a central role here. Acting in sufficient time requires having timely information about (1) the hazard/s, (2) our resilience and (3) how to respond. This is where information and communication technologies (ICTs), social media and Big Data play an important role. Take the latter, for example. One reason for the considerable interest in Big Data is prediction and anomaly detection. Weather and climatic sensors provide meteorologists with the copious amounts of data necessary for the timely prediction of weather patterns and  early detection of atmospheric hazards. In other words, Big Data Analytics can be used to anticipate the falling grains of sand.
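As a toy example of the anomaly-detection side of this, one could flag sensor readings that jump well above their recent rolling average; the series below is invented.

```python
# Flag readings that deviate sharply from the recent rolling baseline.
import pandas as pd

readings = pd.Series([21, 22, 21, 23, 22, 21, 35, 22, 21],  # e.g. river level
                     name="sensor_reading")

# Rolling statistics over the previous readings only (shifted by one step)
rolling_mean = readings.rolling(window=5, min_periods=3).mean().shift(1)
rolling_std = readings.rolling(window=5, min_periods=3).std().shift(1)

# Anything more than 3 standard deviations above the recent mean gets flagged
anomalies = readings[readings > rolling_mean + 3 * rolling_std]
print(anomalies)
```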

Now, predictions are often not correct. But the analysis of Big Data can also help us characterize the sand pile itself, i.e., our resilience, along with the associated trends towards self-organized criticality. Recall that complex systems tend towards instability over time (think of the hourglass above). Thanks to ICTs, social media and Big Data, we now have the opportunity to better characterize in real-time the social, economic and political processes driving our sand pile. Now, this doesn’t mean that we have a perfect picture of the road to collapse; simply that our picture is clearer than ever before in human history. In other words, we can better measure our own resilience. Think of it as the Quantified Self movement applied to an entirely different scale, that of societies and cities. The point is that Big Data can provide us with more real-time feedback loops than ever before. And as scholars of complex systems know, feedback loops are critical for adaptation and change. Thanks to social media, these loops also include peer-to-peer feedback loops.

An example of monitoring resilience in real-time (and potentially anticipating future changes in resilience) is the UN Global Pulse’s project on food security in Indonesia. They partnered with Crimson Hexagon to forecast food prices in Indonesia by analyzing tweets referring to the price of rice. They found an interesting relationship between said tweets and government statistics on food price inflation. Some have described the rise of social media as a new nervous system for the planet, capturing the pulse of our social systems. My colleagues and I at QCRI are therefore in the process of applying this approach to the study of the Arabic Twittersphere. Incidentally, this is yet another critical reason why Open Data is so important (check out the work of OpenDRI, the Open Data for Resilience Initiative; see also this post on Democratizing ICT for Development with DIY Innovation and Open Data). More on open data and data philanthropy in the conclusion.

Finally, new technologies can also provide guidance on how to respond. Think of Foursquare but applied to disaster response. Instead of “Break Glass in Case of Emergency,” how about “Check-In in Case of Emergency”? Numerous smartphone apps such as Waze already provide this kind of at-a-glance, real-time situational awareness. It is only a matter of time until humanitarian organizations develop disaster response apps that enable disaster-affected communities to check in for real-time guidance on what to do given their current location and level of resilience. Several disaster preparedness apps already exist. Social computing and Big Data Analytics can power these apps in real-time.
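As a purely hypothetical illustration of what such a check-in lookup might do, the sketch below returns the nearest open shelter for a reported location. The shelter coordinates and guidance text are invented; a real app would draw on official preparedness data and the user’s assessed level of resilience.

```python
# Hypothetical sketch of a "check-in in case of emergency" lookup: given a
# reported location, return the nearest open shelter. Shelter data invented.
from math import radians, sin, cos, asin, sqrt

SHELTERS = [
    {"name": "Community Center A", "lat": 18.547, "lon": -72.339, "open": True},
    {"name": "School B",           "lat": 18.533, "lon": -72.312, "open": False},
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def check_in(lat, lon):
    """Return guidance for the nearest open shelter to the reported position."""
    open_shelters = [s for s in SHELTERS if s["open"]]
    nearest = min(open_shelters, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))
    distance = haversine_km(lat, lon, nearest["lat"], nearest["lon"])
    return f"Nearest open shelter: {nearest['name']} ({distance:.1f} km away)."

print(check_in(18.54, -72.33))
```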

Quick Recovery

As already noted, there are at least two ways to create more resilient “social ecosystems”. We just discussed the first: resisting damage by absorbing and dampening the perturbation. The second way to grow more resilient societies is by enabling them to rapidly recover following a disaster.

As Manyena writes, “increasing attention is now paid to the capacity of disaster-affected communities to ‘bounce back’ or to recover with little or no external assistance following a disaster.” So what factors accelerate recovery in ecosystems in general? In ecological terms, how quickly the damaged part of an ecosystem can repair itself depends on how many feedback loops it has to the non- (or less-) damaged parts of the ecosystem(s). These feedback loops are what enable adaptation and recovery. In social ecosystems, these feedback loops can consist of information in addition to the transfer of tangible resources. As some scholars have argued, a disaster is first of all “a crisis in communicating within a community—that is, a difficulty for someone to get informed and to inform other people” (7).

Improving ways for local communities to communicate internally and externally is thus an important part of building more resilient societies. Indeed, as Homer-Dixon notes, “the part of the system that has been damaged recovers by drawing resources and information from undamaged parts.” Identifying needs following a disaster and matching them to available resources is an important part of the process. Accelerating the rate of (1) identification, (2) matching and (3) allocation is therefore an important way to speed up overall recovery.

This explains why ICTs, social media and Big Data are central to growing more resilient societies. They can accelerate impact evaluations and needs assessments at the local level. Population displacement following disasters poses a serious public health risk. So rapidly identifying these risks can help affected populations recover more quickly. Take the work carried out by my colleagues at Flowminder, for example. They empirically demonstrated that mobile phone data (Big Data!) can be used to predict population displacement after major disasters. Take also this study which analyzed call dynamics to demonstrate that telecommunications data could be used to rapidly assess the impact of earthquakes. A related study showed similar results when analyzing SMS’s and building damage in Haiti after the 2010 earthquake.
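Flowminder’s actual methodology is far more sophisticated, but the basic intuition can be sketched as follows: infer each SIM card’s “home” area from its most frequent cell tower before and after the disaster, and count the SIMs whose home area changed. The call records below are invented.

```python
# Highly simplified sketch of the logic behind mobile-phone-based displacement
# estimates (not Flowminder's actual methodology). Call records are invented.
from collections import Counter

def home_area(records):
    """Most frequent tower area in a list of (timestamp, tower_area) records."""
    return Counter(area for _, area in records).most_common(1)[0][0]

def displaced_fraction(cdr_before, cdr_after):
    """Share of SIMs whose inferred home area differs before vs. after."""
    common = set(cdr_before) & set(cdr_after)
    moved = sum(1 for sim in common
                if home_area(cdr_before[sim]) != home_area(cdr_after[sim]))
    return moved / len(common) if common else 0.0

cdr_before = {"sim1": [(1, "Port-au-Prince")] * 20,
              "sim2": [(1, "Port-au-Prince")] * 15}
cdr_after  = {"sim1": [(2, "Port-au-Prince")] * 5,
              "sim2": [(2, "Gonaives")] * 12}
print(displaced_fraction(cdr_before, cdr_after))  # 0.5 in this toy example
```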


Resilience as Self-Organization and Emergence

Connection technologies such as mobile phones allow individual “grains of sand” in our societal “sand pile” to make the connections and decisions necessary to self-organize and rapidly recover from disasters. With appropriate incentives, preparedness measures and policies, these local decisions can render a complex system more resilient. At the core here is behavior change and thus the importance of understanding behavior change models. Recall also Thomas Schelling’s observation that micro-motives can lead to macro-behavior. To be sure, as Thomas Homer-Dixon rightly notes, “Resilience is an emergent property of a system—it’s not a result of any one of the system’s parts but of the synergy between all of its parts. So as a rough and ready rule, boosting the ability of each part to take care of itself in a crisis boosts overall resilience.” (For complexity science readers, the notion of transformation through phase transitions is relevant to this discussion.)

In other words, “Resilience is the capacity of the affected community to self-organize, learn from and vigorously recover from adverse situations stronger than it was before” (8). This link between resilience and capacity for self-organization is very important, which explains why a recent and major evaluation of the 2010 Haiti Earthquake disaster response promotes the “attainment of self-sufficiency, rather than the ongoing dependency on standard humanitarian assistance.” Indeed, “focus groups indicated that solutions to help people help themselves were desired.”

The fact of the matter is that we are not all affected in the same way during a disaster. (Recall the distinction between hazards and disasters discussed earlier.) Those of us who are less affected almost always want to help those in need. Herein lies the critical role of peer-to-peer feedback loops. As noted above, the speed at which the damaged part of an ecosystem can repair itself depends on how many feedback loops it has to the non- (or less-) damaged parts of the ecosystem(s). These feedback loops are what enable adaptation and recovery.

Lastly, disaster response professionals cannot be everywhere at the same time. But the crowd is always there. Moreover, the vast majority of lives saved following major disasters cannot be attributed to external aid. One study estimates that external aid accounts for at most 10% of lives saved. Why? Because the real first responders are the disaster-affected communities themselves, the local population. That is, the real first feedback loops are always local. This dynamic of mutual aid facilitated by social media is certainly not new, however. My colleagues in Russia did this back in 2010 during the major forest fires that ravaged their country.

While I do have a bias towards people-centered interventions, this does not mean that I discount the importance of feedback loops to external actors such as traditional institutions and humanitarian organizations. Nor do I mean to romanticize the notion of “indigenous technical knowledge” or local coping mechanisms; some violate my own definition of human rights, for example. Rather, my bias stems from the fact that I am particularly interested in disaster resilience within the context of areas of limited statehood, where said institutions and organizations are either absent or ineffective. But I certainly recognize the importance of scale jumping, particularly within the context of social capital and social media.

Resilience Through Social Capital

Information-based feedback loops generate social capital, and the latter has been shown to improve disaster resilience and recovery. In his recent book entitled “Building Resilience: Social Capital in Post-Disaster Recovery,” Daniel Aldrich draws on both qualitative and quantitative evidence to demonstrate that “social resources, at least as much as material ones, prove to be the foundation for resilience and recovery.” His case studies suggest that social capital matters more for disaster resilience than physical and financial capital, and more than the factors emphasized by conventional explanations. So the question that naturally follows, given our interest in resilience and technology, is this: can social media (which is not restricted by geography) influence social capital?

Social Capital

Building on Daniel’s research and my own direct experience in digital humanitarian response, I argue that social media does indeed nurture social capital during disasters. “By providing norms, information, and trust, denser social networks can implement a faster recovery.” Such norms also evolve on Twitter, as does information sharing and trust building. Indeed, “social ties can serve as informal insurance, providing victims with information, financial help and physical assistance.” This informal insurance, “or mutual assistance involves friends and neighbors providing each other with information, tools, living space, and other help.” Again, this bonding is not limited to offline dynamics but also occurs within and across online social networks. Recall the sand pile analogy: social capital helps move the sand pile away (temporarily) from self-organized criticality. On a related note vis-a-vis open source software, “the least important part of open source software is the code.” Indeed, more important than the code is the fact that open source fosters social ties, networks, communities and thus social capital.

(Incidentally, social capital generated during disasters is social capital that can subsequently be used to facilitate self-organization for non-violent civil resistance and vice versa).

Resilience Through Big Data

My empirical research on tweets posted during disasters clearly shows that while many use Twitter (and social media more generally) to post needs during a crisis, those who are less affected in the social ecosystem will often post offers to help. So where does Big Data fit into this particular equation? When disaster strikes, access to information is as important as access to food and water. This link between information, disaster response and aid was officially recognized by the Secretary General of the International Federation of Red Cross and Red Crescent Societies in the World Disasters Report published in 2005. Since then, disaster-affected populations have become increasingly digital thanks to the very rapid and widespread adoption of mobile technologies. Indeed, as a result of these mobile technologies, affected populations are increasingly able to source, share and generate vast amounts of information, which is completely transforming disaster response.

In other words, disaster-affected communities are increasingly becoming the source of Big (Crisis) Data during and following major disasters. There were over 20 million tweets posted during Hurricane Sandy. And when the major earthquake and tsunami hit Japan in early 2011, over 5,000 tweets were being posted every second. That is 1.5 million tweets every 5 minutes. So how can Big Data Analytics create more resilience in this respect? More specifically, how can Big Data Analytics accelerate disaster recovery? Manually monitoring millions of tweets per minute is hardly feasible. This explains why I often “joke” that we need a local Match.com for rapid disaster recovery. Thanks to social computing, artificial intelligence, machine learning and Big Data Analytics, we can absolutely develop a “Match.com” for rapid recovery. In fact, I’m working on just such a project with my colleagues at QCRI. We are also developing algorithms to automatically identify informative and actionable information shared on Twitter, for example. (Incidentally, a by-product of developing a robust Match.com for disaster response could very well be an increase in social capital.)
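The sketch below is only a toy version of this matching idea, not the QCRI system itself: it tags tweets as needs or offers with simple keyword rules and pairs them by resource type. A production system would rely on trained classifiers, geolocation and human verification; the keyword lists and example tweets are invented.

```python
# Toy sketch of the "Match.com for disaster recovery" idea: tag tweets as
# needs or offers with keyword rules, then pair them by resource type.
NEED_WORDS = {"need", "require", "looking for", "urgently"}
OFFER_WORDS = {"offer", "can provide", "available", "donating"}
RESOURCES = {"water", "food", "shelter", "blood", "medicine"}

def tag(tweet):
    """Return (kind, resource) where kind is 'need', 'offer' or None."""
    text = tweet.lower()
    kind = ("need" if any(w in text for w in NEED_WORDS)
            else "offer" if any(w in text for w in OFFER_WORDS) else None)
    resource = next((r for r in RESOURCES if r in text), None)
    return kind, resource

def match(tweets):
    """Pair each need with an offer for the same resource, if one exists."""
    needs, offers = {}, {}
    for t in tweets:
        kind, resource = tag(t)
        if kind and resource:
            (needs if kind == "need" else offers).setdefault(resource, []).append(t)
    return [(n, offers[r][0]) for r, ns in needs.items() if r in offers for n in ns]

tweets = ["We urgently need clean water in Tacloban",
          "I can provide 200 liters of water, DM me",
          "Offering shelter for two families"]
print(match(tweets))  # pairs the water request with the water offer
```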

There are several other ways that advanced computing can create disaster resilience using Big Data. One major challenge in digital humanitarian response is the verification of crowdsourced, user-generated content. Indeed, misinformation and rumors can be highly damaging. If access to information is as vital as access to food, as noted by the Red Cross, then misinformation is like poisoned food. But Big Data Analytics has already shed some light on potential solutions. As it turns out, non-credible disaster information shared on Twitter propagates differently than credible information, which means that the credibility of tweets could be predicted automatically.
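In the spirit of that research, though with an invented feature set and invented training examples rather than the features used in the published studies, a supervised credibility classifier might look something like the following sketch (scikit-learn is assumed to be available).

```python
# Sketch of supervised credibility prediction. Features, labels and examples
# below are invented placeholders; published models use richer propagation,
# user and content features.
import math
from sklearn.linear_model import LogisticRegression

def features(tweet):
    """Simple per-tweet features: URL present, log retweet count,
    log author follower count, and presence of ?/! punctuation."""
    return [
        1.0 if "http" in tweet["text"] else 0.0,
        math.log1p(tweet["retweets"]),
        math.log1p(tweet["author_followers"]),
        1.0 if ("?" in tweet["text"] or "!" in tweet["text"]) else 0.0,
    ]

# Hypothetical labeled examples: 1 = credible, 0 = not credible.
train = [
    ({"text": "Official update: bridge closed http://gov.example", "retweets": 120, "author_followers": 50000}, 1),
    ({"text": "OMG heard a 4th bomb went off!!!", "retweets": 8, "author_followers": 90}, 0),
    ({"text": "Red Cross confirms shelters open http://rc.example", "retweets": 300, "author_followers": 80000}, 1),
    ({"text": "Is it true hospitals are out of blood??", "retweets": 15, "author_followers": 120}, 0),
]
X = [features(t) for t, _ in train]
y = [label for _, label in train]

model = LogisticRegression().fit(X, y)
new_tweet = {"text": "Rumor: another blast near the station!!", "retweets": 5, "author_followers": 60}
print(model.predict_proba([features(new_tweet)])[0][1])  # estimated probability of being credible
```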

Conclusion

In sum, “resilience is the critical link between disaster and development; monitoring it [in real-time] will ensure that relief efforts are supporting, and not eroding […] community capabilities” (9). While the focus of this blog post has been on disaster resilience, I believe the insights provided are equally informative for less extreme events. So I’d like to end on two major points. The first has to do with data philanthropy while the second emphasizes the critical importance of failing gracefully.

Big Data is Closed and Centralized

A considerable amount of “Big Data” is Big Closed and Centralized Data. Flowminder’s study mentioned above draws on highly proprietary telecommunications data. Facebook data, which has immense potential for humanitarian response, is also closed. The same is true of Twitter data, unless you have millions of dollars to pay for access to the full Firehose, or even the Decahose. While access to the Twitter API is free, the number of tweets that can be downloaded and analyzed is limited to several thousand a day. Contrast this with the 5,000 tweets per second posted after the earthquake and tsunami in Japan. We therefore need some serious political will from the corporate sector to engage in “data philanthropy”. Data philanthropy involves companies sharing proprietary datasets for social good. Call it Corporate Social Responsibility (CSR) for digital humanitarian response. More here on how this would work.

Failing Gracefully

Lastly, on failure. As noted, complex systems tend towards instability, i.e., self-organized criticality, which is why Homer-Dixon introduces the notion of failing gracefully. “Somehow we have to find the middle ground between dangerous rigidity and catastrophic collapse.” He adds that:

“In our organizations, social and political systems, and individual lives, we need to create the possibility for what computer programmers and disaster planners call ‘graceful’ failure. When a system fails gracefully, damage is limited, and options for recovery are preserved. Also, the part of the system that has been damaged recovers by drawing resources and information from undamaged parts.” Homer-Dixon explains that “breakdown is something that human social systems must go through to adapt successfully to changing conditions over the long term. But if we want to have any control over our direction in breakdown’s aftermath, we must keep breakdown constrained. Reducing as much as we can the force of underlying tectonic stresses helps, as does making our societies more resilient. We have to do other things too, and advance planning for breakdown is undoubtedly the most important.”

As Louis Pasteur famously noted, “Chance favors the prepared mind.” Preparing for breakdown is not defeatist or passive. Quite the contrary: it is wise and proactive. Our hubris—including our current infatuation with Big Data—all too often clouds our better judgment. Like Macbeth, rarely do we seriously ask ourselves what we would do “if we should fail.” The answer “then we fail” is an option. But are we truly prepared to live with the devastating consequences of total synchronous failure?

In closing, some lingering (less rhetorical) questions:

  • How can resilience be measured? Is there a lowest common denominator? What is the “atom” of resilience?
  • What are the triggers of resilience, creative capacity, local improvisation, regenerative capacity? Can these be monitored?
  • Where do the concepts of “lived reality” and “positive deviance” enter the conversation on resilience?
  • Is resiliency a right? Do we bear a responsibility to render systems more resilient? If so, recalling that resilience is the capacity to self-organize, do local communities have the right to self-organize? And how does this differ from democratic ideals and freedoms?
  • Recent research in social psychology has demonstrated that mindfulness is an amplifier of resilience for individuals. How can this be scaled up? Do cultures and religions play a role here?
  • Collective memory influences resilience. How can this be leveraged to catalyze more regenerative social systems?


Epilogue: Some colleagues have rightfully pointed out that resilience is ultimately political. I certainly share that view, which is why this point came up in recent conversations with my PopTech colleagues Andrew Zolli and Leetha Filderman. Readers of my post will also have noted my emphasis on distinguishing between hazards and disasters; the latter are the product of social, economic and political processes. As noted in my blog post, there are no natural disasters. To this end, some academics rightly warn that “Resilience is a very technical, neutral, apolitical term. It was initially designed to characterize systems, and it doesn’t address power, equity or agency… Also, strengthening resilience is not free—you can have some winners and some losers.”

As it turns out, I have a lot to say about the political versus technical argument. First of all, this is hardly a new or original argument, but it is nevertheless an important one. Amartya Sen discussed this issue within the context of famines decades ago, noting that famines do not take place in democracies. In 1997, Alex de Waal published his seminal book, “Famine Crimes: Politics and the Disaster Relief Industry in Africa.” As he rightly notes, “Fighting famine is both a technical and political challenge.” Unfortunately, “one universal tendency stands out: technical solutions are promoted at the expense of political ones.” There is also a tendency to overlook the politics of technical actions, to muddle or cover political actions with technical ones, or worse, to use technical measures as an excuse not to undertake needed political action.

De Waal argues that the use of the term “governance” was “an attempt to avoid making the political critique too explicit, and to enable a focus on specific technical aspects of government.” In some evaluations of development and humanitarian projects, “a caveat is sometimes inserted stating that politics lies beyond the scope of this study.” To this end, “there is often a weak call for ‘political will’ to bridge the gap between knowledge of technical measures and action to implement them.” As de Waal rightly notes, “the problem is not a ‘missing link’ but rather an entire political tradition, one manifestation of which is contemporary international humanitarianism.” In sum, “technical ‘solutions’ must be seen in the political context, and politics itself in the light of the dominance of a technocratic approach to problems such as famine.”

From a paper I presented back in 2007: “the technological approach almost always serves those who seek control from a distance.” As a result of this technological drive for pole position, a related “concern exists due to the separation of risk evaluation and risk reduction between science and political decision” so that which is inherently politically complex becomes depoliticized and mechanized. In Toward a Rational Society (1970), the German philosopher Jürgen Habermas describes “the colonization of the public sphere through the use of instrumental technical rationality. In this sphere, complex social problems are reduced to technical questions, effectively removing the plurality of contending perspectives.”

To be sure, Western science tends to pose the question “How?” as opposed to “Why?” What happens then is that “early warning systems tend to be largely conceived as hazard-focused, linear, top-down, expert-driven systems, with little or no engagement of end-users or their representatives.” As de Waal rightly notes, “the technical sophistication of early warning systems is offset by a major flaw: response cannot be enforced by the populace. The early warning information is not normally made public.” In other words, disaster prevention requires “not merely identifying causes and testing policy instruments but building a [social and] political movement” since “the framework for response is inherently political, and the task of advocacy for such response cannot be separated from the analytical tasks of warning.”

Recall my emphasis on people-centered early warning above and the definition of resilience as the capacity for self-organization. Self-organization is political. Hence my efforts years ago to promote greater linkages between the fields of nonviolent action and early warning. I have a paper (dated 2008) specifically on this topic should anyone care to read it. Anyone who has read my doctoral dissertation will also know that I have long been interested in the impact of technology on the balance of power in political contexts. A relevant summary is available here. Now, why did I not include all this in the main body of my blog post? Because this updated section already runs over 1,000 words.

In closing, I disagree with the over-used criticism that resilience is reactive and about returning to initial conditions. Why would we want to be reactive or return to initial conditions if the latter state contributed to the subsequent disaster we are recovering from? When my colleague Andrew Zolli talks about resilience, he talks about “bouncing forward”, not bouncing back. This is also true of Nassim Taleb’s term antifragility, the ability to thrive on disruption. As Homer-Dixon also notes, preparing to fail gracefully is hardly reactive either.

Tweeting is Believing? Analyzing Perceptions of Credibility on Twitter

What factors influence whether or not a tweet is perceived as credible? According to this recent study, users have “difficulty discerning truthfulness based on content alone, with message topic, user name, and user image all impacting judgments of tweets and authors to varying degrees regardless of the actual truthfulness of the item.”

For example, “Features associated with low credibility perceptions were the use of non-standard grammar and punctuation, not replacing the default account image, or using a cartoon or avatar as an account image. Following a large number of users was also associated with lower author credibility, especially when unbalanced in comparison to follower count […].” As for features enhancing a tweet’s credibility, these included “author influence (as measured by follower, retweet, and mention counts), topical expertise (as established through a Twitter homepage bio, history of on-topic tweeting, pages outside of Twitter, or having a location relevant to the topic of the tweet), and reputation (whether an author is someone a user follows, has heard of, or who has an official Twitter account verification seal). Content related features viewed as credibility-enhancing were containing a URL leading to a high-quality site, and the existence of other tweets conveying similar information.”

In general, users’ ability to “judge credibility in practice is largely limited to those features visible at-a-glance in current UIs (user picture, user name, and tweet content). Conversely, features that often are obscured in the user interface, such as the bio of a user, receive little attention despite their ability to impact credibility judgments.” The table below compares a feature’s perceived credibility impact with the attention actually allotted to assessing that feature.

“Message topic influenced perceptions of tweet credibility, with science tweets receiving a higher mean tweet credibility rating than those about either politics or entertainment. Message topic had no statistically significant impact on perceptions of author credibility.” In terms of usernames, “Authors with topical names were considered more credible than those with traditional user names, who were in turn considered more credible than those with internet name styles.” In a follow-up experiment, the study analyzed perceptions of credibility vis-a-vis a user’s image, i.e., the profile picture associated with a given Twitter account. “Use of the default Twitter icon significantly lowers ratings of content and marginally lowers ratings of authors […]” in comparison to generic, topical, female and male images.

Obviously, “many of these metrics can be faked to varying extents. Selecting a topical username is trivial for a spam account. Manufacturing a high follower to following ratio or a high number of retweets is more difficult but not impossible. User interface changes that highlight harder to fake factors, such as showing any available relationship between a user’s network and the content in question, should help.” Overall, these results “indicate a discrepancy between features people rate as relevant to determining credibility and those that mainstream social search engines make available.” The authors of the study conclude by suggesting changes in interface design that will enhance a user’s ability to make credibility judgments.
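As a purely illustrative exercise (the weights below are arbitrary and do not come from the study), the perception features listed above could be combined into a rough at-a-glance credibility score:

```python
# Rule-of-thumb sketch that scores a tweet on the perception features the
# study lists (verification, default avatar, follower/following balance,
# URL presence, non-standard punctuation). Weights are arbitrary
# illustrations, not values from the paper.
def perceived_credibility_score(tweet, author):
    score = 0.0
    if author["verified"]:
        score += 2.0                      # reputation: official verification seal
    if author["image"] == "default":
        score -= 1.5                      # default icon lowers perceived credibility
    if author["followers"] > 0 and author["following"] / author["followers"] > 5:
        score -= 1.0                      # follows far more accounts than follow back
    if "http" in tweet["text"]:
        score += 1.0                      # link to an external source
    if tweet["text"].count("!") >= 3:
        score -= 1.0                      # non-standard punctuation
    return score

author = {"verified": False, "image": "default", "followers": 40, "following": 600}
tweet = {"text": "4th blast at Charni!!! RT RT RT"}
print(perceived_credibility_score(tweet, author))  # negative score: low perceived credibility
```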

“Firstly, author credentials should be accessible at a glance, since these add value and users rarely take the time to click through to them. Ideally this will include metrics that convey consistency (number of tweets on topic) and legitimization by other users (number of mentions or retweets), as well as details from the author’s Twitter page (bio, location, follower/following counts). Second, for content assessment, metrics on number of retweets or number of times a link has been shared, along with who is retweeting and sharing, will provide consumers with context for assessing credibility. […] seeing clusters of tweets that conveyed similar messages was reassuring to users; displaying such similar clusters runs counter to the current tendency for search engines to strive for high recall by showing a diverse array of retrieved items rather than many similar ones–exploring how to resolve this tension is an interesting area for future work.”

In sum, the above findings and recommendations explain why platforms such as Rapportive, Seriously Rapid Source Review (SRSR) and CrisisTracker add so much value to the process of assessing the credibility of tweets in near real-time. For related research, see Predicting the Credibility of Disaster Tweets Automatically and Automatically Ranking the Credibility of Tweets During Major Events.