Category Archives: Big Data

Automatically Identifying Eyewitness Reporters on Twitter During Disasters

My colleague Kate Starbird recently shared a very neat study entitled “Learning from the Crowd: Collaborative Filtering Techniques for Identifying On-the-Ground Twitterers during Mass Disruptions” (PDF). As she and her co-authors rightly argue, “most Twitter activity during mass disruption events is generated by the remote crowd.” So can we use advanced computing to rapidly identify Twitter users who are reporting from ground zero? The answer is yes.


An important indicator of whether or not a Twitter user is reporting from the scene of a crisis is the number of times they are retweeted. During the Egyptian revolution in early 2011, “nearly 30% of highly retweeted Twitter users were physically present at those protest events.” Kate et al. drew on this insight to study tweets posted during the Occupy Wall Street (OWS) protests in September 2011. The authors manually analyzed a sample of more than 2,300 Twitter users to determine which were tweeting from the protests. They found that 4.5% of the Twitter users in their sample were actually onsite. Using this dataset as training data, Kate et al. were able to develop a classifier that automatically identifies Twitter users reporting from the protests with an accuracy just shy of 70%. I expect that more training data could very well increase this accuracy score.
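To make this concrete, here is a minimal sketch of how such a classifier could be trained from hand-labeled users. The file name, feature columns and model choice are my own illustrative assumptions, not the ones used in the actual study.

```python
# Sketch: train a classifier to flag likely on-the-ground Twitter users.
# The CSV, feature columns and model are illustrative assumptions only.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

users = pd.read_csv("labeled_users.csv")  # hypothetical hand-labeled file
features = ["times_retweeted", "follower_count", "tweets_per_hour"]
X, y = users[features], users["is_onsite"]  # 1 = tweeting from the protests

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"Mean cross-validated accuracy: {scores.mean():.2%}")
```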

In any event, “the information resulting from this or any filtering technique must be further combined with human judgment to assess its accuracy.” As the authors rightly note, “this ‘limitation’ fits well within an information space that is witnessing the rise of digital volunteer communities who monitor multiple data sources, including social media, looking to identify and amplify new information coming from the ground.” To be sure, “For volunteers like these, the use of techniques that increase the signal to noise ratio in the data has the potential to drastically reduce the amount of work they must do. The model that we have outlined does not result in perfect classification, but it does increase this signal-to-noise ratio substantially—tripling it in fact.”

I really hope that someone will leverage Kate’s important work to develop a standalone platform that automatically generates a list of Twitter users who are reporting from disaster-affected areas. This would be a very worthwhile contribution to the ecosystem of next-generation humanitarian technologies. In the meantime, perhaps QCRI’s Artificial Intelligence for Disaster Response (AIDR) platform will help digital humanitarians automatically identify tweets posted by eyewitnesses. I’m optimistic since we were able to create a machine learning classifier with an accuracy of 80%-90% for eyewitness tweets. More on this in our recent study.

[Image: tweet by MOchin, “talked to family”]

One question remains: how do we automatically identify tweets like the one above? This person is not an eyewitness but was likely on the phone with her family, who are closer to the action. How do we develop a classifier to catch these “second-hand” eyewitness reports?


Analyzing Fake Content on Twitter During Boston Marathon Bombings

As iRevolution readers already know, the application of Information Forensics to social media is one of my primary areas of interest. So I’m always on the lookout for new and related studies, such as this one (PDF), which was just published by colleagues of mine in India. The study by Aditi Gupta et al. analyzes fake content shared on Twitter during the Boston Marathon Bombings earlier this year.

[Image: #BostonStrong]

Gupta et al. collected close to 8 million unique tweets posted by 3.7 million unique users between April 15 and 19, 2013. The table below provides more details. The authors found that rumors and fake content comprised 29% of the content that went viral on Twitter, while 51% constituted generic opinions and comments. The remaining 20% relayed true information. Interestingly, approximately 75% of fake tweets were propagated via mobile devices, compared to 64% of true tweets.

[Table 1: Gupta et al.]

The authors also found that many users with high social reputation and verified accounts were responsible for spreading the bulk of the fake content posted to Twitter. Indeed, the study shows that fake content did not travel rapidly during the first hour after the bombing; rumors and fake information only went viral after Twitter users with large numbers of followers started propagating the fake content. To this end, “determining whether some information is true or fake, based on only factors based on high number of followers and verified accounts is not possible in the initial hours.”

Gupta et al. also identified close to 32,000 new Twitter accounts created between April 15 and 19 that posted at least one tweet about the bombings. About 20% (6,073) of these new accounts were subsequently suspended by Twitter. The authors found that 98.7% of these suspended accounts did not include the word “Boston” in their names and usernames. They also note that some of these deleted accounts were “quite influential” during the Boston tragedy. The figure below depicts the number of suspended Twitter accounts created in the hours and days following the blast.

[Figure 2: Gupta et al.]
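As a rough sketch of this kind of account-level analysis, the new-account filter could be expressed in pandas as follows; the input file and column names are hypothetical, not those of the study.

```python
# Sketch: isolate accounts created during the event window and check whether
# their names mention "Boston". File and column names are hypothetical.
import pandas as pd

accounts = pd.read_csv("accounts.csv", parse_dates=["created_at"])
window = accounts[(accounts["created_at"] >= "2013-04-15") &
                  (accounts["created_at"] <= "2013-04-19")]

mentions_boston = (window["username"].str.contains("boston", case=False, na=False) |
                   window["name"].str.contains("boston", case=False, na=False))
suspended = window["is_suspended"]

print(f"New accounts in window: {len(window)}")
print(f"Share suspended: {suspended.mean():.1%}")
print(f"Suspended accounts without 'Boston' in their names: "
      f"{(suspended & ~mentions_boston).sum() / max(suspended.sum(), 1):.1%}")
```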

The authors also carried out some basic social network analysis of the suspended Twitter accounts. First, they removed from the analysis all suspended accounts that did not interact with each other, which left just 69 accounts. Next, they analyzed the network topology of these 69 accounts, which produced four distinct graph structures: Single Link, Closed Community, Star Topology and Self-Loops. These are displayed in the figure below.

[Figure 3: Gupta et al.]

The two most interesting graphs are the Closed Community and Star Topology graphs—the second and third graphs in the figure above.

Closed Community: Users that retweet and mention each other, forming a closed community as indicated by the high closeness centrality values produced by the social network analysis. “All these nodes have similar usernames too, all usernames have the same prefix and only numbers in the suffixes are different. This indicates that either these profiles were created by same or similar minded people for posting common propaganda posts.” Gupta et al. analyzed the content posted by these users and found that all were “tweeting the same propaganda and hate filled tweet.”

Star Topology: Easily mistaken for the authentic “BostonMarathon” Twitter account, the fake account “BostonMarathons” created plenty of confusion. Many users propagated the fake content posted by the BostonMarathons account. As the authors note, “Impersonation or creating fake profiles is a crime that results in identity theft and is punishable by law in many countries.”
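For readers who want to experiment, here is a minimal sketch of how one might surface a closed community like the one described above using NetworkX. The edge list, usernames, centrality threshold and prefix heuristic are all illustrative assumptions, not the authors’ data or code.

```python
# Sketch: flag a suspicious closed community via closeness centrality and
# shared username prefixes. Edge list and threshold are illustrative only.
import networkx as nx

# Directed edges (source, target) for retweets/mentions among accounts.
edges = [("newsupdate01", "newsupdate02"), ("newsupdate02", "newsupdate03"),
         ("newsupdate03", "newsupdate01"), ("bystander", "newsupdate01")]
G = nx.DiGraph(edges)

closeness = nx.closeness_centrality(G)
suspicious = {node for node, c in closeness.items() if c >= 0.5}

# Accounts whose names differ only by a numeric suffix share a prefix.
prefixes = {node.rstrip("0123456789") for node in suspicious}
if len(prefixes) == 1 and len(suspicious) > 1:
    print(f"Possible sockpuppet cluster: {sorted(suspicious)}")
```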

The automatic detection of these network structures on Twitter may enable us to detect and counter fake content in the future. In the meantime, my colleagues and I at QCRI are collaborating with Aditi Gupta et al. to develop a “Credibility Plugin” for Twitter based on this analysis and earlier peer-reviewed research carried out by my colleague ChaTo. Stay tuned for updates.


See also:

  • Boston Bombings: Analyzing First 1,000 Seconds on Twitter [link]
  • Taking the Pulse of the Boston Bombings on Twitter [link]
  • Predicting the Credibility of Disaster Tweets Automatically [link]
  • Auto-Ranking Credibility of Tweets During Major Events [link]
  • Auto-Identifying Fake Images on Twitter During Disasters [link]
  • How to Verify Crowdsourced Information from Social Media [link]
  • Crowdsourcing Critical Thinking to Verify Social Media [link]

World Disaster Report: Next Generation Humanitarian Technology

This year’s World Disaster Report was just released this morning. I had the honor of authoring Chapter 3 on “Strengthening Humanitarian Information: The Role of Technology.” The chapter focuses on the rise of “Digital Humanitarians” and explains how “Next Generation Humanitarian Technology” is used to manage Big (Crisis) Data. The chapter complements the groundbreaking report “Humanitarianism in the Network Age” published by UN OCHA earlier this year.

The key topics addressed in the chapter include:

  • Big (Crisis) Data
  • Self-Organized Disaster Response
  • Crowdsourcing & Bounded Crowdsourcing
  • Verifying Crowdsourced Information
  • Volunteer & Technical Communities
  • Digital Humanitarians
  • Libya Crisis Map
  • Typhoon Pablo Crisis Map
  • Syria Crisis Map
  • Microtasking for Disaster Response
  • MicroMappers
  • Machine Learning for Disaster Response
  • Artificial Intelligence for Disaster Response (AIDR)
  • American Red Cross Digital Operations Center
  • Data Protection and Security
  • Policymaking for Humanitarian Technology

I’m particularly interested in getting feedback on this chapter, so please feel free to post any comments or questions in the comments section below.


See also:

  • What is Big (Crisis) Data? [link]
  • Humanitarianism in the Network Age [link]
  • Predicting Credibility of Disaster Tweets [link]
  • Crowdsourced Verification for Disaster Response [link]
  • MicroMappers: Microtasking for Disaster Response [link]
  • AIDR: Artificial Intelligence for Disaster Response [link]
  • Research Agenda for Next Generation Humanitarian Tech [link]

Humanitarian Crisis Computing 101

Disaster-affected communities are increasingly becoming “digital” communities. That is, they increasingly use mobile technology & social media to communicate during crises. I often refer to this user-generated content as Big (Crisis) Data. Humanitarian crisis computing seeks to rapidly identify informative, actionable and credible content in this growing stack of real-time information. The challenge is akin to finding the proverbial needle in the haystack since the vast majority of reports posted on social media are not relevant for humanitarian response. This is largely a result of the demand versus supply problem described here.

[Slide]

In any event, the few “needles” of information that are relevant can relay information that is vital and indeed life-saving for relief efforts—both traditional top-down efforts and more bottom-up grassroots efforts. When disaster strikes, we increasingly see social media traffic explode. We know there are important “pins” of relevant information hidden in this growing stack of information, but how do we find them in real time?

[Slide]

Humanitarian organizations are ill-equipped to manage the deluge of Big Crisis Data. They tend to sift through the stack of information manually, which means they aren’t able to process more than a small volume of information. This is represented by the dotted green line in the picture below. Big Data is often described as a problem of filter failure: our manual filters cannot manage the large volume, velocity and variety of information posted on social media during disasters. So all the information above the dotted line, the Big Data, is completely ignored.

[Slide]

This is where Advanced Computing comes in. Advanced Computing uses Human and Machine Computing to manage Big Data and reduce filter failure, thus allowing humanitarian organizations to process a larger volume, velocity and variety of crisis information in less time. In other words, Advanced Computing helps us push the dotted green line up the information stack.

[Slide]

In the early days of digital humanitarian response, we used crowdsourcing to search through the haystack of user-generated content posted during disasters. Note that said content can also include text messages (SMS), like in Haiti. Crowdsourcing crisis information is not as much fun as the picture below would suggest, however. In fact, crowdsourcing crisis information was (and can still be) quite a mess and a big pain in the haystack. Needless to say, crowdsourcing is not the best filter to make sense of Big Crisis Data.

[Slide]

Recently, digital humanitarians have turned to microtasking crisis information as described here and here. The UK Guardian and Wired have also written about this novel shift from crowdsourcing to microtasking.

[Slide]

Microtasking basically turns a haystack into little blocks of stacks. Each micro-stack is then processed by one or more digital humanitarian volunteers. Unlike crowdsourcing, a microtasking approach to filtering crisis information is highly scalable, which is why we recently launched MicroMappers.

[Slide]

The smaller the micro-stack, the easier the tasks and the faster they can be carried out by a greater number of volunteers. For example, instead of having 10 people classify 10,000 tweets based on the Cluster System, microtasking makes it very easy for 1,000 people to classify 10 tweets each. The former would take hours while the latter takes mere minutes. In response to the recent earthquake in Pakistan, some 100 volunteers used MicroMappers to classify 30,000+ tweets in about 30 hours, for example.
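Conceptually, the microtasking split is trivial to express in code. Here is a toy sketch of chunking a haystack of tweets into micro-stacks; the batch size and data are illustrative.

```python
# Sketch: split a "haystack" of tweets into micro-stacks of 10 for volunteers.
def micro_stacks(tweets, size=10):
    """Yield consecutive batches of at most `size` tweets."""
    for i in range(0, len(tweets), size):
        yield tweets[i:i + size]

tweets = [f"tweet {i}" for i in range(10_000)]
batches = list(micro_stacks(tweets))
print(f"{len(batches)} micro-tasks of up to 10 tweets each")
# 1,000 volunteers doing one batch each clear the stack in minutes.
```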

[Slide]

Machine Computing, in contrast, uses natural language processing (NLP) and machine learning (ML) to “quantify” the haystack of user-generated content posted on social media during disasters. This enables us to automatically identify relevant “needles” of information.

[Slide]

An example of a Machine Learning approach to crisis computing is the Artificial Intelligence for Disaster Response (AIDR) platform. Using AIDR, users can teach the platform to automatically identify relevant information from Twitter during disasters. For example, AIDR can be used to automatically identify individual tweets that relay urgent needs from a haystack of millions of tweets.
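The underlying idea can be sketched in a few lines of scikit-learn. This is a toy illustration of learning from labeled examples, not AIDR’s actual pipeline; the training tweets are made up.

```python
# Sketch: learn to spot "urgent need" tweets from labeled examples.
# Toy data and model; AIDR's real pipeline is more sophisticated.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled = [("We urgently need clean water and tents", "need"),
           ("Praying for everyone affected tonight", "other"),
           ("No food or medicine in our district, please help", "need"),
           ("Just saw the news, so sad", "other")]
texts, labels = zip(*labeled)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["families here need medicine and water"]))  # likely ['need']
```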

[Slide]
The pictures above are taken from the slide deck I put together for a keynote address I recently gave at the Canadian Ministry of Foreign Affairs.


Hashtag Analysis of #Westgate Crisis Tweets

In July 2013, my team and I at QCRI launched this dashboard to analyze hashtags used by Twitter users during crises. Our first case study, which is available here, focused on Hurricane Sandy. Since then, both the UN and Greenpeace have also made use of the dashboard to analyze crisis tweets.

[Image: QCRI hashtag analysis dashboard]

We just uploaded 700,000+ Westgate related tweets to the dashboard. The results are available here and also displayed above. The dashboard is still under development, so we very much welcome feedback on how to improve it for future analysis. You can upload your own tweets to the dashboard if you’d like to test drive the platform.
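Under the hood, hashtag analysis of this kind boils down to a frequency count over the collected tweets. A minimal sketch (the input file, with one tweet per line, is hypothetical):

```python
# Sketch: count hashtag frequencies in a file of tweets, one per line.
import re
from collections import Counter

counts = Counter()
with open("westgate_tweets.txt", encoding="utf-8") as f:  # hypothetical file
    for tweet in f:
        counts.update(tag.lower() for tag in re.findall(r"#\w+", tweet))

for tag, n in counts.most_common(10):
    print(f"{tag}: {n}")
```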


See also: Forensics Analysis of #Westgate Tweets (Link)

AIDR: Artificial Intelligence for Disaster Response

Social media platforms are increasingly used to communicate crisis information when major disasters strike. Hence the rise of Big (Crisis) Data. Humanitarian organizations, digital humanitarians and disaster-affected communities know that some of this user-generated content can increase situational awareness. The challenge is to identify relevant and actionable content in near real-time to triangulate with other sources and make more informed decisions on the spot. Finding potentially life-saving information in this growing stack of Big Crisis Data, however, is like looking for the proverbial needle in a giant haystack. This is why my team and I at QCRI are developing AIDR.

[Image: haystack]

The free and open source Artificial Intelligence for Disaster Response platform leverages machine learning to automatically identify informative content on Twitter during disasters. Unlike the vast majority of related platforms out there, we go beyond simple keyword search to filter for informative content. Why? Because recent research shows that keyword searches can miss over 50% of relevant content posted on Twitter. This is very far from optimal for emergency response. Furthermore, tweets captured via keyword search may not be relevant since words can have multiple meanings depending on context. Finally, keywords are restricted to one language only. Machine learning overcomes all these limitations, which is why we’re developing AIDR.

So how does AIDR work? There are three components of AIDR: the Collector, Trainer and Tagger. The Collector simply allows you to collect and save tweets posted during a disaster. You can download these tweets for analysis at any time and also use them to create an automated filter using machine learning, which is where the Trainer and Tagger come in. The Trainer allows one or more users to train the AIDR platform to automatically tag tweets of interest in a given collection. Tweets of interest could include those that refer to “Needs”, “Infrastructure Damage” or “Rumors”, for example.

[Image: AIDR Collector]

A user creates a Trainer for tweets-of-interest by: 1) Creating a name for their Trainer, e.g., “My Trainer”; 2) Identifying topics of interest such as “Needs”, “Infrastructure Damage”, “Rumors” etc. (as many topics as the user wants); and 3) Classifying tweets by topic of interest. This last step simply involves reading collected tweets and classifying them as “Needs”, “Infrastructure Damage”, “Rumor” or “Other,” for example. Any number of users can participate in classifying these tweets. That is, once a user creates a Trainer, she can classify the tweets herself, or invite her organization to help her classify, or ask the crowd to help classify the tweets, or all of the above. She simply shares a link to her training page with whomever she likes. If she chooses to crowdsource the classification of tweets, AIDR includes a built-in quality control mechanism to ensure that the crowdsourced classification is accurate.
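AIDR’s quality control mechanism isn’t detailed here, but a common approach to crowdsourced quality control is to give each tweet to several volunteers and keep a label only when enough of them agree. A minimal sketch of that general idea (not AIDR’s actual implementation):

```python
# Sketch: majority-vote quality control for crowdsourced labels.
# Illustrates the general idea; AIDR's actual mechanism may differ.
from collections import Counter

def majority_label(labels, min_agreement=0.6):
    """Return the winning label if enough annotators agree, else None."""
    top, votes = Counter(labels).most_common(1)[0]
    return top if votes / len(labels) >= min_agreement else None

votes = {"tweet_1": ["Needs", "Needs", "Other"],
         "tweet_2": ["Rumor", "Needs", "Other"]}
for tweet, labels in votes.items():
    print(tweet, "->", majority_label(labels))  # tweet_2 -> None (no consensus)
```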

As noted here, we tested AIDR in response to the Pakistan Earthquake last week. We quickly hacked together the user interface displayed below, so functionality rather than design was our immediate priority. In any event, digital humanitarian volunteers from the Standby Volunteer Task Force (SBTF) tagged over 1,000 tweets based on the different topics (labels) listed below. As far as we know, this was the first time that a machine learning classifier was crowdsourced in the context of a humanitarian disaster. Click here for more on this early test.

[Image: AIDR Trainer]

The Tagger component of AIDR analyzes the human-classified tweets from the Trainer to automatically tag new tweets coming in from the Collector. This is where the machine learning kicks in. The Tagger uses the classified tweets to learn what kinds of tweets the user is interested in. When enough tweets have been classified (20 minimum), the Tagger automatically begins to tag new tweets by topic of interest. How many classified tweets is “enough”? This will vary but the more tweets a user classifies, the more accurate the Tagger will be. Note that each automatically tagged tweet includes an accuracy score—i.e., the probability that the tweet was correctly tagged by the automatic Tagger.

The Tagger thus displays a list of automatically tagged tweets updated in real-time. The user can filter this list by topic and/or accuracy score—display all tweets tagged as “Needs” with an accuracy of 90% or more, for example. She can also download the tagged tweets for further analysis. In addition, she can share the data link of her Tagger with developers so the latter can import the tagged tweets directly into their own platforms, e.g., MicroMappers, Ushahidi, CrisisTracker, etc. (Note that AIDR already powers CrisisTracker by automating the classification of tweets). In addition, the user can share a display link with individuals who wish to embed the live feed into their websites, blogs, etc.
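Once each tagged tweet carries a label and an accuracy score, that filter is a one-liner. The field names below are hypothetical, not AIDR’s actual output schema:

```python
# Sketch: keep only tweets tagged "Needs" with >= 90% estimated accuracy.
# Field names are hypothetical, not AIDR's actual output schema.
tagged = [
    {"text": "Water urgently needed in Sector 4", "label": "Needs", "score": 0.94},
    {"text": "Bridge down on Route 9", "label": "Infrastructure Damage", "score": 0.88},
    {"text": "Heard the airport is closed?", "label": "Needs", "score": 0.55},
]

needs = [t for t in tagged if t["label"] == "Needs" and t["score"] >= 0.9]
print(needs)  # only the first tweet passes the filter
```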

In sum, AIDR is an artificial intelligence engine developed to power consumer applications like MicroMappers. Any number of other tools can also be added to the AIDR platform, like the Credibility Plugin for Twitter that we’re collaborating on with partners in India. Added to AIDR, this plugin will score individual tweets based on the probability that they convey credible information. To this end, we hope AIDR will become a key node in the nascent ecosystem of next-generation humanitarian technologies. We plan to launch a beta version of AIDR at the 2013 CrisisMappers Conference (ICCM 2013) in Nairobi, Kenya this November.

In the meantime, we welcome any feedback you may have on the above. And if you want to help as an alpha tester, please get in touch so I can point you to the Collector tool, which you can start using right away. The other AIDR tools will be open to the same group of alpha testers in the coming weeks. For more on AIDR, see also this article in Wired.

[Image: AIDR logo]

The AIDR project is a joint collaboration with the United Nations Office for the Coordination of Humanitarian Affairs (OCHA). Other organizations that have expressed an interest in AIDR include the International Committee of the Red Cross (ICRC), American Red Cross (ARC), Federal Emergency Management Agency (FEMA), New York City’s Office for Emergency Management and their counterpart in the City of San Francisco. 


Note: In the future, AIDR could also be adapted to take in Facebook status updates and text messages (SMS).

Developing MicroFilters for Digital Humanitarian Response

Filtering—or the lack thereof—presented the single biggest challenge when we tested MicroMappers last week in response to the Pakistan Earthquake. As my colleague Clay Shirky notes, the challenge with “Big Data” is not information overload but rather filter failure. We need to make damned sure that we don’t experience filter failure again in future deployments. To ensure this, I’ve decided to launch a stand-alone and fully interoperable platform called MicroFilters. My colleague Andrew Ilyas will lead the technical development of the platform with support from Ji Lucas. Our plan is to launch the first version of MicroFilters before the CrisisMappers conference (ICCM 2013) in November.

[Image: MicroFilters]

A web-based solution, MicroFilters will allow users to upload their own Twitter data for automatic filtering purposes. Users will have the option of uploading this data using three different formats: text, CSV and JSON. Once uploaded, users can elect to perform one or more automatic filtering tasks from this menu of options:

[   ]  Filter out retweets
[   ]  Filter for unique tweets
[   ]  Filter tweets by language [English | Other | All]
[   ]  Filter for unique image links posted in tweets [Small | Medium | Large | All]
[   ]  Filter for unique video links posted in tweets [Short | Medium | Long | All]
[   ]  Filter for unique image links in news articles posted in tweets  [S | M | L | All]
[   ]  Filter for unique video links in news articles posted in tweets [S | M | L | All]

Note that “unique image and video links” refer to long URLs, not shortened URLs like bit.ly links. After selecting the desired filtering option(s), the user simply clicks the “Filter” button. Once the filtering is completed (a countdown clock displays the expected processing time), MicroFilters provides the user with a download link for the filtered results. The link remains live for 10 minutes, after which the data is automatically deleted. If a CSV file was uploaded for filtering, the download is also in CSV format; likewise for text and JSON files. Note that filtered tweets will appear in reverse chronological order (assuming time-stamp data was included in the uploaded file) when downloaded. The resulting file of filtered tweets can then be uploaded to MicroMappers within seconds.
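To make the proposed filters concrete, here is a sketch of the first two menu options (retweet removal and deduplication) plus image-link extraction. This is illustrative only, not the MicroFilters codebase; the real platform will also handle CSV/JSON input, languages, videos and news articles.

```python
# Sketch: MicroFilters-style passes over a list of tweet texts.
# Illustrative only; not the actual MicroFilters implementation.
import re

def filter_retweets(tweets):
    """Drop classic retweets (texts starting with 'RT @')."""
    return [t for t in tweets if not t.startswith("RT @")]

def unique_tweets(tweets):
    """Keep the first occurrence of each distinct text."""
    seen, out = set(), []
    for t in tweets:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

def unique_image_links(tweets):
    """Collect distinct long URLs that point at common image formats."""
    pattern = re.compile(r"https?://\S+\.(?:jpg|jpeg|png|gif)", re.IGNORECASE)
    return {url for t in tweets for url in pattern.findall(t)}

tweets = ["RT @a: flooding downtown", "flooding downtown", "flooding downtown",
          "damage photo http://example.com/pic.jpg"]
print(unique_tweets(filter_retweets(tweets)))
print(unique_image_links(tweets))
```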

In sum, MicroFilters will be invaluable for future deployments of MicroMappers. Solving the “filter failure” problem will enable digital humanitarians to process far more relevant data and in a more timely manner. Since MicroFilters will be a standalone platform, anyone else will also have access to these free and automatic filtering services. In the meantime, however, we very much welcome feedback, suggestions and offers of help, thank you!


Seven Principles for Big Data and Resilience Projects

Authored by Kate Crawford, Patrick Meier, Claudia Perlich, Amy Luers, Gustavo Faleiros and Jer Thorp, 2013 PopTech & Rockefeller Foundation Bellagio Fellows

Update: See also “Big Data, Communities and Ethical Resilience: A Framework for Action” written by the above Fellows and available here (PDF).

[Photo: Bellagio Fellows]

The following is a draft “Code of Conduct” that seeks to provide guidance on best practices for resilience-building projects that leverage Big Data and Advanced Computing. These seven core principles serve to guide data projects so that they are socially just, encourage local wealth and skill creation, require informed consent, and remain maintainable over long timeframes. This document is a work in progress, so we very much welcome feedback. Our aim is not to enforce these principles on others but rather to hold ourselves accountable and, in the process, encourage others to do the same. Initial versions of this draft were written during the 2013 PopTech & Rockefeller Foundation workshop in Bellagio in August 2013.

1. Open Source Data Tools

Wherever possible, data analytics and manipulation tools should be open source, architecture independent and broadly prevalent (R, Python, etc.). Open source, hackable tools are generative, and building generative capacity is an important element of resilience. Data tools that are closed prevent end-users from customizing and localizing them freely. This creates dependency on external experts, which is a major point of vulnerability. Open source tools generate a large user base and typically have a wider open knowledge base. Open source solutions are also more affordable and by definition more transparent. Open data tools should be highly accessible and intuitive for non-technical users and those with limited technology access, in order to maximize the number of participants who can independently use and analyze Big Data.

2. Transparent Data Infrastructure

Infrastructure for data collection and storage should operate on transparent standards to maximize the number of users who can interact with it. Data infrastructure should include built-in documentation, be extensible and provide easy access. Data is only as useful as the data scientist’s understanding of how it was collected. This is critical if projects are to be maintained over time regardless of team membership; otherwise they will collapse when key members leave. To allow for continuity, the infrastructure has to be transparent and clear to a broad set of analysts, independent of the tools they bring to bear. Solutions such as Hadoop, JSON formats and cloud storage are potentially suitable.

3. Develop and Maintain Local Skills

Make “Data Literacy” more widespread. Leverage local data labor and build on existing skills. The key and scarcest ingredient of effective data solutions remains human skill and knowledge, which needs to be retained locally. In doing so, consider cultural issues and language. Catalyze the next generation of data scientists and generate the required new skills in the cities where the data is being collected. Provide members of local communities with hands-on experience; people who can draw on local understanding and socio-cultural context. The longevity of Big Data for Resilience projects depends on the continuity of local data science teams that maintain an active knowledge and skills base that can be passed on to other local groups. This means hiring local researchers and data scientists and getting them to build teams of the best established talent, as well as up-and-coming developers and designers. Risks emerge when non-resident companies are asked to spearhead data programs that are connected to local communities: they bring in their own employees, do not foster local talent over the long term, and extract value from the data and the learning algorithms, which are kept by the company rather than the local community.

4. Local Data Ownership

Use Creative Commons and licenses that state that data is not to be used for commercial purposes. The community directly owns the data it generates, along with the learning algorithms (machine learning classifiers) and derivatives. Strong data protection protocols need to be in place to protect identities and personally identifiable information. Only the “Principle of Do No Harm” can trump consent, as explicitly stated in the International Committee of the Red Cross’s Data Protection Protocols (ICRC 2013). While the ICRC’s data protection standards are geared towards humanitarian professionals, their core protocols are equally applicable to the use of Big Data in resilience projects. Time limits on how long the data can be used should be transparently stated, and shorter timeframes should always be preferred unless there are compelling reasons to do otherwise. People can give consent for how their data might be used in the short to medium term, but beyond that, the possibilities for data analytics, predictive modelling and de-anonymization will have advanced to a state that cannot at this stage be predicted, let alone consented to.

5. Ethical Data Sharing

Adopt existing data-sharing protocols like the ICRC’s (2013). Permission for sharing is essential, and how the data will be used should be clearly articulated. An opt-in approach should be the preference wherever possible, and the ability for individuals to remove themselves from a data set after it has been collected must always be an option. Projects should always explicitly state which third parties will get access to data, if any, so that it is clear who will be able to access and use it. Sharing with NGOs, academics and humanitarian agencies should be carefully negotiated, and data should be shared with for-profit companies only when there are clear and urgent reasons to do so. In that case, clear data protection policies must be in place to bind those third parties in the same way as the initial data gatherers. Transparency here is key: communities should be able to see where their data goes, along with a complete list of who has access to it and why.

6. Right Not To Be Sensed

Local communities have a right not to be sensed. Large-scale city sensing projects must have a clear framework for how people can be involved or choose not to participate. All too often, sensing projects are established without any ethical framework or any commitment to informed consent. It is essential that the collection of any sensitive data, from social and mobile data to video and photographic records of houses, streets and individuals, is done with full public knowledge, community discussion, and the ability to opt out. One proposal is the #NoShare tag. In essence, this principle seeks to place “Data Philanthropy” in the hands of local communities and in particular individuals. Creating clear informed consent mechanisms is a requisite for data philanthropy.

7. Learning from Mistakes

Big Data and Resilience projects need to be open to facing, reporting, and discussing failures. Big Data technology is still very much in a learning phase. Failure, and the learning and insights resulting from it, should be accepted and appreciated. Without admitting what does not work, we are not learning effectively as a community. Quality control and assessment for data-driven solutions is notably harder than comparable efforts in other technology fields. The uncertainty about the quality of a solution is created by the uncertainty inherent in the data. Even good data scientists struggle to assess the upside potential of incremental efforts on the quality of a solution. The correct analogy is more one of a craft than a science. As with traditional crafts, the most effective path to excellence is to learn from one’s mistakes under the guidance of a mentor with a collective knowledge of experiences of both failure and success.

Yes, But Resilience for Whom?

I sense a little bit of history repeating, and not the good kind. About ten years ago, I was deeply involved in the field of conflict early warning and response. Eventually, I realized that the systems we were designing and implementing excluded at-risk communities, even though the rhetoric had me believe they were designed to protect them. The truth is that these information systems were purely extractive and ultimately did little more than fill the pockets of academics who were hired as consultants to develop these early warning systems.

[Image: Future_PredictiveCoding]

The prevailing belief amongst these academics was (and still is) that large datasets and advanced quantitative methodologies can predict the escalation of political tensions and thus impede violence. To be sure, “these systems have been developed in advanced environments where the intention is to gather data so as to predict events in distant places. This leads to a division of labor between those who ‘predict’ and those ‘predicted’ upon” (Cited in Meier 2008, PDF).

Those who predict assume their sophisticated remote sensing systems will enable them to forecast and thus prevent impending conflict. Those predicted upon don’t even know these systems exist. The sum result? Conflict early warning systems have failed miserably at forecasting anything, let alone catalyzing preventive action or empowering local communities to get out of harm’s way. Conflict prevention is inherently political, and “political will is not an icon on your computer screen” (Cited in Meier 2013).

In Toward a Rational Society (1970), the German philosopher Jürgen Habermas describes “the colonization of the public sphere through the use of instrumental technical rationality. In this sphere, complex social problems are reduced to technical questions, effectively removing the plurality of contending perspectives” (Cited in Meier 2006, PDF). This instrumentalization of society depoliticizes complex social problems like conflict and resilience, recasting them in terms that are susceptible to technical solutions formulated by external experts. The participation of local communities thus becomes totally unnecessary to produce and deliver these technical solutions. To be sure, the colonization of the public sphere crowds out both local knowledge and participation.

We run the risk of repeating these mistakes in the discourse on community resilience. While we speak of community resilience, we gravitate towards the instrumentalization of communities using Big Data, which is largely conceived as a technical challenge of real-time data sensing and optimization. This external, top-down approach bars local participation. The depoliticization of resilience also hides the fact that “every act of measurement is an act marked by the play of powerful relations” (Cited in Meier 2013b). To make matters worse, these measurements are almost always taken without the subjects’ knowledge, let alone their consent. And so we create the division between those who sense and those sensed upon, thereby fully excluding the latter, all in the name of building community resilience.


Acknowledgements: I raised the question “Resilience for whom?” during the PopTech and Rockefeller Foundation workshop on “Big Data & Community Resilience.” I am thus grateful to the organizers and fellows for informing my thinking and the motivation for this post.

Big Data, Disaster Resilience and Lord of the Rings

The Shire is a local community of Hobbits seemingly disconnected from the systemic changes taking place in Middle Earth. They are a quiet, self-sufficient community with high levels of social capital. Hobbits are not interested in “Big Data”; their world is populated by “Small Data” and gentle action. This doesn’t stop the “Eye of Sauron” from sensing this small, harmless hamlet, however. During Gandalf’s visit, the Hobbits learn that all is not well in the world outside the Shire. The changing climate, deforestation and land degradation are wholly unnatural and ultimately threaten their own way of life.

[Image: The Shire]

Gandalf leads a small band of Hobbits (bonding social capital) out of the Shire to join forces with other peoples of Middle Earth (bridging social capital) in what he calls “The Fellowship of the Ring” (resilience in diversity). Together, they must overcome personal & collective adversity and travel to Mordor to destroy the one ring that rules them all. Only then will Sauron’s “All Seeing Eye” cease sensing and oppressing the world of Middle Earth.

[Image: The Fellowship of the Ring]

I’m definitely no expert on J. R. R. Tolkien or The Lord of the Rings, but I’ve found that literature and indeed mythology often hold up important mirrors to our modern societies and remind us that the perils we face may not be entirely new. This implies that cautionary tales of the past may still bear some relevance today. The hero’s journey speaks to the human condition, and mythology serves as evidence of human resilience. These narratives carry deep truths about our shortcomings and our redeeming qualities. Mythologies, analogies and metaphors help us make sense of our world; we ignore them at our own risk.

This is why I’ve employed the metaphor of the Shire (local communities) and Big Data (Eye of Sauron) during recent conversations on Big Data and Community Resilience. There’s been push-back of late against Big Data, with many promoting the notion of Small Data. “For many problems and questions, small data in itself is enough” (1). Yes, for specific problems: locally disconnected problems. But we live in an increasingly interdependent and connected world with coupled systems that run the risk of experiencing synchronous failure and collapse. Our sensors cannot be purely local since the resilience of our communities is no longer mostly place-based. This is where the rings come in.

[Image: Eye of Sauron]

Frodo’s ring allows him to sense change well beyond the Shire and at the same time mask his local presence. But using the ring also allows him to be sensed and hunted by Sauron. The same is true of Google and social media platforms like Facebook. We have no way to opt out of being sensed if we wish to use these platforms. Community-generated content, our digital fingerprints, belong to the Great Eye, not to the Shire. This excellent piece on the Political Economy of Twitter clearly demonstrates that an elite few control user-generated content. The true owners of social media data are the platform providers, not the end users. In sum, “only corporate actors and regulators—who possess both the intellectual and financial resources to succeed in this race—can afford to participate,” which means “that the emerging data market will be shaped according to their interests.” Of course, the scandal surrounding PRISM makes Sauron’s “All Seeing Eye” even more palpable.

So when we say that we have more data than ever before in human history, it behooves us to ask “Who is we? And to what end?” Does the Shire have access to greater data than ever before thanks to Sauron? Hardly. Is this data used by Sauron to support community resilience? Fat chance. Local communities are excluded; they are observers, unwilling participants in a centralized system that ultimately undermines trust and their own resilience. Hobbits deserve the right not to be sensed. This should be a non-negotiable. They also deserve the right to own and manage their own “Small Data” themselves; that is, data generated by the community, for the community. We need respectful, people-centered data protection protocols like those developed by Open Paths. Community resilience ought to be ethical community resilience.

To be sure, we need to place individual data-sharing decisions in the hands of individuals rather than external parties. In addition to Open Paths, Creative Commons is an excellent example of what is possible. Why not extend that framework to personal and social media data? Why not add a temporal element to these licenses, as hinted in this blog post last year? That is, something like SnapChat, where the user decides for herself how long the data should be accessible and usable. Well, it turns out that these discussions and related conversations are taking place thanks to my fellow PopTech and Rockefeller Foundation Fellows. Stay tuned for updates. The ideas presented above are the result of our joint brainstorming sessions, and certainly not my ideas alone (but I take full blame for The Lord of the Rings analogy given my limited knowledge of said books!).

In closing, a final reference to The Lord of the Rings: Gandalf (who is a translational leader) didn’t empower the Hobbits, he took them on a journey that built on their existing capacities for resilience. That is, we cannot empower others, we can only provide them with the means to empower themselves. In sum, “Not all those who wander are lost.”


ps. I’m hoping my talented fellows Kate Crawford, Gustavo Faleiros, Amy Luers, Claudia Perlich and Jer Thorp will chime in, improve my Lord of the Rings analogy and post comments in full Elvish script.