Category Archives: Social Media

Situational Awareness in Mass Emergency: Behavioral & Linguistic Analysis of Disaster Tweets

Sarah Vieweg’s doctoral dissertation from the University of Colorado is a must-read for anyone interested in the use of Twitter during crises. I read the entire 300-page study because it provides important insights on how automated natural language processing (NLP) can be applied to the Twittersphere to provide situational awareness following a sudden-onset emergency. Big thanks to Sarah for sharing her dissertation with QCRI. I include some excerpts below to highlight the most important findings from her excellent research.

Introduction

“In their research on human behavior in disaster, Fritz and Marks (1954) state: ‘[T]he immediate problem in a disaster situation is neither un-controlled behavior nor intense emotional reaction, but deficiencies of coordination and organization, complicated by people acting upon individual…definitions of the situation.'”

“Fritz and Marks’ assertion that people define disasters individually, which can lead to problematic outcomes, speaks to the need for common situational awareness among affected populations. Complete information is not attained during mass emergency, else it would not be a mass emergency. However, the more information people have and the better their situational awareness, the better equipped they are to make tactical, strategic decisions.”

“[D]uring crises, people seek information from multiple sources in an attempt to make locally optimal decisions within given time constraints. The first objective, then, is to identify what tweets that contribute to situational awareness ‘look like’—i.e. what specific information do they contain? This leads to the next objective, which is to identify how information is communicated at a linguistic level. This process provides the foundation for tools that can automatically extract pertinent, valuable information—training machines to correctly ‘understand’ human language involves the identification of the words people use to communicate via Twitter when faced with a disaster situation.”

Research Design & Results

Just how much situational awareness can be extracted from Twitter during a crisis? What constitutes situational awareness in the first place vis-à-vis emergency response? And can the answers to these questions yield a dedicated ontology that can be fed into automated natural language processing platforms to generate real-time, shared awareness? To answer these questions, Sarah analyzed four emergency events: the Oklahoma Fires (2009), the Red River Floods (2009 & 2010) and the Haiti Earthquake (2010).

She collected tweets generated during each of these emergencies and developed a three-step qualitative coding process to analyze what kinds of information on Twitter contribute to situational awareness during a major emergency. As a first step, each tweet was categorized as either:

O: Off-topic
“Tweets do not contain any information that mentions or relates to the emergency event.”

R: On-topic and Relevant to Situational Awareness
“Tweets contain information that provides tactical, actionable information that can aid people in making decisions, advise others on how to obtain specific information from various sources, or offer immediate post-impact help to those affected by the mass emergency.”

N: On-topic and Not Relevant to Situational Awareness
“Tweets are on-topic because they mention the emergency by including offers of prayer and support in relation to the emergency, solicitations for donations to charities, or casual reference to the emergency event. But these tweets do not meet the above criteria for situational relevance.”

The O, R, and N coding of the crisis datasets resulted in the following statistics for each of the four datasets:
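
To make the first coding step concrete, here is a minimal sketch (with invented example tweets, not Sarah's data) of how per-category proportions like those statistics might be tallied once tweets have been hand-coded:

```python
from collections import Counter

# Hypothetical hand-coded tweets: each tuple is (tweet_text, code), using
# Vieweg's first-pass scheme:
#   "O" = off-topic,
#   "R" = on-topic and relevant to situational awareness,
#   "N" = on-topic but not relevant to situational awareness.
coded_tweets = [
    ("Red River at 30 ft and rising near Fargo", "R"),
    ("Praying for everyone in Fargo tonight", "N"),
    ("Can't believe it's Monday already", "O"),
    ("Sandbag volunteers needed at 2nd Ave site", "R"),
]

def code_distribution(tweets):
    """Return the share of tweets in each O/R/N category."""
    counts = Counter(code for _, code in tweets)
    total = len(tweets)
    return {code: counts[code] / total for code in ("O", "R", "N")}

print(code_distribution(coded_tweets))
# → {'O': 0.25, 'R': 0.5, 'N': 0.25}
```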

For the second coding step, on-topic relevant tweets were annotated with more specific information based on the following coding rule:

S: Social Environment
“These tweets include information about how people and/or animals are affected by a hazard, questions asked in relation to the hazard, responses to the hazard and actions to take that directly relate to the hazard and the emergency situation it causes. These tweets all include description of a human element in that they explain or display human behavior.”

B: Built Environment
“Tweets that include information about the effect of the hazard on the built environment, including updates on the state of infrastructure, such as road closures or bridge outages, damage to property, lack of damage to property and the overall state or condition of structures.”

P: Physical Environment
“Tweets that contain specific information about the hazard including particular locations of the hazard agent or where the hazard agent is expected or predicted to travel or predicted states of the hazard agent going forward, notes about past hazards that compare to the current hazard, and how weather may affect hazard conditions. These tweets additionally include information about the type of hazard in general […]. This category also subsumes any general information about the area under threat or in the midst of an emergency […].”

The result of this coding for Haiti is depicted in the figures below.

According to the results, the social environment (‘S’) category is most common in each of the datasets. “Disasters are social events; in each disaster studied in this dissertation, the disaster occurred because a natural hazard impacted a large number of people.”

For the third coding step, Sarah created a comprehensive list of several dozen “Information Types” for each “Environment” using inductive, data-driven analysis of Twitter communications, which she combined with findings from the disaster literature and official government procedures for disaster response. In total, Sarah identified 32 specific types of information that contribute to situational awareness. The table below, for example, compares the Twitter Information Types for all three environments with official government procedures.

“Based on the discourse analysis of Twitter communications broadcast during four mass emergency events,” Sarah identified 32 specific types of information that “contribute to situational awareness. Subsequent analysis of the sociology of disaster literature, government documents and additional research on the use of Twitter in mass emergency uncovered three additional types of information.”

In sum, “[t]he comparison of the information types [she] uncovered in [her] analysis of Twitter communications to sociological research on disaster situations, and to governmental procedures, serves as a way to gauge the validity of [her] ground-up, inductive analysis.” Indeed, this enabled Sarah to identify areas of overlap as well as gaps that needed to be filled. The final Information Type framework is listed below:

And here are the results of this coding framework when applied to the Haiti data:

“Across all four datasets, the top three types of information Twitter users communicated comprise between 36.7-52.8% of the entire dataset. This is an indication that though Twitter users communicate about a variety of information, a large portion of their attention is focused on only a few types of information, which differ across each emergency event. The maximum number of information types communicated during an event is twenty-nine, which was during the Haiti earthquake.”

Natural Language Processing & Findings

The coding described above was all done manually by Sarah and research colleagues. But could the ontology she developed (Information Types) be used to automatically identify tweets that are both on-topic and relevant to situational awareness? To find out, she carried out a study using VerbNet.

“The goal of identifying verbs used in tweets that convey information relevant to situational awareness is to provide a resource that demonstrates which VerbNet classes indicate information relevant to situational awareness. The VerbNet class information can serve as a linguistic feature that provides a classifier with information to identify tweets that contain situational awareness information. VerbNet classes are useful because the classes provide a list of verbs that may not be present in any of the Twitter data I examined, but which may be used to describe similar information in unseen data. In other words, if a particular VerbNet class is relevant to situational awareness, and a classifier identifies a verb in that class that is used in a previously unseen tweet, then that tweet is more likely to be identified as containing situational awareness information.”

Sarah identified 195 verbs that mapped to her Information Types described earlier. The results of using this verb-based ontology are mixed, however. “A majority of tweets do not contain one of the verbs in the identified VerbNet classes, which indicates that additional features are necessary to classify tweets according to the social, built or physical environment.”
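
To illustrate the idea, here is a hedged sketch of how a VerbNet-class membership feature might work. The class names and verb lists below are illustrative stand-ins, not Sarah's actual 195-verb resource, and a real system would lemmatize and POS-tag the tweet first:

```python
# Toy stand-in for the VerbNet-based feature: a few VerbNet-style classes
# mapped to member verbs. These entries are invented for illustration.
SA_VERB_CLASSES = {
    "escape-51.1": {"evacuate", "flee", "escape", "leave"},
    "destroy-44": {"destroy", "demolish", "wreck"},
    "send-11.1": {"send", "ship", "deliver"},
}

# Flatten to a single lookup set of situational-awareness verbs.
SA_VERBS = {verb for members in SA_VERB_CLASSES.values() for verb in members}

def verbnet_feature(tokens):
    """Binary feature: does the tweet contain a verb from a relevant class?

    `tokens` should be lemmatized, lowercased words; this sketch does raw
    token matching only.
    """
    return any(tok in SA_VERBS for tok in tokens)

print(verbnet_feature(["residents", "flee", "coastal", "areas"]))   # → True
print(verbnet_feature(["thinking", "of", "everyone", "affected"]))  # → False
```

The point of using class membership rather than a fixed verb list is generalization: a verb never seen in the training tweets can still fire the feature if it belongs to a relevant class.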

However, when applying the 195 verbs to previously unused Haiti data to identify on-topic tweets relevant to situational awareness, Sarah found that using her customized VerbNet ontology resulted in finding 9% more tweets than when using her “Information Types” ontology. In sum, the results show that “using VerbNet classes as a feature is encouraging, but other features are needed to identify tweets that contain situational awareness information, as not all tweets that contain situational awareness information use one of the verb members in the […] identified VerbNet classes. In addition, more research in this area will involve using the semantic and syntactic information contained in each VerbNet class to identify event participants, which can lead to more fine-grained categorization of tweets.”

Conclusion

“Many tweets that communicate situational awareness information do not contain one of the verbs in the identified VerbNet classes, [but] the information provided with named entities and semantic roles can serve as features that classifiers can use to identify situational awareness information in the absence of such a verb. In addition, for tweets correctly identified as containing information relevant to situational awareness, named entities and semantic roles can provide classifiers with additional information to classify these tweets into the social, built and physical environment categories, and into specific information type categories.”

“Finding the best approach toward the automatic identification of situational awareness information communicated in tweets is a task that will involve further training and testing of classifiers.”

Crowdsourcing for Human Rights Monitoring: Challenges and Opportunities for Information Collection & Verification

This new book, Human Rights and Information Communication Technologies: Trends and Consequences of Use, promises to be a valuable resource to both practitioners and academics interested in leveraging new information & communication technologies (ICTs) in the context of human rights work. I had the distinct pleasure of co-authoring a chapter for this book with my good colleague and friend Jessica Heinzelman. We focused specifically on the use of crowdsourcing and ICTs for information collection and verification. Below is the Abstract & Introduction for our chapter.

Abstract

Accurate information is a foundational element of human rights work. Collecting and presenting factual evidence of violations is critical to the success of advocacy activities and the reputation of organizations reporting on abuses. To ensure credibility, human rights monitoring has historically been conducted through highly controlled organizational structures that face mounting challenges in terms of capacity, cost and access. The proliferation of Information and Communication Technologies (ICTs) provides new opportunities to overcome some of these challenges through crowdsourcing. At the same time, however, crowdsourcing raises new challenges of verification and information overload that have made human rights professionals skeptical of its utility. This chapter explores whether the efficiencies gained through an open call for monitoring and reporting abuses provide a net gain for human rights monitoring and analyzes the opportunities and challenges that new and traditional methods pose for verifying crowdsourced human rights reporting.

Introduction

Accurate information is a foundational element of human rights work. Collecting and presenting factual evidence of violations is critical to the success of advocacy activities and the reputation of organizations reporting on abuses. To ensure credibility, human rights monitoring has historically been conducted through highly controlled organizational structures that face mounting challenges in terms of capacity, cost and access.

The proliferation of Information and Communication Technologies (ICTs) may provide new opportunities to overcome some of these challenges. For example, ICTs make it easier to engage large networks of unofficial volunteer monitors to crowdsource the monitoring of human rights abuses. Jeff Howe coined the term “crowdsourcing” in 2006, defining it as “the act of taking a job traditionally performed by a designated agent and outsourcing it to an undefined, generally large group of people in the form of an open call” (Howe, 2009). Applying this concept to human rights monitoring, Molly Land (2009) asserts that, “given the limited resources available to fund human rights advocacy…amateur involvement in human rights activities has the potential to have a significant impact on the field” (p. 2). That said, she warns that professionalization in human rights monitoring “has arisen not because of an inherent desire to control the process, but rather as a practical response to the demands of reporting – namely, the need to ensure the accuracy of the information contained in the report” (Land, 2009, p. 3).

Because “accuracy is the human rights monitor’s ultimate weapon” and the advocate’s “ability to influence governments and public opinion is based on the accuracy of their information,” the risk of inaccurate information may trump any advantages gained through crowdsourcing (Codesria & Amnesty International, 2000, p. 32). To this end, the question facing human rights organizations that wish to leverage the power of the crowd is “whether [crowdsourced reports] can accomplish the same [accurate] result without a centralized hierarchy” (Land, 2009). The answer to this question depends on whether reliable verification techniques exist so organizations can use crowdsourced information in a way that does not jeopardize their credibility or compromise established standards. While many human rights practitioners (and indeed humanitarians) still seem to be allergic to the term crowdsourcing, further investigation reveals that established human rights organizations already use crowdsourcing and verification techniques to validate crowdsourced information and that there is great potential in the field for new methods of information collection and verification.

This chapter analyzes the opportunities and challenges that new and traditional methods pose for verifying crowdsourced human rights reporting. The first section reviews current methods for verification in human rights monitoring. The second section outlines existing methods used to collect and validate crowdsourced human rights information. Section three explores the practical opportunities that crowdsourcing offers relative to traditional methods. The fourth section outlines critiques and solutions for crowdsourcing reliable information. The final section proposes areas for future research.

The book is available for purchase here. Warning: you won’t like the price but at least they’re taking an iTunes approach, allowing readers to purchase single chapters if they prefer. Either way, Jess and I were not paid for our contribution.

For more information on how to verify crowdsourced information, please visit the following links:

  • Information Forensics: Five Case Studies on How to Verify Crowdsourced Information from Social Media (Link)
  • How to Verify and Counter Rumors in Social Media (Link)
  • Social Media and Life Cycle of Rumors during Crises (Link)
  • Truthiness as Probability: Moving Beyond the True or False Dichotomy when Verifying Social Media (Link)
  • Crowdsourcing Versus Putin (Link)

PeopleBrowsr: Next-Generation Social Media Analysis for Humanitarian Response?

As noted in this blog post on “Data Philanthropy for Humanitarian Response,” members of the Digital Humanitarian Network (DHNetwork) are still using manual methods for media monitoring. When the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) activated the Standby Volunteer Task Force (SBTF) to crisis map Libya last year, for example, SBTF volunteers manually monitored hundreds of Twitter handles and news sites for several weeks.

SBTF volunteers (Mapsters) did not have access to a smart microtasking platform that could have distributed the task more efficiently. Nor did they have access to even semi-automated tools for content monitoring and information retrieval. Instead, they used a Google Spreadsheet to list the sources they were manually monitoring and turned this spreadsheet into a sign-up sheet where each Mapster could sign on for 3-hour shifts every day. The SBTF is basically doing “crowd computing” using the equivalent of a typewriter.

Meanwhile, companies like Crimson Hexagon, NetBase, RecordedFuture and several others have each developed sophisticated ways to monitor social and/or mainstream media for various private sector applications such as monitoring brand perception. So my colleague Nazila kindly introduced me to her colleagues at PeopleBrowsr after she read my post on Data Philanthropy. Last week, Marc from PeopleBrowsr gave me a thorough tour of the platform. I was definitely impressed and am excited that Marc wants us to pilot the platform in support of the Digital Humanitarian Network. So what’s the big deal about PeopleBrowsr? To begin with, the platform has access to 1,000 days of social media data and takes in over 3 terabytes of social data per month.

To put this in terms of information velocity, PeopleBrowsr receives 10,000 social media posts per second from a variety of sources including Twitter, Facebook, fora and blogs. On the latter, they monitor posts from over 40 million blogs including all of Tumblr, Posterous, Blogspot and every WordPress-hosted site. They also pull in content from YouTube and Flickr. (Click on the screenshots below to magnify them.)

Let’s search for the term “tsunami” on Twitter. (One could enter a complex query, e.g., and/or, not, etc., and also search using Twitter handles, word or hashtag clouds, top URLs, videos, pictures, etc.) PeopleBrowsr summarizes the results by Location and Community. Location simply refers to where those generating content referring to a tsunami are located. Of course, many Twitter users may tweet about an event without actually being eyewitnesses (think of Diaspora groups, for example). While PeopleBrowsr doesn’t geo-tag the location of reported events, you can very easily and quickly identify which Twitter users are tweeting the most about a given event and where they are located.
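
For illustration, here is a rough sketch of the kind of boolean keyword filtering such a query implies. The example tweets and the `matches` helper are invented; PeopleBrowsr's actual query language is richer and not reproduced here:

```python
# Invented example tweets to filter.
tweets = [
    "Tsunami warning issued for coastal Japan",
    "Watching a tsunami documentary tonight",
    "Earthquake felt in Tokyo, no tsunami warning yet",
]

def matches(text, all_of=(), any_of=(), none_of=()):
    """AND / OR / NOT keyword filtering over whitespace-split tweet text."""
    words = set(text.lower().split())
    return (all(w in words for w in all_of)
            and (not any_of or any(w in words for w in any_of))
            and not any(w in words for w in none_of))

# Keep tweets mentioning "tsunami" but exclude documentary chatter:
hits = [t for t in tweets if matches(t, all_of=("tsunami",),
                                     none_of=("documentary",))]
print(hits)  # → first and third tweets
```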

As for Community, PeopleBrowsr has indexed millions of social media users and clustered them into different communities based on their profile/bio information. Given our interest in humanitarian response, we could create our own community of social media users from the humanitarian sector and limit our search to those users only. Communities can also be created based on hashtags. The result of the “tsunami” search is displayed below.

This result can be filtered further by gender, sentiment, number of Twitter followers, urgent words (e.g., alert, help, asap), time period and location, for example. The platform can monitor and display posts in any language. In addition, PeopleBrowsr has its very own Kred score, which quantifies the “credibility” of social media users. The scoring metrics for Kred are completely transparent and also community driven. “Kred is a transparent way to measure influence and outreach in social media. Kred generates unique scores for every domain of expertise. Regardless of follower count, a person is influential if their community is actively listening and engaging with their content.”
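
Since the actual Kred formula is not reproduced in this post, the toy function below merely illustrates the stated principle that engagement, not follower count, drives the score; the weights and normalization are arbitrary assumptions of mine:

```python
def toy_influence(retweets, replies, mentions, followers):
    """Toy influence score: weight engagement actions, then dampen the
    effect of raw audience size. Weights are illustrative, not Kred's."""
    engagement = 10 * retweets + 5 * replies + 2 * mentions
    if followers == 0:
        return 0.0
    return round(engagement / followers ** 0.5, 1)

# A small account whose community engages heavily can outscore a large,
# passive one:
print(toy_influence(retweets=40, replies=20, mentions=30, followers=400))
# → 28.0
print(toy_influence(retweets=50, replies=10, mentions=20, followers=250000))
# → 1.2
```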

Using Kred, PeopleBrowsr can do influence analysis using Twitter across all languages. They’ve also added Facebook to Kred, but only as an opt-in option. PeopleBrowsr also has some great built-in and interactive data analytics tools. In addition, one can download a situation report as a PDF and print it off if there’s a need to go offline.

What appeals to me the most is perhaps the full “drill-down” functionality of PeopleBrowsr’s data analytics tools. For example, I can drill down to the number of tweets per month that reference the word “tsunami” and drill down further per week and per day.

Moreover, I can sort through the individual tweets themselves based on specific filters and even access the underlying tweets complete with Twitter handles, time-stamps, Kred scores, etc.

This latter feature would make it possible for the SBTF to copy & paste and map individual tweets on a live crisis map. In fact, the underlying data can be downloaded into a CSV file and added to a Google Spreadsheet for Mapsters to curate. Hopefully the Ushahidi team will also provide an option to upload CSVs to SwiftRiver so users can curate/filter pre-existing datasets as well as content generated live. What if you don’t have time to get on PeopleBrowsr and filter, download, etc? As part of their customer support, PeopleBrowsr will simply provide the data to you directly.
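
As a sketch of that workflow, here is how filtered tweets might be serialized to CSV for import into a Google Spreadsheet. The field names and example rows are my assumptions, not PeopleBrowsr's actual export schema:

```python
import csv
import io

# Hypothetical filtered results.
tweets = [
    {"handle": "@reliefworker", "timestamp": "2012-06-01T14:03:00Z",
     "text": "Bridge on Route 9 impassable, use northern detour"},
    {"handle": "@localnews", "timestamp": "2012-06-01T14:10:00Z",
     "text": "Shelter at Central High now at capacity"},
]

def tweets_to_csv(rows):
    """Serialize tweet dicts to CSV text ready for a spreadsheet import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["handle", "timestamp", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(tweets_to_csv(tweets))
```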

So what’s next? Marc and I are taking the following steps:

  • Schedule an online demo of PeopleBrowsr for the SBTF Core Team (they are for now the only members of the Digital Humanitarian Network with a dedicated and experienced Media Monitoring Team);
  • SBTF pilots PeopleBrowsr for preparedness purposes;
  • SBTF deploys PeopleBrowsr during 2-3 official activations of the Digital Humanitarian Network;
  • SBTF analyzes the added value of PeopleBrowsr for humanitarian response and provides expert feedback to PeopleBrowsr on how to improve the tool for humanitarian response.

DeadUshahidi: Neither Dead Right Nor Dead Wrong

There’s a new Crowdmap in town called DeadUshahidi. The site argues that “Mapping doesn’t equal change. Using crowdsourcing tech like Ushahidi maps without laying the strategic and programmatic ground work is likely not going to work. And while we think great work has been done with crowdsourced reporting, there is an increasing number of maps that are set up with little thought as to why, who should care, and how the map leads to any changes.”

In some ways this project is stating the obvious, but the obvious sometimes needs repeating. As Ushahidi’s former Executive Director Ory Okolloh warned over two years ago: “Don’t get too jazzed up! Ushahidi is only 10% of solution.” My own doctoral research, which included a comparative analysis of Ushahidi’s use in Egypt and the Sudan, demonstrated that training, preparedness, outreach and strategic partnerships were instrumental. So I do appreciate DeadUshahidi’s constructive (and entertaining!) efforts to call attention to this issue and explain what makes a good crowd-sourced map.

At the same time, I think some of the assumptions behind this initiative need questioning. According to the project, any map with at least one of the following characteristics is added to the cemetery:

  • No one has submitted a report to your map in the last 12 months.
  • For time-bound events, like elections and disasters, the number of reports are so infinitesimally small (in relation to the number of the community the map is targeting) that the map never reached a point anywhere near relevance. (Our measure for elections is, for instance, # of submissions / # of registered voters > .0001).
  • The map was never actually started (no category descriptions, fewer than 10 reports). We call that a stillbirth.
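
The criteria above can be sketched in code. This encoding is my interpretation (reading the election measure as a submissions-to-voters ratio falling below 0.0001), not DeadUshahidi's actual implementation:

```python
from datetime import date

def is_dead_map(last_report, today, n_reports, n_categories,
                target_population=None):
    """Rough encoding of DeadUshahidi's stated cemetery criteria.

    last_report: date of the most recent report, or None if none ever.
    target_population: e.g. registered voters, for time-bound events.
    """
    # 1. No report submitted in the last 12 months.
    stale = last_report is None or (today - last_report).days > 365
    # 2. Time-bound events: reports per targeted community below 0.0001.
    irrelevant = (target_population is not None
                  and n_reports / target_population < 0.0001)
    # 3. "Stillbirth": the map never really started.
    stillborn = n_reports < 10 or n_categories == 0
    return stale or irrelevant or stillborn

# An election map with 50 reports for 1 million registered voters:
print(is_dead_map(date(2012, 5, 1), date(2012, 7, 1), 50, 5,
                  target_population=1_000_000))  # → True
```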

Mapping doesn’t equal change, but why assume that every single digital map is launched to create change? Is every blog post written to create change? Is every Wikipedia article edit made to effect change? Every tweet? What was the impact of the last hard-copy map you saw? Intention matters, and impact cannot be measured without knowing the initial motivations behind a digital map, the intended theory of change and some kind of baseline against which to measure said change. Also, many digital maps are event-based and thus used for a limited period of time only. They may no longer receive new reports a year after launch, but this doesn’t make them “dead” maps, simply completed projects. A few may even deserve to go to map heaven—how about a UshahidiHeaven crowdmap?

I’m also not entirely convinced by the argument that the number of reports per map has to cross a certain threshold for the crowdsourced map to be successful. A digital map of a neighborhood in Sydney with fewer than one hundred reports could very well have achieved the intended goal of the project. So again, without knowing or being able to reliably discern the motivations behind a digital map, it is rather farfetched to believe that one can assess whether a project was successful or not. Maybe most of the maps in the DeadUshahidi cemetery were never meant to live beyond a few days, weeks or months in the first place.

That said, I do think that one of the main challenges with Ushahidi/Crowdmap use is that the average number of reports per map is very, very low. Indeed, the vast majority of Crowdmaps are stillborn as a forthcoming study from Internews shows. Perhaps this long-tail effect shouldn’t be a surprise though. The costs of experimenting are zero and the easier the technology gets, the more flowers will bloom—or rather the more seeds become available. Whether these free and open source seeds actually get planted and grow into flowers (let alone lush eco-systems) is another issue and one dependent on a myriad of factors such as the experience of the “gardener”, the quality of the seeds, the timing and season, the conditions of the soil and climate, and the availability of other tools used for planting and cultivation.

Or perhaps a better analogy is photography. Thanks to digital cameras, we take zillions more pictures than we did just five years ago because each click is virtually free. We’re no longer limited to 24 or 36 pictures per roll of film, which first required one to buy said roll and later to pay again to have it developed. As a result of digital cameras, one could argue that there are now a lot more bad-quality (dead) pictures being uploaded everywhere. So what? Big deal. There is also more excellent amateur photography out there. What about other technologies and media? There are countless “dead” Twitter accounts, WordPress blogs, Ning platforms, customized Google Maps, etc. Again, so what?

Neogeography is about democratizing map-making and user-generated maps. Naturally, there’s going to be learning and experimentation involved. So my blog post is not written in defense of Ushahidi/Crowdmap but rather in defense of all amateur digital mappers out there who are curious and just want to map whatever the heck they please. In sum, and to return to the gardening analogy if I may, the more important question here is why the majority of (Usha)seeds aren’t planted or don’t grow, and what can be done about this in a proactive manner. Is there something wrong with the seed? Do would-be gardeners simply need more gardening manuals? Or do they need more agile micro-tasking and data-mining tools? The upcoming Internews report goes a long way toward explaining the why & what, and TechChange’s course on Ushahidi may be one way to save some future maps from ending up in the DeadUshahidi cemetery prematurely.

Evolution in Live Mapping: The 2012 Egyptian Presidential Elections

My doctoral dissertation compared the use of live mapping technology in Egypt and the Sudan during 2010. That year was the first time that Ushahidi was deployed in those two countries. So it is particularly interesting to see the technology used again in both countries in 2012. Sudanese activists are currently using the platform to map #SudanRevolts while Egyptian colleagues have just used the tool to monitor the recent elections in their country.

Analyzing the evolution of live mapping technology use in non-permissive environments ought to make for a very interesting piece of research (any takers?). In the case of Egypt, one could compare the use of the same technology and methods before and after the fall of Mubarak. In 2010, the project was called U-Shahid. This year, the initiative was branded as the “Egypt Elections Project.”

According to my colleagues in Cairo who managed the interactive map, “more than 15 trainers and 75 coordinators were trained to work in the ‘operation room’ supporting 2,200 trained observers scattered all over Egypt. More than 17,000 reports and up to 25,000 short messages were sent by the observers and shown on Ushahidi’s interactive map. Although most reports showed a minimal amount of serious violations, and most of them indicated the success of the electoral process, our biggest joy was being able to monitor freely and to report the whole process with full transparency.”

Contrast this situation with how Egyptian activists struggled to keep their Ushahidi project alive under Mubarak in 2010. Last week, the team behind the current live map was actually interviewed by state television (picture above), which was formerly controlled by the old regime. Interestingly, the actual map is no longer the centerpiece of the project when compared to the U-Shahid deployment. The team has included and integrated a lot more rich multimedia content in addition to data, statistics and trends analysis. Moreover, there appears to be a shift towards bounded crowdsourcing rather than open crowdsourcing as far as election mapping projects go.

These two live mapping projects in Egypt and the Sudan are also getting relatively more traction than those in 2010. Some 17,000 reports were mapped in this year’s election project compared to 2,700 two years ago. Apparently, “millions of users logged into the [Egypt Elections Project] site to check the outcome of the electoral process,” compared to some 40,000 two years ago. Sudanese activists in Khartoum also appear to be far better organized and more agile at leveraging social media channels to garner support for their movement than in 2010. Perhaps some of the hard lessons from those resistance efforts were learned.

This learning factor is key and relates to an earlier blog post I wrote on “Technology and Learning, Or Why the Wright Brothers Did Not Create the 747.” The question is: do repressive regimes learn faster, or do social movements operate with more agile feedback loops? Indeed, perhaps the technology variable doesn’t matter the most. As I explained to Newsweek a while back, “It is the organizational structure that will matter the most. Rigid structures are unable to adapt as quickly to a rapidly changing environment as a decentralized system. Ultimately, it is a battle of organizational theory.” In the case of Egypt and Sudan today, there’s no doubt that activists in both countries are better organized while the technologies themselves haven’t actually changed much since 2010. But better organization is a necessary, not sufficient, condition to catalyze positive social change and indirect forms of democracy.

Pierre Rosanvallon (2008) identifies three channels whereby civil society can hold the state accountable during (and in between) elections, and independent of their results.

“The first refers to the various means whereby citizens (or, more accurately, organizations of citizens) are able to monitor and publicize the behavior of elected and appointed rulers; the second to their capacity to mobilize resistance to specific policies, either before or after they have been selected; the third to the trend toward ‘juridification’ of politics when individuals or social groups use the courts and, especially, jury trials to bring delinquent politicians to judgment.”

Live maps and crowdsourcing can be used to monitor and publicize the behavior of politicians. The capacity to mobilize resistance and bring officials to judgment may require a different set of strategies and technologies, however. Those who don’t realize this often leave behind a cemetery of dead maps.

Muḥammad ibn Mūsā al-Khwārizmī: An Update from the Qatar Computing Research Institute

I first heard of al-Khwārizmī in my ninth-grade computer science class at the International School of Vienna (AIS) back in 1993. Dr. Herman Prossinger, who taught the course, is exactly the kind of person one describes when answering the question: which teacher had the most impact on you while growing up? I wonder how many other 9th graders in the world had the good fortune of being taught computer science by a full-fledged professor with a PhD dissertation entitled “Isothermal Gas Spheres in General Relativity Theory” (1976) and numerous peer-reviewed publications in top-tier scientific journals including Nature?

Muḥammad ibn Mūsā al-Khwārizmī was a brilliant mathematician & astronomer who spent his time as a scholar in the House of Wisdom in Baghdad (possibly the best name of any co-working space in history). “Al-Khwārizmī” was initially transliterated into Latin as Algoritmi. The manuscript above, for example, begins with “DIXIT algorizmi,” meaning “Says al-Khwārizmī.” And thus was born the word “algorithm.” But al-Khwārizmī’s fundamental contributions were not limited to the fields of mathematics and astronomy; he is also praised for his important work on geography and cartography. Published in 833, his Kitāb ṣūrat al-Arḍ (Arabic: كتاب صورة الأرض) or “Book on the Appearance of the Earth” was a revised and corrected version of Ptolemy’s Geography. al-Khwārizmī’s book comprised an impressive list of 2,402 coordinates of cities and other geographical features. The only surviving copy of the book can be found at Strasbourg University. I’m surprised the item has not yet been purchased by Qatar and relocated to Doha.

View of the bay from QCRI in Doha, Qatar.

This brings me to the Qatar (Foundation) Computing Research Institute (QCRI), which was almost called the al-Khwārizmī Computing Research Institute. I joined QCRI exactly two weeks ago as Director of Social Innovation. My first impression? QCRI is Doha’s “House of Whizzkids”. The team is young, dynamic, international and super smart. I’m already working on several exploratory research and development (R&D) projects that could potentially lead to initial prototypes by the end of the year. These have to do with the application of social computing and big data analysis for humanitarian response. So I’ve been in touch with several colleagues at the United Nations (UN) Office for the Coordination of Humanitarian Affairs (OCHA) to bounce these early ideas off and am thrilled that all responses thus far have been very positive.

My QCRI colleagues and I are also looking into collaborative platforms for “smart microtasking” which may be useful for the Digital Humanitarian Network. In addition, we’re just starting to explore potential solutions for quantifying veracity in social media, a rather non-trivial problem as Dr. Prossinger would often say with a sly smile in relation to NP-hard problems. In terms of partnership building, I will be in New York, DC and Boston next month for official meetings with the UN, World Bank and MIT to explore possible collaborations on specific projects. The team in Doha is particularly strong on big data analytics, social computing, data cleaning, machine learning and translation. In fact, most of the whizzkids here come with very impressive track records from Microsoft, Yahoo, the Ivy Leagues, etc. So I’m excited by the potential.

View of Tornado Tower (purple lights) where QCRI is located.

The reason I’m not going into specifics vis-a-vis these early R&D efforts is not because I want to be secretive or elusive. Not at all. We’re still refining the ideas ourselves and simply want to manage expectations. There is a very strong and genuine interest within QCRI to contribute meaningfully to the humanitarian technology space. But we’re really just getting started, still hiring left, center and right, and we’ll be in R&D mode for a while. Plus, we don’t want to rush just for the sake of launching a new product. All too often, humanitarian technologies are developed without the benefit (and luxury) of advanced R&D. But if QCRI is going to help shape next-generation humanitarian technology solutions, we should do this in a way that is deliberate, cutting-edge and strategic. That is our comparative advantage.

In sum, the outcome of our R&D efforts may not always lead to a full-fledged prototype, but all the research and findings we carry out will definitely be shared publicly so we can move the field forward. We’re also committed to developing free and open source software as part of our prototyping efforts. Finally, we have no interest in re-inventing the wheel and far prefer working in partnerships rather than in isolation. So there we go, time to R&D like al-Khwārizmī.

Crisis Mapping the End of Sudan’s Dictatorship?

Anyone following the twitter hashtag #SudanRevolts in recent days must be stunned by the shocking lack of coverage in the mainstream media. The protests have been escalating since June 17 when female students at the University of Khartoum began demonstrating against the regime’s austerity measures, which are increasing the prices of basic commodities and removing fuel subsidies. The dissent has quickly spread to other universities and communities.

There’s no doubt that Sudan’s dictator is in trouble. He faces international economic sanctions and a mounting US$2.5 billion budget deficit following the secession of South Sudan last year. What’s more, he is also “fighting expensive, devastating, and unpopular wars in Darfur (in the west), Blue Nile, Southern Kordofan, and the Nuba Mountains (on the border with South Sudan)” (UN Dispatch). So what next?

Enter Sudan Change Now, a Sudanese political movement with a clear mandate: peaceful but total democratic change. They seek to “defeat the present power of darkness using all necessary tools of peace resistance to achieve political stability and social peace.” The movement is thus “working on creating a common front that incorporates all victims of the current regime to ensure a unified and effective course of action to overthrow it.” Here are some important videos they have captured of the protests.

According to GlobalVoices, “The Sudanese online community believe that media coverage was an integral part of the revolutions in Egypt and Tunisia, and are therefore demanding the same for Sudan.” The political movement Sudan Change Now is thus turning to crisis mapping to cast more light on the civil resistance efforts in the Sudan:

https://sudanchangenow2012.crowdmap.com

The crisis map includes over 50 individual reports (all added in the past 24 hours) ranging from female protestors confronting armed guards to Sudanese security forces using tear gas to break up demonstrations. There are also reports of detained activists and journalists. These reports come from twitter while more recent incidents are sourced from the little mainstream media coverage that currently exists. The live map is being updated several times a day.

As my colleague Carol Gallo reminds us, “The University of Khartoum was also the birthplace of the movement that led to the overthrow of the military government in 1964.” Symbols and anniversaries are important features of civil resistance. For example, Sudan’s current ruling party came to power on June 30th, 1989. So protestors, including those with Sudan Change Now, are gearing up for some major demonstrations this Wednesday.

This is not the first crisis map of protests in Khartoum. In January 2011, activists launched this crisis map. I hope that protestors engaged in current civil resistance efforts take note of the lessons learned from last year’s #Jan30 demonstrations. For my doctoral dissertation, I compared the use of crisis maps by Egyptian and Sudanese activists in 2010. If I had to boil down the findings into three key words, these would be: unity, preparedness, creativity.

Unity is absolutely instrumental in civil resistance. As for preparedness, nothing should be left to chance. Prepare and plan the sequence of civil resistance efforts (along with likely reactions) and remember that protests come at the end. The groundwork must first be laid with other civil resistance tactics and then escalated. Finally, creativity is essential, so here are some tactics that may provide some ideas. They include both traditional tactics and technology-enabled ones like digital crisis maps.

NB: I understand that the security risks of using the Ushahidi mapping platform have been indirectly communicated to the activists.

Back to the Future: On National Geographic and Crisis Mapping

[Cross-posted from National Geographic Newswatch]

Published in October 1888, the first issue of National Geographic “was a modest looking scientific brochure with an austere terra-cotta cover” (NG 2003). The inaugural publication comprised a dense academic treatise on the classification of geographic forms by genesis. But that wasn’t all. The first issue also included a riveting account of “The Great White Hurricane” of March 1888, which still ranks as one of the worst winter storms ever in US history.

Wreck at Coleman’s Station, New York & Harlem R. R., March 13, 1888. Photo courtesy NOAA Photo Library.

I’ve just spent a riveting week myself at the 2012 National Geographic Explorers Symposium in Washington DC, the birthplace of the National Geographic Society. I was truly honored to be recognized as a 2012 Emerging Explorer along with such an amazing and accomplished cadre of explorers. So it was with excitement that I began reading up on the history of this unique institution whilst on my flight to Doha following the Symposium.

I’ve been tagged as the “Crisis Mapper” of the Emerging Explorers Class of 2012. So imagine my astonishment when I discovered that National Geographic had a long history of covering and mapping natural disasters, humanitarian crises and wars starting from the very first issue of the magazine in 1888. And when World War I broke out:

“Readers opened their August 1914 edition of the magazine to find an up-to-date map of ‘The New Balkan States and Central Europe’ that allowed them to follow the developments of the war. Large maps of the fighting fronts continued to be published throughout the conflict […]” (NG 2003).

Map of ‘The New Balkan States and Central Europe’ from the August 1914 “National Geographic Magazine.” Image courtesy NGS.

National Geographic even established a News Service Bureau to provide bulletins on the geographic aspects of the war for the nation’s newspapers. As the respected war strategist Carl von Clausewitz noted half-a-century before the launch of Geographic, “geography and the character of the ground bear a close and ever present relation to warfare […] both as to its course and to its planning and exploitation.”

“When World War II came, the Geographic opened its vast files of photographs, more than 300,000 at that time, to the armed forces. By matching prewar aerial photographs against wartime ones, analysts detected camouflage and gathered intelligence” (NG 2003).

During the 1960s, National Geographic “did not shrink from covering the war in Vietnam.” Staff writers and photographers captured all aspects of the war from “Saigon to the Mekong Delta to villages and rice fields.” In the years and decades that followed, Geographic continued to capture unfolding crises, from occupied Palestine and Apartheid South Africa to war-torn Afghanistan and the drought-stricken Sahel of Africa.

Geographic also covered the tragedy of the Chernobyl nuclear disaster and the dramatic eruption of Mount Saint Helens. The gripping account of the latter would in fact become the most popular article in all of National Geographic history. Today,

“New technologies–remote sensing, lasers, computer graphics, x-rays and CT scans–allow National Geographic to picture the world in new ways.” This is equally true of maps. “Since the first map was published in the magazine in 1888, maps have been an integral component of many magazine articles, books and television programs […]. Originally drafted by hand on large projections, today’s maps are created by state-of-the-art computers to map everything from the Grand Canyon to the outer reaches of the universe” (NG 2003). And crises.

“Pick up a newspaper and every single day you’ll see how geography plays a dominant role in giving a third dimension to life,” wrote Gil Grosvenor, the former Editor in Chief of National Geographic (NG 2003). And as we know only too well, many of the headlines in today’s newspapers relay stories of crises the world over. National Geographic has a tremendous opportunity to shed a third dimension on emerging crises around the globe using new live mapping technologies. Indeed, to map the world is to know it, and to map the world live is to change it live before it’s too late. The next post in this series will illustrate why with an example from the 2010 Haiti Earthquake.

Patrick Meier is a 2012 National Geographic Emerging Explorer. He is an internationally recognized thought leader on the application of new technologies for positive social change. He currently serves as Director of Social Innovation at the Qatar Foundation’s Computing Research Institute (QCRI). Patrick also authors the respected iRevolution blog & tweets at @patrickmeier. This piece was originally published here on National Geographic.

How Can Innovative Technology Make Conflict Prevention More Effective?

I’ve been asked to participate in an expert working group in support of a research project launched by the International Peace Institute (IPI) on new technologies for conflict prevention. Both UNDP and USAID are also partners in this effort. To this end, I’ve been invited to make some introductory remarks during our upcoming working group meeting. The purpose of this blog post is to share my preliminary thoughts on this research and provide some initial suggestions.

Before I launch into said thoughts, some context may be in order. I spent several years studying, launching and improving conflict early warning systems for violence prevention. While I haven’t recently blogged about conflict prevention on iRevolution, you’ll find my writings on this topic posted on my other blog, Conflict Early Warning. I have also published and presented several papers on conflict prevention, most of which are available here. The most relevant ones include the following:

  • Meier, Patrick. 2011. Early Warning Systems and the Prevention of Violent Conflict. In Peacebuilding in the Information Age: Sifting Hype from Reality, ed. Daniel Stauffacher et al. Geneva: ICT4Peace. Available online.
  • Leaning, Jennifer and Patrick Meier. 2009. “The Untapped Potential of Information Communication Technology for Conflict Early Warning and Crisis Mapping,” Working Paper Series, Harvard Humanitarian Initiative (HHI), Harvard University. Available online.
  • Leaning, Jennifer and Patrick Meier. 2008. “Community Based Conflict Early Warning and Response Systems: Opportunities and Challenges.” Working Paper Series, Harvard Humanitarian Initiative (HHI), Harvard University. Available online.
  • Leaning, Jennifer and Patrick Meier. 2008. “Conflict Early Warning and Response: A Critical Reassessment.” Working Paper Series, Harvard Humanitarian Initiative (HHI), Harvard University. Available online.
  • Meier, Patrick. 2008. “Upgrading the Role of Information Communication Technology (ICT) for Tactical Early Warning/Response.” Paper prepared for the 49th Annual Convention of the International Studies Association (ISA) in San Francisco. Available online.
  • Meier, Patrick. 2007. “New Strategies for Effective Early Response: Insights from Complexity Science.” Paper prepared for the 48th Annual Convention of the International Studies Association (ISA) in Chicago. Available online.
  • Campbell, Susanna and Patrick Meier. 2007. “Deciding to Prevent Violent Conflict: Early Warning and Decision-Making at the United Nations.” Paper prepared for the 48th Annual Convention of the International Studies Association (ISA) in Chicago. Available online.
  • Meier, Patrick. 2007. From Disaster to Conflict Early Warning: A People-Centred Approach. Monday Developments 25, no. 4, 12-14. Available online.
  • Meier, Patrick. 2006. “Early Warning for Cowardly Lions: Response in Disaster & Conflict Early Warning Systems.” Unpublished academic paper, The Fletcher School. Available online.
  • I was also invited to be an official reviewer of this 100+ page workshop summary on “Communication and Technology for Violence Prevention” (PDF), which was just published by the National Academy of Sciences. In addition, I was an official referee for this important OECD report on “Preventing Violence, War and State Collapse: The Future of Conflict Early Warning and Response.”

An obvious first step for IPI’s research would be to identify the conceptual touch-points between the individual functions or components of conflict early warning systems and information & communication technology (ICT). Using this conceptual framework put forward by ISDR would be a good place to start:

That said, colleagues at IPI should take care not to fall prey to technological determinism. The first order of business should be to understand exactly why previous (and existing) conflict early warning systems are complete failures—a topic I have written extensively about and been particularly vocal on since 2004. Throwing innovative technology at failed systems will not turn them into successful operations. Furthermore, IPI should also take note of the relatively new discourse on people-centered approaches to early warning and distinguish between first, second, third and fourth generation conflict early warning systems.

On this note, IPI ought to focus in particular on third and fourth generation systems vis-a-vis the role of innovative technology. Why? Because first and second generation systems are structured for failure due to constraints explained by organizational theory. They should thus explore the critical importance of conflict preparedness and the role that technology can play in this respect since preparedness is key to the success of third and fourth generation systems. In addition, IPI should consider the implications of crowdsourcing, crisis mapping, Big Data, satellite imagery and the role that social media analytics might play in the early detection of and response to violent conflict. They should also take care not to ignore critical insights from the field of nonviolent civil resistance vis-a-vis preparedness and tactical approaches to community-based early response. Finally, they should take note of new and experimental initiatives in this space, such as PeaceTXT.

IPI plans to write up several case studies on conflict early warning systems to understand how innovative technology might make (or is already making) these more effective. I would recommend focusing on specific systems in Kenya, Kyrgyzstan, Sri Lanka and Timor-Leste. Note that some community-based systems are too sensitive to make public, such as one in Burma for example. In terms of additional experts worth consulting, I would recommend David Nyheim, Joe Bock, Maria Stephan, Sanjana Hattotuwa, Scott Edwards and Casey Barrs. I would also shy away from inviting too many academics or technology companies. The former tend to focus too much on theory while the latter often have a singular focus on technology.

Many thanks to UNDP for including me in the team of experts. I look forward to the first working group meeting and reviewing IPI’s early drafts. In the meantime, if iRevolution readers have certain examples or questions they’d like me to relay to the working group, please do let me know via the comments section below and I’ll be sure to share.

Big Data for Development: Challenges and Opportunities

The UN Global Pulse report on Big Data for Development ought to be required reading for anyone interested in humanitarian applications of Big Data. The purpose of this post is not to summarize this excellent 50-page document but to relay the most important insights contained therein. In addition, I question the motivation behind the unbalanced commentary on Haiti, which is my only major criticism of this otherwise authoritative report.

Real-time “does not always mean occurring immediately. Rather, ‘real-time’ can be understood as information which is produced and made available in a relatively short and relevant period of time, and information which is made available within a timeframe that allows action to be taken in response, i.e. creating a feedback loop. Importantly, it is the intrinsic time dimensionality of the data, and that of the feedback loop, that jointly define its characteristic as real-time. (One could also add that the real-time nature of the data is ultimately contingent on the analysis being conducted in real-time and, by extension, where action is required, used in real-time).”

Data privacy “is the most sensitive issue, with conceptual, legal, and technological implications.” To be sure, “because privacy is a pillar of democracy, we must remain alert to the possibility that it might be compromised by the rise of new technologies, and put in place all necessary safeguards.” Privacy is defined by the International Telecommunications Union as the “right of individuals to control or influence what information related to them may be disclosed.” Moving forward, “these concerns must nurture and shape on-going debates around data privacy in the digital age in a constructive manner in order to devise strong principles and strict rules—backed by adequate tools and systems—to ensure ‘privacy-preserving analysis.’”

Non-representative data is often dismissed outright since findings based on such data cannot be generalized beyond that sample. “But while findings based on non-representative datasets need to be treated with caution, they are not valueless […].” Indeed, while the “sampling selection bias can clearly be a challenge, especially in regions or communities where technological penetration is low […], this does not mean that the data has no value. For one, data from “non-representative” samples (such as mobile phone users) provide representative information about the sample itself—and do so in close to real time and on a potentially large and growing scale, such that the challenge will become less and less salient as technology spreads across and within developing countries.”

Perceptions rather than reality is what social media captures. Moreover, these perceptions can also be wrong. But only those individuals “who wrongfully assume that the data is an accurate picture of reality can be deceived. Furthermore, there are instances where wrong perceptions are precisely what is desirable to monitor because they might determine collective behaviors in ways that can have catastrophic effects.” In other words, “perceptions can also shape reality. Detecting and understanding perceptions quickly can help change outcomes.”

False data and hoaxes are part and parcel of user-generated content. While the challenges around reliability and verifiability are real, some media organizations, such as the BBC, stand by the utility of citizen reporting of current events: “there are many brave people out there, and some of them are prolific bloggers and Tweeters. We should not ignore the real ones because we were fooled by a fake one.” They have thus devised internal strategies to confirm the veracity of the information they receive and choose to report, “offering an example of what can be done to mitigate the challenge of false information.” See for example my 20-page study on how to verify crowdsourced social media data, a field I refer to as information forensics. In any event, “whether false negatives are more or less problematic than false positives depends on what is being monitored, and why it is being monitored.”

“The United States Geological Survey (USGS) has developed a system that monitors Twitter for significant spikes in the volume of messages about earthquakes,” and as it turns out, 90% of user-generated reports that trigger an alert have turned out to be valid. “Similarly, a recent retrospective analysis of the 2010 cholera outbreak in Haiti conducted by researchers at Harvard Medical School and Children’s Hospital Boston demonstrated that mining Twitter and online news reports could have provided health officials a highly accurate indication of the actual spread of the disease with two weeks lead time.”
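The core idea behind the USGS system is simple: a sudden jump in the volume of earthquake-related tweets is itself a signal. As a rough illustration (my own toy sketch with made-up parameters, not the actual USGS algorithm), a spike can be flagged whenever the latest per-minute count exceeds the trailing mean by a few standard deviations:

```python
import statistics

def detect_spike(counts, window=12, threshold=3.0):
    """Flag a spike when the latest count exceeds the trailing mean
    by `threshold` standard deviations. `window` and `threshold`
    are illustrative parameters, not USGS's actual settings."""
    if len(counts) < window + 1:
        return False  # not enough history yet
    recent = counts[-window - 1:-1]           # trailing window, excluding latest
    mean = statistics.mean(recent)
    stdev = statistics.pstdev(recent) or 1.0  # avoid division by zero
    return (counts[-1] - mean) / stdev > threshold

# Hypothetical per-minute counts of tweets mentioning "earthquake":
quiet = [4, 5, 3, 6, 4, 5, 4, 6, 5, 4, 5, 6, 5]
spike = [4, 5, 3, 6, 4, 5, 4, 6, 5, 4, 5, 6, 48]
print(detect_spike(quiet), detect_spike(spike))  # False True
```

A real deployment would of course need keyword filtering, geo-disambiguation and the kind of validation step that yields the 90% figure cited above.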

This leads to the other Haiti example raised in the report, namely the finding that SMS data was correlated with building damage. Please see my previous blog posts here and here for context. What the authors seem to overlook is that Benetech apparently did not submit their counter-findings for independent peer-review whereas the team at the European Commission’s Joint Research Center did—and the latter passed the peer-review process. Peer-review is how rigorous scientific work is validated. The fact that Benetech never submitted their blog post for peer-review is actually quite telling.

In sum, while this Big Data report is otherwise strong and balanced, I am really surprised that they cite a blog post as “evidence” while completely ignoring the JRC’s peer-reviewed scientific paper published in the Journal of the European Geosciences Union. Until counter-findings are submitted for peer review, the JRC’s results stand: unverified, non-representative crowdsourced text messages from the disaster-affected population in Port-au-Prince, translated from Haitian Creole to English via a novel crowdsourced volunteer effort and subsequently geo-referenced by hundreds of volunteers without any quality control, produced a statistically significant, positive correlation with building damage.
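For readers curious what such a finding means mechanically: the claim is that, across grid cells of the city, cells with more text messages also tended to have more damaged buildings. Here is a minimal sketch of that kind of check, using a hand-rolled Pearson correlation over entirely made-up cell counts (not the actual Haiti data):

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-grid-cell tallies: SMS reports vs. damaged buildings.
sms_reports       = [2, 9, 4, 15, 7, 1, 11]
damaged_buildings = [3, 12, 5, 18, 8, 2, 14]
r = pearson_r(sms_reports, damaged_buildings)
print(round(r, 2))  # strongly positive for these toy numbers
```

In practice one would also want significance tests and controls for confounders like population density, which is precisely why independent peer review matters here.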

In conclusion, “any challenge with utilizing Big Data sources of information cannot be assessed divorced from the intended use of the information. These new, digital data sources may not be the best suited to conduct airtight scientific analysis, but they have a huge potential for a whole range of other applications that can greatly affect development outcomes.”

One such application is disaster response. Earlier this year, FEMA Administrator Craig Fugate gave a superb presentation on “Real Time Awareness” in which he relayed an example of how he and his team used Big Data (Twitter) during a series of devastating tornadoes in 2011:

“Mr. Fugate proposed dispatching relief supplies to the long list of locations immediately and received pushback from his team who were concerned that they did not yet have an accurate estimate of the level of damage. His challenge was to get the staff to understand that the priority should be one of changing outcomes, and thus even if half of the supplies dispatched were never used and sent back later, there would be no chance of reaching communities in need if they were in fact suffering tornado damage already, without getting trucks out immediately. He explained, “if you’re waiting to react to the aftermath of an event until you have a formal assessment, you’re going to lose 12-to-24 hours…Perhaps we shouldn’t be waiting for that. Perhaps we should make the assumption that if something bad happens, it’s bad. Speed in response is the most perishable commodity you have…We looked at social media as the public telling us enough information to suggest this was worse than we thought and to make decisions to spend [taxpayer] money to get moving without waiting for formal request, without waiting for assessments, without waiting to know how bad because we needed to change that outcome.”

Fugate also emphasized that using social media as an information source isn’t a precise science and the response isn’t going to be precise either: “Disasters are like horseshoes, hand grenades and thermal nuclear devices, you just need to be close—preferably more than less.”