Category Archives: Social Computing

Why USAID’s Crisis Map of Syria is so Unique

While static, this crisis map includes a truly unique detail. Click on the map below to see a larger version; this may help you spot what is so striking.

For a hint, click this link. Still stumped? Look at the sources listed in the Key.

 

Rapidly Verifying the Credibility of Information Sources on Twitter

One of the advantages of working at QCRI is that I’m regularly exposed to peer-reviewed papers presented at top computing conferences. This is how I came across an initiative called “Seriously Rapid Source Review” or SRSR. As many iRevolution readers know, I’m very interested in information forensics as applied to crisis situations. So SRSR certainly caught my attention.

The team behind SRSR took a human-centered design approach in order to integrate journalistic practices within the platform. There are four features worth noting in this respect. The first feature to note in the figure below is the automated filter function, which allows one to view tweets generated by “Ordinary People,” “Journalists/Bloggers,” “Organizations,” “Eyewitnesses” and “Uncategorized.” The second feature, Location, “shows a set of pie charts indicating the top three locations where the user’s Twitter contacts are located. This cue provides more location information and indicates whether the source has a ‘tie’ or other personal interest in the location of the event, an aspect of sourcing exposed through our preliminary interviews and suggested by related work.”
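To make the Location cue concrete, here is a minimal sketch of how such a top-three pie chart might be computed, assuming we already have the self-reported locations of a source’s contacts. The function, field handling and normalization are mine for illustration, not SRSR’s actual implementation.

```python
from collections import Counter

def top_contact_locations(contact_locations, n=3):
    """Given the self-reported locations of a user's Twitter contacts,
    return the top-n locations and their share of all contacts, i.e. the
    slices of an SRSR-style pie chart."""
    # Very crude normalization of free-text location strings; a real system
    # would geocode these instead.
    cleaned = [loc.strip().lower() for loc in contact_locations if loc and loc.strip()]
    counts = Counter(cleaned)
    total = sum(counts.values())
    return [(place, count / total) for place, count in counts.most_common(n)]

# Hypothetical source whose contacts cluster around the disaster area.
contacts = ["New York, NY", "New York, NY", "new york", "Boston", "London", "NYC"]
print(top_contact_locations(contacts))
# top slices: 'new york, ny' ~33%, then 'new york' and 'boston' ~17% each
```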


The third feature worth noting is the “Eyewitness” icon. The SRSR team developed the first-ever automatic classifier to identify eyewitness reports shared on Twitter. My team and I at QCRI are developing a second one that focuses specifically on automatically classifying eyewitness reports during sudden-onset natural disasters. The fourth feature is “Entities,” which displays the top five entities that the user has mentioned in their tweet history. These include references to organizations, people and places, which can reveal important patterns about the Twitter user in question.
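For readers curious about what an eyewitness classifier might look like under the hood, here is a toy sketch built with scikit-learn. This is not SRSR’s (or QCRI’s) actual classifier; the hand-labeled training examples are purely illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set: 1 = eyewitness, 0 = not an eyewitness.
train_tweets = [
    "I can see the water rising on my street right now",
    "Smoke everywhere, we just felt the whole building shake",
    "Officials say the storm will make landfall tonight",
    "RT @news: evacuation ordered for coastal counties",
]
train_labels = [1, 1, 0, 0]

# Bag-of-words features plus a simple linear model; real classifiers use far
# richer features and much larger labeled datasets.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_tweets, train_labels)

print(model.predict(["the ground is still shaking here, furniture everywhere"]))
# prints the predicted label (1 = eyewitness, 0 = not)
```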

Journalists participating in this applied research found the “Location” feature particularly important when assessing the credibility of users on Twitter. They noted that “sources that had friends in the location of the event were more believable, indicating that showing friends’ locations can be an indicator of credibility.” One journalist shared the following: “I think if it’s someone without any friends in the region that they’re tweeting about then that’s not nearly as authoritative, whereas if I find somebody who has 50% of friends are in [the disaster area], I would immediately look at that.”

In addition, the automatic identification of “eyewitnesses” was deemed essential by journalists who participated in the SRSR study. This should not be surprising since “news organizations often use eyewitnesses to add credibility to reports by virtue of the correspondent’s on-site proximity to the event.” Indeed, “Witnessing and reporting on what the journalist had witnessed have long been seen as quintessential acts of journalism.” To this end, “social media provides a platform where once passive witnesses can become active and share their eyewitness testimony with the world, including with journalists who may choose to amplify their report.”

In sum, SRSR could be used to accelerate the verification of social media content, i.e., to go beyond source verification alone. For more on SRSR, please see this computing paper (PDF), which was authored by Nicholas Diakopoulos, Munmun De Choudhury and Mor Naaman.

The Most Impressive Live Global Twitter Map, Ever?

My colleague Kalev Leetaru has just launched The Global Twitter Heartbeat Project in partnership with the Cyber Infrastructure and Geospatial Information Laboratory (CIGI) and GNIP. He shared more information on this impressive initiative with the CrisisMappers Network this morning.

According to Kalev, the project “uses an SGI super-computer to visualize the Twitter Decahose live, applying fulltext geocoding to bring the number of geo-located tweets from 1% to 25% (using a full disambiguating geocoder that uses all of the user’s available information in the Twitter stream, not just looking for mentions of major cities), tone-coding each tweet using a twitter-customized dictionary of 30,000 terms, and applying a brand-new four-stage heatmap engine (this is where the supercomputer comes in) that makes a map of the number of tweets from or about each location on earth, a second map of the average tone of all tweets for each location, a third analysis of spatial proximity (how close tweets are in an area), and a fourth map as needed for the percent of all of those tweets about a particular topic, which are then all brought together into a single heatmap that takes all of these factors into account, rather than a sequence of multiple maps.”
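To illustrate two of the ingredients Kalev describes, here is a deliberately simplified sketch of dictionary-based tone-coding and grid-based heatmap aggregation. The tone dictionary and cell size are toy stand-ins; the actual project uses a 30,000-term Twitter-tuned lexicon, full-text geocoding and a supercomputer-backed rendering engine.

```python
from collections import defaultdict

# Toy tone dictionary; the real one is Twitter-customized and far larger.
TONE = {"love": 1.0, "great": 0.8, "safe": 0.5, "damage": -0.6, "terrible": -0.9, "flood": -0.5}

def tone_score(text):
    words = text.lower().split()
    scores = [TONE[w] for w in words if w in TONE]
    return sum(scores) / len(scores) if scores else 0.0

def grid_heatmap(tweets, cell_deg=1.0):
    """Bin geo-located tweets into a lat/lon grid, keeping the tweet count
    and the average tone per cell (two of the layers described above)."""
    cells = defaultdict(lambda: {"count": 0, "tone_sum": 0.0})
    for text, lat, lon in tweets:
        key = (int(lat // cell_deg), int(lon // cell_deg))
        cells[key]["count"] += 1
        cells[key]["tone_sum"] += tone_score(text)
    return {k: (v["count"], v["tone_sum"] / v["count"]) for k, v in cells.items()}

sample = [("terrible flood damage here", 40.7, -74.0), ("we are safe", 40.6, -74.2)]
print(grid_heatmap(sample))  # cell -> (tweet count, average tone)
```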

Kalev added that, “For the purposes of this demonstration we are processing English only, but are seeing a nearly identical spatial profile to geotagged all-languages tweets (though this will affect the tonal results).” The Twitterbeat team is running a live demo showing both a US and world map updated in real time at Supercomputing on a PufferSphere and every few seconds on the SGI website here.


So why did Kalev share all this with the CrisisMappers Network? Because he and his team created a rather unique crisis map composed of all tweets about Hurricane Sandy; see the YouTube video above. “[Y]ou can see how the whole country lights up and how tweets don’t just move linearly up the coast as the storm progresses, capturing the advance impact of such a large storm and its peripheral effects across the country.” The team also did a “similar visualization of the recent US Presidential election showing the chaotic nature of political communication in the Twittersphere.”


To learn more about the project, I recommend watching Kalev’s 2-minute introductory video above.

What Percentage of Tweets Generated During a Crisis Are Relevant for Humanitarian Response?

More than half-a-million tweets were generated during the first three days of Hurricane Sandy and well over 400,000 pictures were shared via Instagram. Last year, over one million tweets were generated every five minutes on the day that Japan was struck by a devastating earthquake and tsunami. Humanitarian organizations are ill-equipped to manage this volume and velocity of information. In fact, the lack of analysis of this “Big Data” has spawned all kinds of suppositions about the perceived value—or lack thereof—that social media holds for emergency response operations. So just what percentage of tweets are relevant for humanitarian response?

One of the very few rigorous and data-driven studies that addresses this question is Dr. Sarah Vieweg‘s 2012 doctoral dissertation on “Situational Awareness in Mass Emergency: Behavioral and Linguistic Analysis of Disaster Tweets.” After manually analyzing four distinct disaster datasets, Vieweg finds that only 8% to 20% of tweets generated during a crisis provide situational awareness. This implies that the vast majority of tweets generated during a crisis have zero added value vis-à-vis humanitarian response. So critics have good reason to be skeptical about the value of social media for disaster response.

At the same time, however, even if we take Vieweg’s lower bound estimate, 8%, this means that over 40,000 tweets generated during the first 72 hours of Hurricane Sandy may very well have provided increased situational awareness. In the case of Japan, more than 100,000 tweets generated every 5 minutes may have provided additional situational awareness. This volume of relevant information is much higher and more real-time than the information available to humanitarian responders via traditional channels.

Furthermore, preliminary research by QCRI’s Crisis Computing Team shows that 55.8% of 206,764 tweets generated during a major disaster last year were “Informative,” versus 22% that were “Personal” in nature. In addition, 19% of all tweets represented “Eye-Witness” accounts, 17.4% related to information about “Casualty/Damage,” 37.3% related to “Caution/Advice,” while 16.6% related to “Donations/Other Offers.” Incidentally, the tweets were automatically classified using algorithms developed by QCRI. The accuracy of these algorithms ranged from 75% to 81% in the case of the “Informative” classifier, for example. A hybrid platform could then push those tweets that the classifiers are less confident about to a micro-tasking platform for manual classification, if need be.
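Here is a minimal sketch of that hybrid triage logic, assuming a probabilistic classifier is available (any scikit-learn-style model would do). The threshold, labels and dummy classifier are illustrative assumptions, not QCRI’s production pipeline.

```python
def route_tweets(tweets, classifier, threshold=0.8):
    """Hybrid triage: keep the machine label when the classifier is confident,
    otherwise send the tweet to a micro-tasking queue for human labeling.
    `classifier` exposes classes_ and predict_proba(), as scikit-learn models do."""
    auto_labeled, needs_review = [], []
    for tweet, probs in zip(tweets, classifier.predict_proba(tweets)):
        probs = list(probs)                       # works for lists and numpy rows alike
        confidence = max(probs)
        label = classifier.classes_[probs.index(confidence)]
        if confidence >= threshold:
            auto_labeled.append((tweet, label, confidence))
        else:
            needs_review.append(tweet)            # push to a crowdsourcing platform
    return auto_labeled, needs_review

class _DummyClassifier:
    """Stand-in for a real model (e.g. a trained scikit-learn pipeline)."""
    classes_ = ["informative", "personal"]
    def predict_proba(self, tweets):
        # Pretend that very short tweets are harder to classify with confidence.
        return [[0.9, 0.1] if len(t.split()) > 4 else [0.55, 0.45] for t in tweets]

auto, crowd = route_tweets(["bridge on 5th street is flooded and closed", "so tired"], _DummyClassifier())
print(auto)    # confidently labeled tweets
print(crowd)   # tweets routed to human micro-tasking
```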

This research at QCRI constitutes the first phase of our work to develop a Twitter Dashboard for the Humanitarian Cluster System, which you can read more about in this blog post. We are in the process of analyzing several other Twitter datasets in order to refine our automatic classifiers. I’ll be sure to share our preliminary observations and final analysis via this blog.

The Limits of Crowdsourcing Crisis Information and The Promise of Advanced Computing


First, I want to express my sincere gratitude to the dozen or so iRevolution readers who recently contacted me. I have indeed not been blogging for the past few weeks but this does not mean I have decided to stop blogging altogether. I’ve simply been ridiculously busy (and still am!). But I truly, truly appreciate the kind encouragement to continue blogging, so thanks again to all of you who wrote in.

Now, despite the (catchy?) title of this blog post, I am not bashing crowdsourcing or worshipping at the altar of technology. My purpose here is simply to suggest that the crowdsourcing of crisis information is an approach that does not scale very well. I have lost count of the number of humanitarian organizations that said they simply didn’t have hundreds of volunteers available to manually monitor social media and create a live crisis map. Hence my interest in advanced computing solutions.

The past few months at the Qatar Computing Research Institute (QCRI) have made it clear to me that developing and applying advanced computing solutions to address major humanitarian challenges is anything but trivial. I have learned heaps about social computing, machine learning and big data analytics. So I am now more aware of the hurdles but am even more excited than before about the promise that advanced computing holds for the development of next-generation humanitarian technology.

The way forward combines both crowdsourcing and advanced computing. The next generation of humanitarian technologies will take a hybrid approach—at times prioritizing “smart crowdsourcing” and at other times leading with automated algorithms. I shall explain what I mean by smart crowdsourcing in a future post. In the meantime, the video above from my recent talk at TEDxSendai expands on the themes I have just described.

MAQSA: Social Analytics of User Responses to News

Designed by QCRI in partnership with MIT and Al-Jazeera, MAQSA provides an interactive topic-centric dashboard that summarizes news articles and user responses (comments, tweets, etc.) to these news items. The platform thus helps editors and publishers in newsrooms like Al-Jazeera’s better “understand user engagement and audience sentiment evolution on various topics of interest.” In addition, MAQSA “helps news consumers explore public reaction on articles relevant to a topic and refine their exploration via related entities, topics, articles and tweets.” The pilot platform currently uses Al-Jazeera data such as Op-Eds from Al-Jazeera English.

Given a topic such as “The Arab Spring,” or “Oil Spill”, the platform combines time, geography and topic to “generate a detailed activity dashboard around relevant articles. The dashboard contains an annotated comment timeline and a social graph of comments. It utilizes commenters’ locations to build maps of comment sentiment and topics by region of the world. Finally, to facilitate exploration, MAQSA provides listings of related entities, articles, and tweets. It algorithmically processes large collections of articles and tweets, and enables the dynamic specification of topics and dates for exploration.”
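As a rough illustration of the kind of aggregation behind such a dashboard, the sketch below averages comment sentiment per region for one topic. The record layout, field names and sentiment scores are hypothetical, not MAQSA’s actual schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical comment records: (topic, commenter region, day, sentiment in [-1, 1]).
comments = [
    ("oil spill", "North America", date(2012, 6, 1), -0.7),
    ("oil spill", "North America", date(2012, 6, 1), -0.2),
    ("oil spill", "Europe", date(2012, 6, 2), 0.1),
]

def sentiment_by_region(records, topic):
    """Average comment sentiment per region for one topic, the kind of
    aggregate a MAQSA-style sentiment map might display."""
    buckets = defaultdict(list)
    for t, region, day, score in records:
        if t == topic:
            buckets[region].append(score)
    return {region: sum(scores) / len(scores) for region, scores in buckets.items()}

print(sentiment_by_region(comments, "oil spill"))
# roughly {'North America': -0.45, 'Europe': 0.1}
```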

While others have tried to develop similar dashboards in the past, these have “not taken a topic-centric approach to viewing a collection of news articles with a focus on their user comments in the way we propose.” The team at QCRI has since added a number of exciting new features for Al-Jazeera to try out as widgets on their site. I’ll be sure to blog about these and other updates when they are officially launched. Note that other media companies (e.g., UK Guardian) will also be able to use this platform and widgets once they become public.

As always with such new initiatives, my very first thought and question is: how might we apply them in a humanitarian context? For example, perhaps MAQSA could be repurposed to do social analytics of responses from local stakeholders with respect to humanitarian news articles produced by IRIN, an award-winning humanitarian news and analysis service covering the parts of the world often under-reported, misunderstood or ignored. Perhaps an SMS component could also be added to a MAQSA-IRIN platform to facilitate this. Or perhaps there’s an application for the work that Internews carries out with local journalists and consumers of information around the world. What do you think?

Six Degrees of Separation: Implications for Verifying Social Media

The Economist recently published this insightful article entitled “Six Degrees of Mobilisation: To what extent can social networking make it easier to find people and solve real-world problems?” The notion of six degrees of separation comes from Stanley Milgram’s experiments in the 1960s, which found that there were, on average, six degrees of separation between any two people in the US. Last year, Facebook found that users on the social network were separated by an average of 4.7 hops. The Economist thus asks the following, fascinating question:

“Can this be used to solve real-world problems, by taking advantage of the talents and connections of one’s friends, and their friends? That is the aim of a new field known as social mobilisation, which treats the population as a distributed knowledge resource which can be tapped using modern technology.”

The article refers to DARPA’s Red Balloon Challenge, which I already blogged about here: “Time-Critical Crowdsourcing for Social Mobilization and Crowd-Solving.”  The Economist also references DARPA’s TagChallenge. In both cases, the winning teams leveraged social media using crowdsourcing and clever incentive mechanisms. Can this approach also be used to verify social media content during a crisis?

This new study on disasters suggests that the “degrees of separation” between any two organizations in the field is 5. So if the location of red balloons and individuals can be crowdsourced surprisingly quickly, then can the evidence necessary to verify social media content during a disaster be collected as rapidly and reliably? If we are only separated by four-to-six degrees, then this would imply that it only takes that many hops to find someone connected to me (albeit indirectly) who could potentially confirm or disprove the authenticity of a particular piece of information. This approach was used very successfully in Kyrgyzstan a couple of years ago. Can we develop a platform to facilitate this process? And if so, what design features (e.g., gamification) are necessary to mobilize participants and make this tool a success?
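A platform like this would essentially have to search the social graph outward from a requester until it reaches someone close to the event. The sketch below shows that idea as a simple breadth-first search over a toy graph; the data structure, names and exact-match location test are assumptions made purely for illustration.

```python
from collections import deque

def find_nearby_contacts(graph, me, event_location, max_hops=6):
    """Breadth-first search over a social graph to find people within
    `max_hops` of `me` who are located near the event and might be able to
    confirm or refute a report."""
    visited = {me}
    queue = deque([(me, 0)])
    matches = []
    while queue:
        person, hops = queue.popleft()
        if hops > max_hops:
            break                       # queue is ordered by hop count
        if person != me and graph[person]["location"] == event_location:
            matches.append((person, hops))
        for friend in graph[person]["contacts"]:
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, hops + 1))
    return matches

# Hypothetical two-hop chain to a potential verifier near the event.
graph = {
    "me":    {"location": "Doha",   "contacts": ["alice"]},
    "alice": {"location": "Boston", "contacts": ["bob"]},
    "bob":   {"location": "Osh",    "contacts": []},
}
print(find_nearby_contacts(graph, "me", "Osh"))   # [('bob', 2)]
```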

Accelerating the Verification of Social Media Content

Journalists have already been developing a multitude of tactics to verify user-generated content shared on social media. As noted here, the BBC has a dedicated User-Generated Content (UGC) Hub that is tasked with verifying social media information. The UK Guardian, Al-Jazeera, CNN and others are also developing competency in what I refer to as “information forensics”. It turns out there are many tactics that can be used to try and verify social media content. However, applying most of these existing tactics is highly time-consuming.

So building a decision-tree that combines these tactics is the way to go. But doing digital detective work online is still a time-intensive effort. Numerous pieces of digital evidence need to be collected in order to triangulate and ascertain the veracity of just one given report. We therefore need tools that can accelerate the processing of a verification decision-tree. To be sure, information is the most perishable commodity in a crisis—for both journalists and humanitarian professionals. This means that after a certain period of time, it no longer matters whether a report has been verified or not because the news cycle or crisis has unfolded further since.

This is why I’m a fan of tools like Rapportive. The point is to have the decision-tree not only serve as an instruction-set on what types of evidence to collect but to actually have a platform that collects that information. There are two general strategies that could be employed to accelerate and scale the verification process. One is to split the tasks listed in the decision-tree into individual micro-tasks that can be distributed and independently completed using crowdsourcing. A second strategy is to develop automated ways to collect the evidence.
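To make the idea concrete, here is a toy representation of such a decision-tree, with each evidence-collection step flagged as automatable or not so that it can be dispatched to the right channel. The checks listed are illustrative and do not represent a vetted verification protocol.

```python
# Toy verification decision-tree: each node is a piece of evidence to collect
# plus a flag for whether its collection can plausibly be automated.
CHECKS = [
    {"task": "Does the account predate the event?",                 "automated": True},
    {"task": "Do the user's contacts cluster near the event?",      "automated": True},
    {"task": "Does the photo's weather match local weather reports?", "automated": False},
    {"task": "Can a local contact confirm the report?",             "automated": False},
]

def dispatch(checks):
    """Split the decision-tree into automated jobs and crowdsourced micro-tasks."""
    auto = [c["task"] for c in checks if c["automated"]]
    crowd = [c["task"] for c in checks if not c["automated"]]
    return auto, crowd

automated_jobs, micro_tasks = dispatch(CHECKS)
print(automated_jobs)   # evidence a platform could collect on its own
print(micro_tasks)      # evidence distributed to volunteers as micro-tasks
```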

Of course, both strategies could also be combined. Indeed, some tasks are far better suited for automation while others can only be carried out by humans. In sum, the idea here is to considerably reduce the time it takes journalists and humanitarians to verify user-generated content posted on social media. I am also particularly interested in gamification approaches to solve major challenges, like the protein-folding game Foldit. So if you know of any projects seeking to solve the verification challenge described above in novel ways, I’d be very grateful for your input in the comments section below. Thank you!

Could Twitris+ Be Used for Disaster Response?

I recently had the pleasure of speaking with Hemant Purohit and colleagues who have been working on an interesting semantic social web application called Twitris+. A project of the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Twitris+ uses “real-time monitoring and multi-faceted analysis of social signals to provide insights and a framework for situational awareness, in-depth event analysis and coordination, emergency response aid, reputation management etc.”

Twitris+ packs together quite an array of social computing features, integrating spatio-temporal-thematic dimensions, people-content network analysis and sentiment-emotion subjectivity analysis. The tool also aggregates a range of social data and web resources such as Twitter, online news, Wikipedia pages, other multimedia content, etc., in addition to SMS data, for which the team was recently granted a patent.

Unlike many other social media platforms I’ve reviewed over recent months, Twitris+ geo-tags content at the tweet-level rather than at the bio level. That is, many platforms simply geo-code tweets based on where a person says s/he is as per their Twitter bio. Accurately and comprehensively geo-referencing social media content is of course no trivial matter. Since many tweets do not include geographic information, colleagues at GeoIQ are seeking to infer geographic information after analyzing a given stream of tweets, for example.
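The difference between tweet-level and bio-level geo-tagging can be illustrated with a short sketch that prefers native geotags, then place names mentioned in the tweet text, and only then the profile location. The gazetteer, field names and matching rules are made up for the example and are not how Twitris+ (or GeoIQ) actually works.

```python
# Tiny toy gazetteer mapping place names to (lat, lon).
GAZETTEER = {"aleppo": (36.20, 37.16), "doha": (25.29, 51.53)}

def locate(tweet):
    """Resolve a tweet to coordinates, preferring tweet-level evidence."""
    if tweet.get("coordinates"):                                  # native geotag (rare)
        return tweet["coordinates"], "tweet-geotag"
    for place, latlon in GAZETTEER.items():                       # place name in the text
        if place in tweet["text"].lower():
            return latlon, "tweet-text"
    if tweet.get("profile_location", "").lower() in GAZETTEER:    # bio-level fallback
        return GAZETTEER[tweet["profile_location"].lower()], "profile"
    return None, "unresolved"

print(locate({"text": "Heavy shelling in Aleppo tonight", "profile_location": "Doha"}))
# ((36.2, 37.16), 'tweet-text')
```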

I look forward to continuing my conversations with Hemant and team. Indeed, I am particularly interested to see which emergency management organizations begin to pilot the platform to enhance their situational awareness during a crisis. Their feedback will be invaluable to Twitris+ and to many of us in the humanitarian technology space.

How People in Emergencies Use Communication to Survive

“Still Left in the Dark? How People in Emergencies Use Communication to Survive — And How Humanitarian Agencies Can Help” is an excellent report published by the BBC World Service Trust earlier this year. It is a follow-up to the BBC’s 2008 study “Left in the Dark: The Unmet Need for Information in Humanitarian Emergencies.” Both reports are absolute must-reads. I highlight the most important points from the 2012 publication below.

Are Humanitarians Being Left in the Dark?

The disruptive impact of new information and communication technologies (ICTs) is hardly a surprise. Back in 2007, researchers studying the use of social media during “forest fires in California concluded that ‘these emergent uses of social media are precursors of broader future changes to the institutional and organizational arrangements of disaster response.'” While the main danger in 2008 was that disaster-affected communities would continue to be left in the dark since humanitarian organizations were not prioritizing information delivery, in 2012, “it may now be the humanitarian agencies themselves […] who risk being left in the dark.” Why? “Growing access to new technologies make it more likely that those affected by disaster will be better placed to access information and communicate their own needs.” The question is: “are humanitarian agencies prepared to respond to, help and engage with those who are communicating with them and who demand better information?” Indeed, “one of the consequences of greater access to, and the spread of, communications technology is that communities now expect—and demand—interaction.”

Monitoring Rumors While Focusing on Interaction and Listening

The BBC Report invites humanitarian organizations to focus on meaningful interaction with disaster-affected communities, rather than simply on message delivery. “Where agencies do address the question of communication with affected communities, this still tends to be seen as a question of relaying information (often described as ‘messaging’) to an unspecified ‘audience’ through a channel selected as appropriate (usually local radio). It is to be delivered when the agency thinks that it has something to say, rather than in response to demand. In an environment in which […] interaction is increasingly expected, this approach is becoming more and more out of touch with community needs. It also represents a fundamental misunderstanding of the nature and potential of many technological tools, particularly Twitter, which work on a real time many-to-many information model rather than a simple broadcast.”

Two-way communication with disaster-affected communities requires two-way listening. Without listening, there can be no meaningful communication. “Listening benefits agencies, as well as those with whom they communicate. Any agency that does not monitor local media—including social media—for misinformation or rumors about their work or about important issues, such as cholera awareness risks, could be caught out by the speed at which information can move.” This is an incredibly important point. Alas, humanitarian organizations have not caught up with recent advances in social computing and big data analytics. This is one of the main reasons I joined the Qatar Computing Research Institute (QCRI); i.e., to spearhead the development of next-generation humanitarian technology solutions.

Combining SMS with Geofencing for Emergency Alerts

Meanwhile, in Haiti, “phone company Digicel responded to the 2010 cholera outbreak by developing methods that would send an SMS to anyone who travelled through an identified cholera hotspot, alerting them to the dangers and advising on basic precautions.” The latter is an excellent example of geofencing in action. That said, “while responders tend to see communication as a process either of delivering information (‘messaging’) or extracting it, disaster survivors seem to see the ability to communicate and the process of communication itself as every bit as important as the information delivered.”
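Conceptually, geofencing of this kind boils down to a distance check against an alert zone before triggering the SMS. Here is a minimal sketch; the hotspot coordinates, radius and alert text are hypothetical and have nothing to do with Digicel’s actual system.

```python
import math

def within_geofence(lat, lon, center_lat, center_lon, radius_km):
    """Haversine distance check: is a subscriber inside the alert zone?"""
    r = 6371.0  # Earth radius in km
    p1, p2 = math.radians(lat), math.radians(center_lat)
    dphi = math.radians(center_lat - lat)
    dlmb = math.radians(center_lon - lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a)) <= radius_km

# Hypothetical cholera hotspot near downtown Port-au-Prince, 5 km radius.
HOTSPOT = (18.5392, -72.3364, 5.0)

def maybe_alert(subscriber_lat, subscriber_lon, send_sms):
    if within_geofence(subscriber_lat, subscriber_lon, *HOTSPOT):
        send_sms("Cholera risk in this area: treat drinking water and wash hands.")

maybe_alert(18.54, -72.34, print)   # inside the zone, so the alert fires
```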

Communication & Community-Based Disaster Response Efforts

As the BBC Report notes, “there is also growing evidence that communities in emergencies are adept at leveraging communications technology to organize their own responses.” This is indeed true as these recent examples demonstrate:

“Communications technology is empowering first responders in new and extremely potent ways that are, at present, little understood by international humanitarians. While aid agencies hesitate, local communities are using communications technology to reshape the way they prepare for and respond to emergencies.” There is a definite payoff to those agencies that employ an “integrated approach to communicating and engaging with disaster affected communities […]” since they are “viewed more positively by beneficiaries than those that [do] not.” Indeed, “when disaster survivors are able to communicate with aid agencies their perceptions become more positive.”

Using New Technologies to Manage Local Feedback Mechanisms

So why don’t more agencies follow suit? Many are concerned that establishing feedback systems will prove impossible to manage, let alone sustain. They fear that “they would not be able to answer questions asked, that they [would] not have the skills or capacity to manage the anticipated volume of inputs and that they [would be] unequipped to deal with people who would (it is assumed) be both angry and critical.”

I wonder whether these aid agencies realize that many private sector companies have feedback systems that engage millions of customers every day, and that these companies are using social media and big data analytics to make this happen. Some are even crowdsourcing their customer service support. It is high time the humanitarian community realized that the challenges it faces aren’t that unique and that solutions have already been developed in other sectors.

There are only a handful of examples of positive deviance vis-à-vis the setting up of feedback systems in the humanitarian space. Oxfam found that simply combining the “automatic management of SMS systems” with “just one dedicated local staff member […] was enough to cope with demand.” When the Danish Refugee Council set up their own SMS complaints mechanism, they too expected to be overwhelmed with criticisms. “To their surprise, more than half of the SMS’s they received via their feedback system […] have been positive, with people thanking the agency for their assistance […].” This appears to be a pattern since “many other agencies reported receiving fewer ‘difficult’ questions than anticipated.”
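In the spirit of the “automatic management” that Oxfam paired with a single staff member, a first pass at triaging incoming SMS feedback could be as simple as the keyword sketch below. The keyword lists, categories and routing are made up for illustration only.

```python
# Illustrative keyword-based triage of incoming SMS feedback.
THANKS = {"thank", "thanks", "grateful", "appreciate"}
COMPLAINTS = {"missing", "broken", "late", "never", "unfair", "corrupt"}

def triage(message):
    words = set(message.lower().split())
    if words & COMPLAINTS:
        return "complaint"     # route to a staff member for follow-up
    if words & THANKS:
        return "positive"      # log it; no action needed
    return "question"          # queue for the duty officer to answer

for sms in ["Thanks for the water kits",
            "Our ration was missing two items",
            "When is the next distribution?"]:
    print(triage(sms))         # positive, complaint, question
```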

Naturally, “a systematic and resourced approach for feedback” is needed either way. Interestingly, “many aid agencies are in fact now running de facto feedback and information line systems without realizing it. […] most staff who work directly with disaster survivors will be asked for contact details by those they interact with, and will give their own personal mobile numbers.” These ad hoc “systems” are hardly efficient, well-resourced or systematic, however.

User-Generated Content, Representativeness and Ecosystems

Obviously, user-generated content shared via social media may not be representative. “But, as costs fall and coverage increases, all the signs are that usage will increase rapidly in rural areas and among poorer people. […] As one Somali NGO staff member commented […], ‘they may not have had lunch — but they’ll have a mobile phone.'” Moreover, there is growing evidence that individuals turn to social media platforms for the first time as a result of crisis. “In Thailand, for example, the use of social media increased 20% when the 2010 floods began, with fairly equal increases found in metropolitan Bangkok and in rural provinces.”

While the vast majority of Haitians in Port-au-Prince are not on Twitter, “the city’s journalists overwhelmingly are and see it as an essential source of news and updates.” Since most Haitians listen to radio, “they are, in fact, the indirect beneficiaries of Twitter information systems.” Another interesting fact: “In Kenya, 27% of radio listeners tune in via their mobile phones.” This highlights the importance of an ecosystem approach when communicating with disaster-affected communities. On a related note, recent statistics reveal that individuals in developing countries spend about 17.5% of their income on ICTs, compared to just 1.5% in developed countries.