Tag Archives: Twitter

Using Twitter to Map Blackouts During Hurricane Sandy

I recently caught up with Gilad Lotan during a hackathon in New York and was reminded of his good work during Sandy, the largest Atlantic hurricane on record. Amongst other analytics, Gilad created a dynamic map of tweets referring to power outages. “This begins on the evening of October 28th as people mostly joke about the prospect of potentially losing power. As the storm evolves, the tone turns much more serious. The darker a region on the map, the more aggregate Tweets about power loss that were seen for that region.” The animated map is captured in the video below.

Hashtags played a key role in the reporting. The #NJpower hashtag, for example, was used to “help keep track of the power situation throughout the state” (1). As depicted in the tweet below, “users and news outlets used this hashtag to inform residents where power outages were reported and gave areas updates as to when they could expect their power to come back” (1).

NJpower tweet

As Gilad notes, “The potential for mapping out this kind of information in realtime is huge. Think of generating these types of maps for different scenarios– power loss, flooding, strong winds, trees falling.” Indeed, colleagues at FEMA and ESRI had asked us to automatically extract references to gas leaks on Twitter in the immediate aftermath of the Category 5 Tornado in Oklahoma. One could also use a platform like GeoFeedia, which maps multiple types of social media reports based on keywords (i.e., not machine learning). But the vast majority of Twitter users do not geo-tag their tweets. In fact, only 2.7% of tweets are geotagged, according to this study. This explains why enlightened policies are also important for humanitarian technologies to work—like asking the public to temporarily geo-tag their social media updates when these are relevant to disaster response.

While “basing these observations on people’s Tweets might not always bring back valid results (someone may jokingly tweet about losing power),” Gilad argues that “the aggregate, especially when compared to the norm, can be a pretty powerful signal.” The key word here is norm. If an established baseline of geo-tagged tweets for the northeast were available, one would have a base-map of “normal” geo-referenced Twitter activity. This would enable us to understand deviations from the norm. Such a base-map would thus place new tweets in temporal and geo-spatial context.
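Gilad’s point about comparing aggregates to the norm can be made concrete with a few lines of Python. The sketch below (the per-hour counts and the spike are entirely made up) scores the current hour’s “power loss” tweets for a region against a historical baseline using a simple z-score:

```python
from statistics import mean, stdev

def outage_signal(current_count, baseline_counts):
    """Score the current hour's tweet count for a region against a
    historical baseline: the z-score says how many standard deviations
    the current activity sits above (or below) normal."""
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        return 0.0
    return (current_count - mu) / sigma

# Hypothetical baseline: "power loss" tweets per hour on normal days
baseline = [3, 5, 4, 6, 2, 5, 4, 3]
spike_score = outage_signal(40, baseline)   # a Sandy-like surge
quiet_score = outage_signal(4, baseline)    # business as usual
```

A single joking tweet barely moves the score, while a region-wide surge stands out immediately, which is exactly why the aggregate is a more reliable signal than any individual tweet.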

In sum, creating live maps of geo-tagged tweets is only a first step. Base-maps should be rapidly developed and overlaid with other datasets such as population and income distribution. Of course, these datasets are not always available, and accessing historical Twitter data can also be a challenge. The latter explains why Big Data Philanthropy for Disaster Response is so key.

bio

Automatically Identifying Fake Images Shared on Twitter During Disasters

Artificial Intelligence (AI) can be used to automatically predict the credibility of tweets generated during disasters. AI can also be used to automatically rank the credibility of tweets posted during major events. Aditi Gupta et al. applied these same information forensics techniques to automatically identify fake images posted on Twitter during Hurricane Sandy. Using a decision tree classifier, the authors were able to predict which images were fake with an accuracy of 97%. Their analysis also revealed that retweets accounted for 86% of all tweets linking to fake images. In addition, their results showed that 90% of these retweets were posted by just 30 Twitter users.

Fake Images

The authors collected the URLs of fake images shared during the hurricane by drawing on the UK Guardian’s list and other sources. They compared these links with 622,860 tweets that contained links and the words “Sandy” & “hurricane” posted between October 20th and November 1st, 2012. Just over 10,300 of these tweets and retweets contained links to URLs of fake images while close to 5,800 tweets and retweets pointed to real images. Of the ~10,300 tweets linking to fake images, 84% (roughly 9,000) were retweets. Interestingly, these retweets spike about 12 hours after the original tweets are posted. This spike is driven by just 30 Twitter users. Furthermore, the vast majority of retweets weren’t made by followers of the original posters but rather by those following certain hashtags.

Gupta et al. also studied the profiles of users who tweeted or retweeted fake images (User Features) along with the content of their tweets (Tweet Features) to determine whether these features (listed below) might be predictive of whether a tweet points to a fake image. Their decision tree classifier achieved an accuracy of over 90%, which is remarkable. But the authors note that this high accuracy score is due to “the similar nature of many tweets since a lot of tweets are retweets of other tweets in our dataset.” In any event, their analysis also reveals that Tweet-based Features (such as length of tweet, number of uppercase letters, etc.) were far more accurate in predicting whether or not a tweeted image was fake than User-based Features (such as number of friends, followers, etc.). One feature that was overlooked, however, is gender.
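Gupta et al. used a full decision tree classifier; as a toy illustration of what such a tree does at each node, here is a single-split “decision stump” over one Tweet Feature mentioned above (number of uppercase characters). The labeled examples are invented for the sketch:

```python
def best_stump(samples):
    """Learn a one-rule 'decision stump' over a single numeric feature:
    find the threshold on uppercase-character count that best separates
    fake-image tweets from real ones. A real decision tree applies this
    split search recursively over many features; this single split just
    illustrates the idea."""
    best = (None, 0.0)
    for threshold in sorted({s["num_upper"] for s in samples}):
        correct = sum(
            1 for s in samples
            if (s["num_upper"] >= threshold) == s["fake"]
        )
        accuracy = correct / len(samples)
        if accuracy > best[1]:
            best = (threshold, accuracy)
    return best

# Toy labeled tweets: uppercase-character count, fake-image label
data = [
    {"num_upper": 14, "fake": True},
    {"num_upper": 11, "fake": True},
    {"num_upper": 2,  "fake": False},
    {"num_upper": 3,  "fake": False},
    {"num_upper": 12, "fake": True},
    {"num_upper": 1,  "fake": False},
]
threshold, accuracy = best_stump(data)
```

Accuracy on unseen tweets would of course sit well below the training accuracy of a toy set like this; the paper’s 90%+ figure comes from a proper tree trained and evaluated on their full dataset.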

Information Forensics

In conclusion, “content and property analysis of tweets can help us in identifying real image URLs being shared on Twitter with a high accuracy.” These results further demonstrate that machine computing and automated techniques can be used for information forensics as applied to images shared on social media. In terms of future work, the authors Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru and Anupam Joshi plan to “conduct a larger study with more events for identification of fake images and news propagation.” They also hope to expand their study to include the detection of “rumors and other malicious content spread during real world events apart from images.” Lastly, they “would like to develop a browser plug-in that can detect fake images being shared on Twitter in real-time.” Their full paper is available here.

Needless to say, all of this is music to my ears. Such a plugin could be added to our Artificial Intelligence for Disaster Response (AIDR) platform, not to mention our Verily platform, which seeks to crowdsource the verification of social media reports (including images and videos) during disasters. What I also really value about the authors’ approach is how pragmatic they are with their findings. That is, by noting their interest in developing a browser plugin, they are applying their data science expertise for social good. As per my previous blog post, this focus on social impact is particularly rare. So we need more data scientists like Aditi Gupta et al. This is why I was already in touch with Aditi last year given her research on automatically ranking the credibility of tweets. I’ve just reached out to her again to explore ways to collaborate with her and her team.


Using Twitter to Detect Micro-Crises in Real-Time

Social media is increasingly used to communicate during major crises. But what about small-scale incidents such as a car crash or fire? These “micro-crises” typically generate a far smaller volume of social media activity during a much shorter period and more bounded geographical area. Detecting these small-scale events thus poses an important challenge for the field of Crisis Computing.

Axel Schulz et al

Axel Schulz just co-authored a paper on this exact challenge. In this study, he and co-authors Petar Ristoski & Heiko Paulheim “present a solution for a real-time identification of small scale incidents using microblogs,” which uses machine learning—combining text classification and semantic enrichment of microblogs—to increase situational awareness. The study draws on 7.5 million tweets posted in the city centers of Seattle and Memphis during November & December 2012 and February 2013. The authors used the “Seattle Real Time Fire 911 Calls” dataset to identify relevant keywords in the collected tweets. They also used WordNet to “extend this set by adding the direct hyponyms. For instance, the keyword ‘accident’ was extended with ‘collision’, ‘crash’, ‘wreck’, ‘injury’, ‘fatal accident’, and ‘casualty’.”
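The hyponym expansion is easy to reproduce. The paper queries WordNet directly (e.g., via NLTK’s wordnet interface); to keep this sketch self-contained I’ve hard-coded a tiny hyponym map instead, seeded with the ‘accident’ example quoted above:

```python
# Stand-in for WordNet's direct-hyponym lookup; the 'accident' entries
# come from the example quoted in Schulz et al.'s paper, the 'fire'
# entry is an illustrative extra.
HYPONYMS = {
    "accident": ["collision", "crash", "wreck", "injury",
                 "fatal accident", "casualty"],
    "fire": ["blaze", "conflagration"],
}

def expand_keywords(seeds, hyponyms=HYPONYMS):
    """Extend a set of incident keywords with their direct hyponyms.
    Keywords with no known hyponyms pass through unchanged."""
    expanded = set(seeds)
    for keyword in seeds:
        expanded.update(hyponyms.get(keyword, []))
    return expanded
```

The expanded keyword set then drives the tweet filtering step before classification.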

An evaluation of this combined “text classification” and “semantic enrichment” approach shows that small scale incidents can be identified with an accuracy of 89%. This is a remarkable level of accuracy given the rare and micro-level nature of the incidents studied. A copy of Axel et al.‘s paper is available here (PDF).


Using Big Data to Inform Poverty Reduction Strategies

My colleagues and I at QCRI are spearheading a new experimental Research and Development (R&D) project with the United Nations Development Program (UNDP) team in Cairo, Egypt. Colleagues at Harvard University, MIT and UC Berkeley have also joined the R&D efforts as full-fledged partners. The research question: can an analysis of Twitter traffic in Egypt tell us anything about changes in unemployment and poverty levels? This question was formulated with UNDP’s Cairo-based Team during several conversations I had with them in early 2013.

Egyptian Tweets

As is well known, a major challenge in the development space is the lack of access to timely socio-economic data. So the question here is whether alternative, non-traditional sources of information (such as social media) can provide a timely and “good enough” indication of changing trends. Thanks to our academic partners, we have access to hundreds of millions of Egyptian tweets (both historical and current) along with census and demographic data for ground-truth purposes. If the research yields robust results, then our UNDP colleagues could draw on more real-time data to complement their existing datasets, which may better inform some of their local poverty reduction and development strategies. This more rapid feedback loop could lead to faster economic empowerment for local communities in Egypt. Of course, there are many challenges to working with social data vis-a-vis representation and sample bias. But that is precisely why this kind of experimental research is important—to determine whether any of our results are robust to biases in phone ownership, twitter-use, etc.


The Geography of Twitter: Mapping the Global Heartbeat

My colleague Kalev Leetaru recently co-authored this comprehensive study on the various sources and accuracies of geographic information on Twitter. This is the first detailed study of its kind. The analysis, which runs some 50 pages, has important implications vis-a-vis the use of social media in emergency management and humanitarian response. Should you not have the time to read the full study, this blog post highlights the most important and relevant findings.

Kalev et al. analyzed 1.5 billion tweets (collected from the Twitter Decahose via GNIP) between October 23 and November 30th, 2012. This came to 14.3 billion words posted by 35% of all active users at the time. Note that 2.9% of the world’s population are active Twitter users and that 87% of all tweets ever posted since the launch of Twitter in 2006 were posted in the past 24 months alone. On average, Kalev and company found that the lowest number of tweets posted per hour is 1 million and the highest is 2 million. In addition, almost 50% of all tweets are posted by 5% of users. (Click on images to enlarge).

Tweets

In terms of geography, there are two ways to easily capture geographic data from Twitter. The first is from the location information specified by a user when registering for a Twitter account (selected from a drop down menu of place names). The second, which is automatically generated, is from the coordinates of the Twitter user’s location when tweeting, which is typically provided via GPS or cellular triangulation. On a typical day, about 2.7% of Tweets contain GPS or cellular data while 2.02% of users list a place name when registering (1.4% have both). The figure above displays all GPS/cellular coordinates captured from tweets during the 39 days of study. In contrast, the figure below combines all Twitter locations, adding registered place names and GPS/cellular data (both in red), and overlays this with the location of electric lights (blue) based on satellite imagery obtained from NASA.
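In code, pulling these two kinds of geography out of a tweet object looks roughly like this. The field names follow Twitter’s REST API v1.1 payload (where “coordinates” holds a GeoJSON-style [longitude, latitude] pair); the helper itself is just a sketch:

```python
def tweet_location(tweet):
    """Return (source, value) for the best available geography in a
    tweet object, preferring exact GPS/cellular coordinates over the
    free-text location in the user's profile."""
    coords = tweet.get("coordinates")
    if coords:
        # Twitter stores GeoJSON order [longitude, latitude]
        lon, lat = coords["coordinates"]
        return ("gps", (lat, lon))
    profile = (tweet.get("user") or {}).get("location")
    if profile:
        return ("profile", profile.strip())
    return ("none", None)

# Minimal example payload in the v1.1 shape
tweet = {"coordinates": {"type": "Point", "coordinates": [-74.0, 40.7]},
         "user": {"location": "New York, NY"}}
```

Preferring the GPS field over the registered place name mirrors the relative precision of the two sources described above.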

Tweets / Electricity

“White areas depict locations with an equal balance of tweets and electricity. Red areas reveal a higher density of tweets than night lights while blue areas have more night lights than tweets.” “Iran and China show substantially fewer tweets than their electricity levels would suggest, reflecting their bans on Twitter, while India shows strong clustering of Twitter usage along the coast and its northern border, even as electricity use is far more balanced throughout the country. Russia shows more electricity usage in its eastern half than Twitter usage, while most countries show far more Twitter usage than electricity would suggest.”

The Pearson correlation between tweets and lights is 0.79, indicating very high similarity. That is, wherever in the world electricity exists, the chances of there also being Twitter users are very high; tweets are distributed geographically roughly in line with the availability of electricity. And so, even though less than three percent of all tweets carry geolocation information, “this suggests they could be used as a dynamic reference baseline to evaluate the accuracy of other methods of geographic recovery.” Keep in mind that the light bulb was invented 134 years ago, in contrast to Twitter’s short 7-year history. And yet, the correlation is already very strong. This is why they call it an information revolution. Still, just 1% of all Twitter users accounted for 66% of all georeferenced tweets during the period of study, which means that relying purely on these tweets may provide a skewed view of the Twitterverse, particularly over short periods of time. But whether this poses a problem ultimately depends on the research question or task at hand.
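For readers who want the arithmetic, the Pearson correlation reported here can be computed from paired observations with a few lines of Python; the per-grid-cell counts below are hypothetical:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length
    series, e.g. per-cell tweet density vs. night-light intensity."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical grid cells: tweet counts vs. night-light brightness
tweets = [120, 80, 5, 200, 40]
lights = [90, 70, 10, 160, 30]
```

A value near 1 means the two maps rise and fall together; the study’s 0.79 across real data is striking given Twitter’s short history.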

Twitter table

The linguistic geography of Twitter is critical: “If English is rarely used outside of the United States, or if English tweets have a fundamentally different geographic profile than other languages outside of the United States, this will significantly skew geocoding results.” As the table below reveals, georeferenced tweets with English content constitute 41.57% of all geo-tagged tweets.

Geo Tweets Language

The data from the above table is displayed geographically below for the European region. See the global map here. “In cases where multiple languages are present at the same coordinate, the point is assigned to the most prevalent language at that point and colored accordingly.” Statistical analyses of geo-tagged English tweets compared to all other languages suggests that “English offers a spatial proxy for all languages and that a geocoding algorithm which processes only English will still have strong penetration into areas dominated by other languages (though English tweets may discuss different topics or perspectives).”

Twitter Languages Europe

Another important source of geographic information is a Twitter user’s bio. This public location information was available for 71% of all tweets studied by Kalev and company. Interestingly, “Approximately 78.4 percent of tweets include the user’s time zone in textual format, which offers an approximation of longitude […].” As Kalev et al. note, “Nearly one third of all locations on earth share their name with another location somewhere else on the planet, meaning that a reference to ‘Urbana’ must be disambiguated by a geocoding system to determine which of the 12 cities in the world it might refer to, including 11 cities in the United States with that name.”
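The time-zone-as-longitude trick follows from the Earth rotating 360° in 24 hours, i.e., 15° per hour of UTC offset. The helper below assumes the textual time zone has already been mapped to a numeric offset (that mapping is the extra step I’m glossing over):

```python
def tz_to_longitude(utc_offset_hours):
    """Approximate longitude from a UTC offset: 360 degrees / 24 hours
    = 15 degrees per hour of offset. Only a rough proxy, since
    political time zone boundaries wander far from their meridians."""
    return utc_offset_hours * 15.0

# US Eastern (UTC-5) -> -75 degrees, close to New York City's -74
eastern_approx = tz_to_longitude(-5)
```

It says nothing about latitude and nothing about which city along the meridian, which is why disambiguating place names like “Urbana” still requires a proper geocoder.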

There are several ways to get around this challenge, ranging from developing a Full Text Geocoder to using gazetteers such as Wikipedia and MaxMind together with machine translation. Applying the latter has revealed that the “textual geographic density of Twitter changes by more than 53 percent over the course of each day. This has enormous ramifications for the use of Twitter as a global monitoring system, as it suggests that the representativeness of geographic tweets changes considerably depending on time of day.” That said, the success of a monitoring system is not solely dependent on spatial data; temporal factors and deviations from a baseline also enable early detection. In any event, “The small volume of georeferenced tweets can be dramatically enhanced by applying geocoding algorithms to the textual content and metadata of each tweet.”

Kalev et al. also carried out a comprehensive analysis of geo-tagged retweets. They find that “geography plays little role in the location of influential users, with the volume of retweets instead simply being a factor of the total population of tweets originating from that city.” They also calculated that the average geographical distance between two Twitter users “connected” by retweets (RTs) and who geotag their tweets is about 750 miles or 1,200 kilometers. When a Twitter user references another (@), the average geographical distance between the two is 744 miles. This means that RTs and @’s cannot be used for geo-referencing Twitter data, even when coupling this information with time zone data. The figure below depicts the location of users retweeting other users. The geodata for this comes from the geotagged tweets (rather than account information or profile data).

Map of Retweets

On average, about 15.85% of geo-tagged tweets contain links. The most popular links for these include Foursquare, Instagram, Twitter and Facebook. See my previous blog post on the analysis & value of such content for disaster response. In terms of Twitter geography versus that of mainstream news, Kalev et al. analyzed all news items available via Google News during the same period as the tweets they collected. This came to over 3.3 million articles pointing to just under 165,000 locations. The latter are color-coded red in the data viz below, while Tweets are blue and white areas denote an equal balance of both.

Twitter vs News

“Mainstream media appears to have significantly less coverage of Latin America and vastly greater coverage of Africa. It also covers China and Iran much more strongly, given their bans on Twitter, as well as having enhanced coverage of India and the Western half of the United States. Overall, mainstream media appears to have more even coverage, with less clustering around major cities.” This suggests “there is a strong difference in the geographic profiles of Twitter and mainstream media and that the intensity of discourse mentioning a country does not necessarily match the intensity of discourse emanating from that country in social media. It also suggests that Twitter is not simply a mirror of mainstream media, but rather has a distinct geographic profile […].”

In terms of future growth, “the Middle East and Eastern Europe account for some of Twitter’s largest new growth areas, while Indonesia, Western Europe, Africa, and Central America have high proportions of the world’s most influential Twitter users.”


See also:

  • Social Media – Pulse of the Planet? [Link]
  • Big Data for Disaster Response – A list of Wrong Assumptions [Link]
  • A Multi-Indicator Approach for Geolocalization of Tweets [Link]

Analysis of Multimedia Shared in Millions of Tweets After Tornado (Updated)

Humanitarian organizations and emergency management offices are increasingly interested in capturing multimedia content shared on social media during crises. Last year, the UN Office for the Coordination of Humanitarian Affairs (OCHA) activated the Digital Humanitarian Network (DHN) to identify and geotag pictures and videos shared on Twitter that captured the damage caused by Typhoon Pablo, for example. So I’m collaborating with my colleague Hemant Purohit to analyze the multimedia content shared in the millions of tweets posted after the Category 5 Tornado devastated the city of Moore, Oklahoma on May 20th. The results are shared below along with details of a project I am spearheading at QCRI to provide disaster responders with relevant multimedia content in real time during future disasters.

Multimedia_Tornado

For this preliminary multimedia analysis, we focused on the first 48 hours after the Tornado and specifically on the following multimedia sources/types: Twitpic, Instagram, Flickr, JPGs, YouTube and Vimeo. JPGs refers to URLs shared on Twitter that include “.jpg”. Only ~1% of tweets posted during the 2-day period included URLs to multimedia content. We filtered out duplicate URLs to produce the following unique counts depicted above and listed below.

  • Twitpic = 784
  • Instagram = 11,822
  • Flickr = 33
  • JPGs = 347 
  • YouTube = 5,474
  • Vimeo = 88

Clearly, Instagram and Youtube are important sources of multimedia content during disasters. The graphs below (click to enlarge) depict the frequency of individual multimedia types by hour during the first 48 hours after the Tornado. Note that we were only able to collect about 2 million tweets during this period using the Twitter Streaming API but expect that millions more were posted, which is why access to the Twitter Firehose is important and why I’m a strong advocate of Big Data Philanthropy for Humanitarian Response.
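Our bucketing of links by multimedia source can be sketched as a simple domain/extension check; the patterns below are illustrative, not the exact filters we used:

```python
from urllib.parse import urlparse

def media_type(url):
    """Bucket a shared link into one of the multimedia sources counted
    above, or None if it matches nothing. The domain patterns are an
    illustrative subset, not an exhaustive match list."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if "twitpic.com" in host:
        return "Twitpic"
    if "instagram.com" in host or "instagr.am" in host:
        return "Instagram"
    if "flickr.com" in host or "flic.kr" in host:
        return "Flickr"
    if "youtube.com" in host or "youtu.be" in host:
        return "YouTube"
    if "vimeo.com" in host:
        return "Vimeo"
    if parsed.path.lower().endswith(".jpg"):
        return "JPG"
    return None

def unique_counts(urls):
    """De-duplicate URLs first (as we did), then tally each type."""
    counts = {}
    for url in set(urls):
        kind = media_type(url)
        if kind:
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Counting after de-duplication is what yields unique counts like those listed above, rather than raw link volumes.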

Twitpic_Tornado

A comparison of the above Twitpic graph with the Instagram one below suggests very little to no time lag between the two unique streams.

Instagram_Tornado

Clearly Flickr pictures are not widely shared on Twitter during disasters. Only 33 unique links to Flickr were tweeted compared to 11,822 unique Instagram pictures.

Flickr_Tornado

The sharing of JPG images is more popular than links to Flickr but the total number of uniques still pales in comparison to the number of Instagram pictures.

JPGs_Tornado

The frequency of tweets sharing unique links to Youtube videos does not vary considerably over time.

Youtube_Tornado

In contrast to the large volume of Youtube links shared on twitter, only 88 unique links to Vimeo were shared.

Vimeo_Tornado

Geographic information is of course imperative for disaster response. We collected about 2.7 million tweets during the 10-day period after the Tornado and found that 51.23% had geographic data—either the tweet was geo-tagged or the Twitter user’s bio included a location. During the first 48 hours, about 45% of Tweets with links to Twitpic had geographic data; 40% for Flickr and 38% for Instagram. Most digital pictures include embedded geographic information (i.e., the GPS coordinates of the phone or camera, for example). So we’re working on automatically extracting this information as well.
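EXIF metadata stores GPS coordinates as degrees/minutes/seconds triplets plus a hemisphere reference. One small but essential step in that extraction pipeline (sketched here, separate from actually reading the EXIF tags off an image) is converting the triplet to decimal degrees:

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert the degrees/minutes/seconds GPS triplet found in a
    photo's EXIF metadata to decimal degrees. 'ref' is the EXIF
    hemisphere tag: N/E positive, S/W negative."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if ref in ("S", "W") else value

# Roughly Moore, Oklahoma: 35°20'24" N, 97°30'0" W
lat = dms_to_decimal(35, 20, 24, "N")
lon = dms_to_decimal(97, 30, 0, "W")
```

With latitude and longitude in decimal form, the pictures can be dropped straight onto a map alongside the geo-tagged tweets.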

An important question that arises is which Instagram pictures & YouTube videos actually captured evidence of the damage caused by the Tornado? Of these, which are already geotagged and which could be quickly geotagged manually? The Digital Humanitarian Network was able to answer these questions within 12 hours following the devastating Typhoon that ravaged the Philippines last year (see map below). The reason it took that long is that we spent most of the time customizing the microtasking apps used to tag the tweets/links. Moreover, we were looking at every single link shared on Twitter, i.e., not just those that linked directly to Instagram, YouTube, etc. We need to do better, and we can.

This is why we’re launching MicroMappers in partnership with the United Nations. MicroMappers is a set of very user-friendly microtasking apps that allow anyone to support humanitarian response efforts with a simple click of the mouse. This means anyone can be a Digital Humanitarian Volunteer. In the case of the Tornado, volunteers could easily have tagged the Instagram pictures posted on Twitter. During Hurricane Sandy, about half-a-million Instagram pictures were shared. This is certainly a large number, but other microtasking communities like my friends at Zooniverse have tagged millions of pictures in a matter of days. So it is possible.

Incidentally, hundreds of the geo-tagged Instagram pictures posted during the Hurricane captured the same damaged infrastructure across New York, like the same fallen crane, blocked road or flooded neighborhood. These pictures, taken by multiple eyewitnesses from different angles, can easily be “stitched” together to create a 2D or even 3D tableau of the damage. Photosynth (below) already does this stitching automatically and for free. Think of Photosynth as Google Street View but using crowdsourced pictures instead. One simply needs a collection of related pictures, which is what MicroMappers will provide.

Photosynth

Disasters don’t wait. Another major Tornado caused havoc in Oklahoma just yesterday. So we are developing MicroMappers as we speak and plan to test the apps soon. Stay tuned for future blog post updates!


See also: Analyzing 2 Million Disaster Tweets from Oklahoma Tornado [Link]

Crowdsourcing Crisis Information from Syria: Twitter Firehose vs API

Over 400 million tweets are posted every day. But accessing 100% of these tweets (say, for disaster response purposes) requires access to Twitter’s “Firehose”. The latter, however, can be prohibitively expensive and also requires serious infrastructure to manage. This explains why many (all?) of us in the Crisis Computing & Humanitarian Technology space use Twitter’s “Streaming API” instead. But how representative are tweets sampled through the API vis-a-vis overall activity on Twitter? This important question is posed and answered in this new study, which uses Syria as a case study.

Tweets Syria

The analysis focused on “Tweets collected in the region around Syria during the period from December 14, 2011 to January 10, 2012.” The first dataset was collected using Firehose access while the second was sampled from the API. The tag clouds above (click to enlarge) display the most frequent top terms found in each dataset. The hashtags and geoboxes used for the data collection are listed in the table below.

Syria List

The graph below shows the number of tweets collected between December 14th, 2011 and January 10th, 2012. This amounted to 528,592 tweets from the API and 1,280,344 tweets from the Firehose. On average, the API captures 43.5% of tweets available on the Firehose. “One of the more interesting results in this dataset is that as the data in the Firehose spikes, the Streaming API coverage is reduced. One possible explanation for this phenomenon could be that due to the Western holidays observed at this time, activity on Twitter may have reduced causing the 1% threshold to go down.”
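The coverage ratio itself is trivial to compute once both counts are in hand. In the hypothetical per-day counts below, the busier day shows lower coverage, mirroring the paper’s observation that Streaming API coverage drops when the Firehose spikes:

```python
def daily_coverage(streaming_by_day, firehose_by_day):
    """Per-day percentage of Firehose tweets that the Streaming API
    captured. The counts used below are hypothetical stand-ins."""
    return {
        day: 100.0 * streaming_by_day[day] / firehose_by_day[day]
        for day in firehose_by_day
    }

streaming = {"2011-12-14": 18000, "2011-12-15": 9000}
firehose = {"2011-12-14": 40000, "2011-12-15": 45000}
coverage = daily_coverage(streaming, firehose)
```

Researchers with only Streaming access can use estimates like this to decide whether their sample is trustworthy for a given day’s analysis.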

Syria Graph

The authors, Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen Carley, also carry out hashtag analysis using each dataset. “Here we see mixed results at small values of n [top hashtags], indicating that the Streaming data may not be good for finding the top hashtags. At larger values of n, we see that the Streaming API does a better job of estimating the top hashtags in the Firehose data.” In addition, the analysis reveals that the “Streaming API data does not consistently find the top hashtags, in some cases revealing reverse correlation with the Firehose data […]. This could be indicative of a filtering process in Twitter’s Streaming API which causes a misrepresentation of top hashtags in the data.”
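A crude stand-in for the paper’s top-hashtag comparison: count hashtags in each stream and measure how much of the Firehose’s top-n list the Streaming API sample recovers. The tag lists here are invented:

```python
from collections import Counter

def top_hashtag_overlap(sample_tags, reference_tags, n):
    """Fraction of the reference stream's top-n hashtags that also
    appear in the sample's top-n list."""
    top_sample = {tag for tag, _ in Counter(sample_tags).most_common(n)}
    top_ref = {tag for tag, _ in Counter(reference_tags).most_common(n)}
    return len(top_sample & top_ref) / n

# Invented streams: the sample over-represents #news relative to #assad
firehose_tags = ["#syria"] * 5 + ["#assad"] * 3 + ["#homs"] * 2
api_tags = ["#syria"] * 3 + ["#news"] * 2 + ["#assad"]
```

At small n a skewed sample misses top hashtags entirely, which is the mixed-results pattern the authors describe; the paper uses proper rank correlation rather than this simple overlap.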

In terms of social network analysis, the authors were able to show that “50% to 60% of the top 100 key-players [can be identified] when creating the networks based on one day of Streaming API data.” Aggregating more days’ worth of data “can increase the accuracy substantially. For network level measures, first in-depth analysis revealed interesting correlation between network centralization indexes and the proportion of data covered by the Streaming API.”

Finally, the study also compares the geolocation of tweets. More specifically, the authors assess how the “geographic distribution of the geolocated tweets is affected by the sampling performed by the Streaming API. The number of geotagged tweets is low, with only 16,739 geotagged tweets in the Streaming data (3.17%) and 18,579 in the Firehose data (1.45%).” Still, the authors find that “despite the difference in tweets collected on the whole we get 90.10% coverage of geotagged tweets.”

In sum, the study finds that “the results of using the Streaming API depend strongly on the coverage and the type of analysis that the researcher wishes to perform. This leads to the next question concerning the estimation of how much data we actually get in a certain time period.” This is critical if researchers want to place their results into context and potentially apply statistical methods to account (and correct) for bias. The authors suggest that in some cases the Streaming API coverage can be estimated. In future research, they hope to “find methods to compensate for the biases in the Streaming API to provide a more accurate picture of Twitter activity to researchers.” In particular, they want to “determine whether the methodology presented here will yield similar results for Twitter data collected from other domains, such as natural disasters, protests & elections.”

The authors will present their paper at this year’s International Conference on Weblogs and Social Media (ICWSM). So I look forward to meeting them there to discuss related research we are carrying out at QCRI.



Results: Analyzing 2 Million Disaster Tweets from Oklahoma Tornado

Thanks to the excellent work carried out by my colleagues Hemant Purohit and Professor Amit Sheth, we were able to collect 2.7 million tweets posted in the aftermath of the Category 5 Tornado that devastated Moore, Oklahoma. Hemant, who recently spent half-a-year with us at QCRI, kindly took the lead on carrying out some preliminary analysis of the disaster data. He sampled 2.1 million tweets posted during the first 48 hours for the analysis below.

oklahoma-tornado-20

About 7% of these tweets (~146,000 tweets) were related to donations of resources and services such as money, shelter, food, clothing, medical supplies and volunteer assistance. Many of the donations-related tweets were informative in nature, e.g.: “As President Obama said this morning, if you want to help the people of Moore, visit [link]”. Approximately 1.3% of the tweets (about 30,000 tweets) referred to the provision of financial assistance to the disaster-affected population. Just over 400 unique tweets sought non-monetary donations, such as “please help get the word out, we are accepting kid clothes to send to the lil angels in Oklahoma.Drop off.

Exactly 152 unique tweets related to offers of help were posted within the first 48 hours of the Tornado. The vast majority of these were asking how to get involved in helping others affected by the disaster. For example: “Anyone know how to get involved to help the tornado victims in Oklahoma??#tornado #oklahomacity” and “I want to donate to the Oklahoma cause shoes clothes even food if I can.” These two offers of help are actually automatically “matchable”, making the notion of a “Match.com” for disaster response a distinct possibility. Indeed, Hemant has been working with my team and me at QCRI to develop algorithms (classifiers) that not only identify relevant needs/offers from Twitter automatically but also suggest matches as a result.
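To show why tweets like these are “matchable”, here is a toy needs/offers matcher: it pairs tweets that mention the same resource keyword. Both the resource vocabulary and the matching rule are illustrative and far simpler than the classifiers Hemant is actually building:

```python
def match_offers_to_needs(offers, needs):
    """Pair offer tweets with need tweets that share a resource
    keyword, returning (offer, need, shared_resources) triples.
    The resource vocabulary is an illustrative subset."""
    resources = {"food", "clothes", "shelter", "money", "shoes"}

    def mentioned(text):
        words = {w.strip(".,!?#").lower() for w in text.split()}
        return words & resources

    matches = []
    for offer in offers:
        for need in needs:
            shared = mentioned(offer) & mentioned(need)
            if shared:
                matches.append((offer, need, sorted(shared)))
    return matches
```

A production matcher would also need location, quantity and time, but even this keyword overlap is enough to pair the two example tweets above on “clothes”.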

Some readers may be surprised to learn that “only” several hundred unique tweets (out of 2+ million) were related to needs/offers. The first point to keep in mind is that social media complements rather than replaces traditional information sources. All of us working in this space fully recognize that we are looking for the equivalent of needles in a haystack. But these “needles” may contain real-time, life-saving information. Second, a significant number of disaster tweets are retweets. This is not a negative; Twitter is particularly useful for rapid information dissemination during crises. Third, while there were “only” 152 unique tweets offering help, this still represents over 130 Twitter users who were actively seeking ways to help pro bono within 48 hours of the disaster. Plus, they are automatically identifiable and directly contactable. So these volunteers could also be recruited as digital humanitarian volunteers for MicroMappers, for example. Fourth, the number of Twitter users continues to skyrocket. In 2011, Twitter had 100 million monthly active users. This figure doubled in 2012. Fifth, as I’ve explained here, if disaster responders want to increase the number of relevant disaster tweets, they need to create demand for them. Enlightened leadership and policy is necessary. This brings me to point six: we were “only” able to collect ~2 million tweets but suspect that as many as 10 million were posted during the first 48 hours. So humanitarian organizations along with their partners need access to the Twitter Firehose. Hence my lobbying for Big Data Philanthropy.

Finally, needs/offers are hardly the only type of useful information available on Twitter during crises, which is why we developed several automatic classifiers to extract data on: caution and advice, infrastructure damage, casualties and injuries, missing people, and eyewitness accounts. In the near future, when our AIDR platform is ready, colleagues from the American Red Cross, FEMA, UN, etc., will be able to create their own classifiers on the fly to automatically collect information that is directly relevant to them and their relief operations. AIDR is spearheaded by QCRI colleague ChaTo and myself.
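The kind of classifier AIDR builds from volunteer-labeled tweets can be sketched with a hand-rolled Naive Bayes model. Everything here (the training tweets, labels, and tokenizer) is an invented, minimal illustration of the supervised-learning idea, not AIDR’s actual implementation.

```python
# Minimal Naive Bayes text classifier, illustrating how labeled example
# tweets can train a model that tags new tweets by category. Training data
# below is invented for the example; AIDR's real classifiers differ.
from collections import Counter, defaultdict
import math

def tokens(text):
    return text.lower().split()

def train(examples):
    """examples: list of (tweet, label) pairs. Returns a model for classify()."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()             # label -> number of examples
    for tweet, lab in examples:
        label_counts[lab] += 1
        word_counts[lab].update(tokens(tweet))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(model, tweet):
    """Return the label with the highest (log) posterior probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for lab in label_counts:
        score = math.log(label_counts[lab] / total)  # log prior
        denom = sum(word_counts[lab].values()) + len(vocab)
        for w in tokens(tweet):
            # Laplace smoothing so unseen words don't zero out the score
            score += math.log((word_counts[lab][w] + 1) / denom)
        if score > best_score:
            best, best_score = lab, score
    return best

examples = [
    ("bridge collapsed avoid the area", "infrastructure_damage"),
    ("power lines down roads blocked", "infrastructure_damage"),
    ("boil water before drinking authorities advise", "caution_advice"),
    ("stay indoors away from windows", "caution_advice"),
    ("two people injured when roof caved in", "casualties_injuries"),
    ("several casualties reported near school", "casualties_injuries"),
]
model = train(examples)
```

In AIDR the “on the fly” part is exactly this loop: responders define a category, volunteers label a stream of example tweets, and the classifier retrains as labels arrive.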

For now though, we simply emailed relevant geo-tagged and time-stamped data on needs/offers to colleagues at the American Red Cross who had requested this information. We also shared data related to gas leaks with colleagues at FEMA and ESRI, as per their request. The entire process was particularly insightful for Hemant and me, so we plan to follow up with these responders to learn how we can best support them again until AIDR becomes operational. In the meantime, check out the Twitris+ platform developed by Amit, Hemant and team at Kno.e.sis.


See also: Analysis of Multimedia Shared on Twitter After Tornado [Link]

How Online Gamers Can Support Disaster Response


FACT: Over half-a-million pictures were shared on Instagram and more than 20 million tweets posted during Hurricane Sandy. The year before, over 100,000 tweets per minute were posted following the Japan Earthquake and Tsunami. Disaster-affected communities are now more likely than ever to be on social media, which dramatically multiplies the amount of user-generated crisis information posted during disasters. Welcome to Big Data—Big Crisis Data.

Humanitarian organizations and emergency management responders are completely unprepared to deal with this volume and velocity of crisis information. Why is this a problem? Because social media can save lives. Recent empirical studies have shown that a significant percentage of social media reports include valuable, informative & actionable content for disaster response. Looking for those reports, however, is like searching for needles in a haystack. Finding the most urgent tweets in an information stack of over 20 million tweets (in real time) is indeed a major challenge.

FACT: More than half a billion people worldwide play computer and video games for at least an hour a day. This amounts to over 3.5 billion hours per week. In the US alone, gamers spend over 4 million hours per week online. The average young person will have spent 10,000 hours gaming by the age of 21. These numbers are rising daily. In early 2013, “World of Warcraft” reached 9.6 million subscribers worldwide, a population larger than Sweden. The online game “League of Legends” has over 12 million unique users every day, while more than 20 million users log on to Xbox Live every day.

What if these gamers had been invited to search through the information haystack of 20 million tweets posted during Hurricane Sandy? Let’s assume gamers were asked to tag which tweets were urgent without ever leaving their games. This simple 20-second task would directly support disaster responders like the American Red Cross. The Digital Humanitarian Network (DHN), by contrast, would have needed more than 100 hours, or close to 5 days, even assuming all of its volunteers worked 24/7 with no breaks. The 4 million gamers playing WoW (excluding China) would need only about 90 seconds each to do the same work. The 12 million gamers on League of Legends would have taken just 30 seconds.
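A quick back-of-envelope check of these throughput figures. The 20-second task time and the tweet count come from the text; the DHN volunteer count of roughly 1,000 is my own assumed round number. The results land close to the rounded figures above.

```python
# Back-of-envelope throughput arithmetic for crowd-tagging 20M tweets.
# The ~1,000 DHN volunteer count is an assumption; tweet count and the
# 20-second-per-tweet task time come from the text.
TWEETS = 20_000_000
SECONDS_PER_TWEET = 20
total_work = TWEETS * SECONDS_PER_TWEET  # total person-seconds of tagging

dhn_volunteers = 1_000        # assumed round number for DHN
wow_gamers = 4_000_000        # WoW subscribers excluding China (from text)
lol_gamers = 12_000_000       # daily League of Legends users (from text)

dhn_hours = total_work / dhn_volunteers / 3600
wow_seconds = total_work / wow_gamers
lol_seconds = total_work / lol_gamers

print(f"DHN: {dhn_hours:.0f} hours (~{dhn_hours / 24:.1f} days)")
print(f"WoW gamers: {wow_seconds:.0f} seconds each")
print(f"LoL gamers: {lol_seconds:.0f} seconds each")
```

Under these assumptions the DHN figure comes to roughly 111 hours (about 4.6 days), and the per-gamer times to on the order of tens of seconds, which is the whole point: spreading the work across millions of players collapses days of volunteer effort into moments.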

While some of the numbers proposed above may seem unrealistic, there is absolutely no denying that drawing on this vast untapped resource would significantly accelerate the processing of crisis information during major disasters. In other words, gamers worldwide can play a huge role in supporting disaster response operations. And they want to: gamers playing “World of Warcraft” raised close to $2 million in donations to support relief operations following the Japan Earthquake. They also raised another $2.3 million for victims of Superstorm Sandy. Gamers can easily donate their time as well. This is why my colleague Peter Mosur and I are launching the Internet Response League (IRL). Check out our dedicated website to learn more and join the cause.


How Crowdsourced Disaster Response in China Threatens the Government

In 2010, Russian volunteers used social media and a live crisis map to crowdsource their own disaster relief efforts as massive forest fires ravaged the country. These efforts were seen by many as both more effective and visible than the government’s response. In 2011, Egyptian volunteers used social media to crowdsource their own humanitarian convoy to provide relief to Libyans affected by the fighting. In 2012, Iranians used social media to crowdsource and coordinate grassroots disaster relief operations following a series of earthquakes in the north of the country. Just weeks earlier, volunteers in Beijing crowdsourced a crisis map of the massive flooding in the city. That map was immediately available and far more useful than the government’s crisis map. In early 2013, a magnitude 7 earthquake struck Southwest China, killing close to 200 and injuring more than 13,000. The response, which was also crowdsourced by volunteers using social media and mobile phones, actually posed a threat to the Chinese Government.


“Wang Xiaochang sprang into action minutes after a deadly earthquake jolted this lush region of Sichuan Province […]. Logging on to China’s most popular social media sites, he posted requests for people to join him in aiding the survivors. By that evening, he had fielded 480 calls” (1). While the government had declared the narrow mountain roads to the disaster-affected area blocked to unauthorized rescue vehicles, Wang hitchhiked his way through with more than a dozen other volunteers. “Their ability to coordinate — and, in some instances, outsmart a government intent on keeping them away — were enhanced by Sina Weibo, the Twitter-like microblog that did not exist in 2008 but now has more than 500 million users” (2). And so, “While the military cleared roads and repaired electrical lines, the volunteers carried food, water and tents to ruined villages and comforted survivors of the temblor […]” (3). Said Wang: “The government is in charge of the big picture stuff, but we’re doing the work they can’t do” (4).

In response to this same earthquake, another volunteer, Li Chengpeng, “turned to his seven million Weibo followers and quickly organized a team of volunteers. They traveled to the disaster zone on motorcycles, by pedicab and on foot so as not to clog roads, soliciting donations via microblog along the way. What he found was a government-directed relief effort sometimes hampered by bureaucracy and geographic isolation. Two days after the quake, Mr. Li’s team delivered 498 tents, 1,250 blankets and 100 tarps — all donated — to Wuxing, where government supplies had yet to arrive. The next day, they hiked to four other villages, handing out water, cooking oil and tents. Although he acknowledges the government’s importance during such disasters, Mr. Li contends that grass-roots activism is just as vital. ‘You can’t ask an NGO to blow up half a mountain to clear roads and you can’t ask an army platoon to ask a middle-aged woman whether she needs sanitary napkins,’ he wrote in a recent post” (5).


As I’ve blogged in the past (here and here, for example), using social media to crowdsource grassroots disaster response efforts serves to create social capital and strengthen collective action. This explains why the Chinese government (and others) faced a “groundswell of social activism” that it feared could “turn into government opposition” following the earthquake (6). So the Communist Party tried to turn the disaster into a “rallying cry for political solidarity. ‘The more difficult the circumstance, the more we should unite under the banner of the party,’ the state-run newspaper People’s Daily declared […], praising the leadership’s response to the earthquake” (7).

This did not quell the rise in online activism, however, which has “forced the government to adapt. Recently, People’s Daily announced that three volunteers had been picked to supervise the Red Cross spending in the earthquake zone and to publish their findings on Weibo. Yet on the ground, the government is hewing to the old playbook. According to local residents, red propaganda banners began appearing on highway overpasses and on town fences even before water and food arrived. ‘Disasters have no heart, but people do,’ some read. Others proclaimed: ‘Learn from the heroes who came here to help the ones struck by disaster’” (8). Meanwhile, the Central Propaganda Department issued a directive to Chinese newspapers and websites “forbidding them to carry negative news, analysis or commentary about the earthquake” (9). Nevertheless, “Analysts say the legions of volunteers and aid workers that descended on Sichuan threatened the government’s carefully constructed narrative about the earthquake. Indeed, some Chinese suspect such fears were at least partly behind official efforts to discourage altruistic citizens from coming to the region” (10).

Aided by social media and mobile phones, grassroots disaster response efforts present a new and more poignant “Dictator’s Dilemma” for repressive regimes. The original Dictator’s Dilemma refers to an authoritarian government’s competing interests in information and communication technology: expanding access to the technology while seeking to control its democratizing influence. In contrast, the “Dictator’s Disaster Lemma” refers to a repressive regime confronted with an effectively networked humanitarian response at the grassroots level, one that strengthens collective action and activism in political contexts as well. Yet said regime cannot prevent people from helping each other during natural disasters, as doing so could backfire against the regime.


See also:

 •  How Civil Disobedience Improves Crowdsourced Disaster Response [Link]