My colleague Kate Starbird recently shared a very neat study entitled “Learning from the Crowd: Collaborative Filtering Techniques for Identifying On-the-Ground Twitterers during Mass Disruptions” (PDF). As she and her co-authors rightly argue, “most Twitter activity during mass disruption events is generated by the remote crowd.” So can we use advanced computing to rapidly identify Twitter users who are reporting from ground zero? The answer is yes.
An important indicator of whether a Twitter user is reporting from the scene of a crisis is the number of times they are retweeted. During the Egyptian revolution in early 2011, “nearly 30% of highly retweeted Twitter users were physically present at those protest events.” Kate et al. drew on this insight to study tweets posted during the Occupy Wall Street (OWS) protests in September 2011. The authors manually analyzed a sample of more than 2,300 Twitter users to determine which were tweeting from the protests. They found that 4.5% of the Twitter users in their sample were actually onsite. Using this dataset as training data, Kate et al. developed a classifier that automatically identifies Twitter users reporting from the protests with just shy of 70% accuracy. I expect that more training data could well increase this accuracy score.
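To make the approach concrete, here is a minimal sketch of what training such a user-level classifier might look like in plain Python. The features (retweet volume, geotagging rate) and all data below are invented for illustration; the paper's actual feature set, model, and dataset differ. The only detail borrowed from the study is the rough class imbalance (~4.5% of users onsite).

```python
# Toy sketch: classify Twitter users as on-the-ground vs. remote.
# Features and data are invented; only the ~4.5% class imbalance
# mirrors the study. This is NOT the paper's actual method.
import math
import random

random.seed(0)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def make_user(on_ground):
    # Invented assumption: on-the-ground users are retweeted more
    # often and geotag their tweets more frequently.
    retweets = max(0.0, random.gauss(30 if on_ground else 8, 5)) / 50.0
    geotag = min(1.0, max(0.0, random.gauss(0.6 if on_ground else 0.1, 0.1)))
    return ([retweets, geotag], 1 if on_ground else 0)

# Synthetic sample with roughly the study's class imbalance.
data = [make_user(random.random() < 0.045) for _ in range(2300)]
train, test = data[:1800], data[1800:]

# Logistic regression trained with stochastic gradient descent.
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(100):
    for x, y in train:
        err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

predict = lambda x: sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5
accuracy = sum(predict(x) == (y == 1) for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

On synthetic data the classes are cleanly separable, so the accuracy here is unrealistically high; the point is only the shape of the pipeline: label a sample by hand, extract per-user features, fit a model, and evaluate on held-out users.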
In any event, “the information resulting from this or any filtering technique must be further combined with human judgment to assess its accuracy.” As the authors rightly note, “this ‘limitation’ fits well within an information space that is witnessing the rise of digital volunteer communities who monitor multiple data sources, including social media, looking to identify and amplify new information coming from the ground.” To be sure, “For volunteers like these, the use of techniques that increase the signal to noise ratio in the data has the potential to drastically reduce the amount of work they must do. The model that we have outlined does not result in perfect classification, but it does increase this signal-to-noise ratio substantially—tripling it in fact.”
I really hope that someone will leverage Kate’s important work to develop a standalone platform that automatically generates a list of Twitter users who are reporting from disaster-affected areas. This would be a very worthwhile contribution to the ecosystem of next-generation humanitarian technologies. In the meantime, perhaps QCRI’s Artificial Intelligence for Disaster Response (AIDR) platform will help digital humanitarians automatically identify tweets posted by eyewitnesses. I’m optimistic since we were able to create a machine learning classifier with an accuracy of 80%-90% for eyewitness tweets. More on this in our recent study.
One question that remains is how to automatically identify tweets like the one above. The author is not an eyewitness but was likely on the phone with family members who were closer to the action. How do we develop a classifier to catch these “second-hand” eyewitness reports?
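For the tweet-level question, a toy bag-of-words Naive Bayes classifier shows the basic shape of the problem. All tweets and labels below are invented, and this is not AIDR's actual pipeline; catching second-hand reports would presumably mean adding a third label and collecting training examples for it.

```python
# Toy sketch: eyewitness (1) vs. non-eyewitness (0) tweet classification
# with bag-of-words Naive Bayes. Tweets and labels are invented; this is
# NOT AIDR's real pipeline, which uses crowd-labeled training data.
import math
from collections import Counter

train = [
    ("i can see smoke from my window the building is on fire", 1),
    ("loud explosion just happened near my house everyone outside", 1),
    ("water is rising fast on our street we are evacuating now", 1),
    ("praying for everyone affected by the earthquake", 0),
    ("news reports say the flood has displaced thousands", 0),
    ("retweet to support the victims of the storm", 0),
]

# Per-class word counts for add-one (Laplace) smoothing.
word_counts = {0: Counter(), 1: Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = set(word_counts[0]) | set(word_counts[1])

def predict(text):
    scores = {}
    for label in (0, 1):
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / len(train))
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                p = (word_counts[label][word] + 1) / (total + len(vocab))
                score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("smoke everywhere i am evacuating my house"))  # → 1
```

A real system would need far more training data, better features than raw word counts, and, per the authors' point above, a human in the loop to verify the output.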
That’s an awesome paper by Kate that I read a while ago, thanks for bringing it up, Patrick 🙂
To take it further and complement on-the-ground twitterers, it might be valuable to identify whom to engage with in online social media communities, i.e., those whose voices are well heard in the overall ecosystem with respect to specific types of needs, so as to:
a.) sync with them quickly to get important information, as well as
b.) coordinate with them to diffuse important information from the coordinators’ side.
I have been investigating this issue for some time, and here’s the prototype (a module on ‘why someone is influential’ is in development), which I will also discuss at ICCM-13 next month (leveraging the power of ‘remote twitterers’):
i.) During the Balochistan earthquake, while working with UN OCHA: http://twitris.knoesis.org/pakearthquake2013/network/ (Interesting anecdotes we found were the evolution of the network of influencers, and also the second influencer under the ’emergency’ category, who was heading on-site for relief work and would likely not have been discoverable by search methods.)
ii.) And here’s an analysis of the Occupy Wall Street (OWS) movement (http://twitris.knoesis.org/ows/network/): http://twitris.knoesis.org/ows/insights/ (for asking more questions about coordination behavior).
I am wondering if we could use this (or a similar) approach to help identify affected people whom we can trust to appeal for financial help. What I am getting at is how we can use it to add a layer of trust, so that affected people could appeal for money through a Kickstarter-like fund.
See my post: Kickstarting an Emergency: http://blog.veritythink.com/post/65420995209/kickstarting-an-emergency
Fascinating point, Andrej, thanks. I especially liked your idea about the distributed nature of funding.
Meanwhile, I was wondering about also engaging and empowering those ‘who want to help’, the legitimate potential suppliers of help and resources. We often treat care for the ‘affected community’ as the primary concern, while there can also be coordination issues among those in the ‘helper community’ who desperately want to help by several means. Patrick’s group and I have been investigating this coordination issue of ‘asking for help’ vs. ‘offering help’, and then performing matching across the two for various types of resources (e.g., money donations, volunteering, clothing, medical supplies/blood, etc.). This could help response coordinators avoid the ‘second disaster’, like the one in the aftermath of Hurricane Sandy: http://www.npr.org/2013/01/09/168946170/thanks-but-no-thanks-when-post-disaster-donations-overwhelm
Also, what do you think about automatically identifying potential seekers and suppliers first (pruning the big data), and then verifying trustworthiness using micro-tasking methods such as MicroMappers (adding trust)? This would bridge automated methods and human wisdom in reaching the right set of people, given that funding is the resource at stake.