Artificial Intelligence (AI) can be used to automatically predict the credibility of tweets generated during disasters and to automatically rank tweets posted during major events by credibility. Aditi Gupta et al. applied these same information forensics techniques to automatically identify fake images posted on Twitter during Hurricane Sandy. Using a decision tree classifier, the authors were able to predict which images were fake with an accuracy of 97%. Their analysis also revealed that retweets accounted for 86% of all tweets linking to fake images. In addition, their results showed that 90% of these retweets were posted by just 30 Twitter users.
The authors collected the URLs of fake images shared during the hurricane by drawing on the UK Guardian’s list and other sources. They compared these links against 622,860 tweets that contained links and the words “Sandy” and “hurricane”, posted between October 20th and November 1st, 2012. Just over 10,300 of these tweets and retweets contained links to fake images, while close to 5,800 pointed to real images. Of the ~10,300 tweets linking to fake images, 86% (or ~9,000) were retweets. Interestingly, these retweets spiked about 12 hours after the original tweets were posted, and that spike was driven by just 30 Twitter users. Furthermore, the vast majority of retweets came not from the followers of those users but from people following certain hashtags.
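To make this matching step concrete, here is a minimal sketch in Python of how one might filter keyword-matched tweets against a known list of fake-image URLs and compute the retweet share. The field names (text, urls, is_retweet) and the placeholder URL entries are my own assumptions for illustration; the authors have not released their pipeline as code.

```python
# Hypothetical sketch of the paper's filtering/matching step.
# Placeholder entries; the real list was compiled from the Guardian etc.
FAKE_IMAGE_URLS = {
    "http://example.com/fake-shark.jpg",
    "http://example.com/fake-flood.jpg",
}

def label_tweets(tweets):
    """Split keyword-filtered tweets into fake-image and other buckets."""
    fake, other = [], []
    for tweet in tweets:
        text = tweet["text"].lower()
        # Keep only tweets mentioning the event, mirroring the keyword filter.
        if "sandy" not in text or "hurricane" not in text:
            continue
        if any(url in FAKE_IMAGE_URLS for url in tweet["urls"]):
            fake.append(tweet)
        else:
            other.append(tweet)
    return fake, other

def retweet_share(tweets):
    """Fraction of tweets that are retweets (the paper reports ~86%)."""
    retweets = sum(1 for t in tweets if t["is_retweet"])
    return retweets / len(tweets) if tweets else 0.0
```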
Gupta et al. also studied the profiles of users who tweeted or retweeted fake images (User Features) as well as the content of their tweets (Tweet Features) to determine whether these features might predict whether a given tweet points to a fake image. Their decision tree classifier achieved an accuracy of over 90%, which is remarkable. But the authors note that this high accuracy score is due to “the similar nature of many tweets since a lot of tweets are retweets of other tweets in our dataset.” In any event, their analysis also reveals that Tweet-based Features (such as length of tweet, number of uppercase letters, etc.) were far more predictive of whether a tweeted image was fake than User-based Features (such as number of friends, followers, etc.). One feature that was overlooked, however, is gender.
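For readers who want a feel for what such a classifier looks like in practice, here is a minimal sketch using scikit-learn’s DecisionTreeClassifier on a handful of Tweet-based Features. The feature extractor below covers only a small illustrative subset of the features the paper considers, and the function names and train/test split are my own choices, not the authors’ setup.

```python
import re

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def tweet_features(text):
    """A few simple Tweet-based Features of the kind the paper describes."""
    return [
        len(text),                              # length of tweet
        sum(1 for c in text if c.isupper()),    # number of uppercase letters
        text.count("!") + text.count("?"),      # exclamation/question marks
        len(re.findall(r"#\w+", text)),         # number of hashtags
        len(re.findall(r"@\w+", text)),         # number of user mentions
    ]

def train_fake_image_classifier(texts, labels):
    """Fit a decision tree on (tweet text, is_fake) pairs and report accuracy."""
    X = [tweet_features(t) for t in texts]
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.3, random_state=42)
    clf = DecisionTreeClassifier().fit(X_train, y_train)
    print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    return clf
```

Note that, as the authors themselves caution, near-duplicate retweets in both the training and test sets would inflate a score like this, which is why their 90%+ accuracy should be read with that caveat in mind.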
In conclusion, “content and property analysis of tweets can help us in identifying real image URLs being shared on Twitter with a high accuracy.” These results reinforce the evidence that machine computing and automated techniques can be used for information forensics as applied to images shared on social media. In terms of future work, the authors Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru and Anupam Joshi plan to “conduct a larger study with more events for identification of fake images and news propagation.” They also hope to expand their study to include the detection of “rumors and other malicious content spread during real world events apart from images.” Lastly, they “would like to develop a browser plug-in that can detect fake images being shared on Twitter in real-time.” Their full paper is available here.
Needless to say, all of this is music to my ears. Such a plug-in could be added to our Artificial Intelligence for Disaster Response (AIDR) platform, not to mention our Verily platform, which seeks to crowdsource the verification of social media reports (including images and videos) during disasters. What I also really value about the authors’ approach is how pragmatic they are with their findings: by noting their interest in developing a browser plug-in, they are applying their data science expertise for social good. As per my previous blog post, this focus on social impact is particularly rare, which is why we need more data scientists like Aditi Gupta et al. It is also why I was in touch with Aditi last year about her research on automatically ranking the credibility of tweets, and why I’ve just reached out to her again to explore ways to collaborate with her and her team.