Artificial Intelligence (AI) can be used to automatically predict the credibility of tweets generated during disasters and to automatically rank the credibility of tweets posted during major events. Aditi Gupta et al. applied these same information forensics techniques to automatically identify fake images posted on Twitter during Hurricane Sandy. Using a decision tree classifier, the authors were able to predict which images were fake with an accuracy of 97%. Their analysis also revealed that retweets accounted for 86% of all tweets linking to fake images, and that 90% of these retweets were posted by just 30 Twitter users.
The authors collected the URLs of fake images shared during the hurricane by drawing on the UK Guardian’s list and other sources. They compared these links against 622,860 tweets that contained links and the words “Sandy” and “hurricane,” posted between October 20 and November 1, 2012. Just over 10,300 of these tweets and retweets linked to URLs of fake images, while close to 5,800 pointed to real images. Of the ~10,300 tweets linking to fake images, 86% (roughly 9,000) were retweets. Interestingly, these retweets spiked about 12 hours after the original tweets were posted, and that spike was driven by just 30 Twitter users. Furthermore, the vast majority of retweets came not from followers of the original posters but from users following certain hashtags.
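To make that filtering step concrete, here is a minimal Python sketch of the kind of keyword, link, and date filtering described above. The tweet structure (text, urls, created_at, is_retweet) and the URL lists are hypothetical stand-ins for the authors’ actual dataset and pipeline, not their code:

```python
from datetime import datetime

# Hypothetical stand-ins for the curated URL lists (e.g. the Guardian's
# list of fake Sandy images and a list of verified real ones).
FAKE_URLS = {"http://example.com/fake-shark.jpg"}
REAL_URLS = {"http://example.com/real-flood.jpg"}

START = datetime(2012, 10, 20)
END = datetime(2012, 11, 1, 23, 59, 59)

# Toy tweets standing in for the 622,860-tweet Sandy dataset.
tweets = [
    {"text": "Hurricane Sandy shark pic!", "urls": ["http://example.com/fake-shark.jpg"],
     "created_at": datetime(2012, 10, 30, 8, 0), "is_retweet": True},
    {"text": "Sandy hurricane flooding on our street", "urls": ["http://example.com/real-flood.jpg"],
     "created_at": datetime(2012, 10, 29, 21, 0), "is_retweet": False},
]

def matches_event(tweet):
    """Keep tweets that contain a link plus the words 'Sandy' and 'hurricane'."""
    text = tweet["text"].lower()
    return (bool(tweet["urls"]) and "sandy" in text and "hurricane" in text
            and START <= tweet["created_at"] <= END)

def label(tweet):
    """Label a tweet by which known image URLs it links to."""
    if any(u in FAKE_URLS for u in tweet["urls"]):
        return "fake"
    if any(u in REAL_URLS for u in tweet["urls"]):
        return "real"
    return "unknown"

fake = [t for t in tweets if matches_event(t) and label(t) == "fake"]
retweet_share = sum(t["is_retweet"] for t in fake) / len(fake)
print(f"{len(fake)} fake-image tweets, {retweet_share:.0%} retweets")
```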
Gupta et al. also studied the profiles of users who tweeted or retweeted fake images (User Features) as well as the content of their tweets (Tweet Features) to determine whether these features might be predictive of whether a tweet links to a fake image. Their decision tree classifier achieved an accuracy of over 90%, which is remarkable. But the authors note that this high accuracy score is due to “the similar nature of many tweets since a lot of tweets are retweets of other tweets in our dataset.” In any event, their analysis also reveals that Tweet-based Features (such as length of tweet, number of uppercase letters, etc.) were far more accurate in predicting whether or not a tweeted image was fake than User-based Features (such as number of friends, followers, etc.). One feature that was overlooked, however, is gender.
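For readers curious what such a classifier looks like in practice, below is a minimal sketch using scikit-learn’s DecisionTreeClassifier. The handful of features shown (tweet length, uppercase count, follower and friend counts) reflect the feature types the authors describe, but the extraction function and toy rows are my own illustrative stand-ins, not the paper’s implementation:

```python
from sklearn.tree import DecisionTreeClassifier

def extract_features(text, followers, friends):
    """A few illustrative features: tweet-based first, user-based last."""
    return [
        len(text),                          # tweet-based: length of tweet
        sum(c.isupper() for c in text),     # tweet-based: uppercase letters
        text.count("!"),                    # tweet-based: exclamation marks
        followers,                          # user-based: follower count
        friends,                            # user-based: friend count
    ]

# Toy labeled rows standing in for the Sandy dataset (1 = links to a fake image).
X = [
    extract_features("OMG SHARK swimming in the street!!!", 120, 300),
    extract_features("Flood photo from our block, stay safe.", 80, 150),
]
y = [1, 0]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
print(clf.predict([extract_features("UNREAL wave pic!!!", 50, 60)]))
```

With the real dataset one would of course cross-validate rather than fit on two rows; the point here is simply how tweet-based and user-based features sit side by side as inputs to the tree.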
In conclusion, “content and property analysis of tweets can help us in identifying real image URLs being shared on Twitter with a high accuracy.” These results add to the evidence that machine learning and automated techniques can be used for information forensics as applied to images shared on social media. In terms of future work, the authors Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru and Anupam Joshi plan to “conduct a larger study with more events for identification of fake images and news propagation.” They also hope to expand their study to include the detection of “rumors and other malicious content spread during real world events apart from images.” Lastly, they “would like to develop a browser plug-in that can detect fake images being shared on Twitter in real-time.” Their full paper is available here.
Needless to say, all of this is music to my ears. Such a plug-in could be added to our Artificial Intelligence for Disaster Response (AIDR) platform, not to mention our Verily platform, which seeks to crowdsource the verification of social media reports (including images and videos) during disasters. What I also really value about the authors’ approach is how pragmatic they are with their findings. That is, by noting their interest in developing a browser plug-in, they are applying their data science expertise for social good. As I noted in a previous blog post, this focus on social impact is particularly rare. So we need more data scientists like Aditi Gupta et al. This is why I was already in touch with Aditi last year given her research on automatically ranking the credibility of tweets. I’ve just reached out to her again to explore ways to collaborate with her and her team.
This is interesting stuff, and useful. If only we could get to the bigger issue and teach people not to blindly retweet or reblog things! Like the Photoshop of the sharks swimming next to an escalator that gets trotted out whenever a flood happens.
And the McDonalds one in lower left above is by a group of Danish artists! I saw the video at the Hirshhorn Museum in DC. Though I’m sure that one has been tweeted as an example of a flood somewhere too.
Thanks for reading and for your comment, Andrew. Yes indeed re McDonalds, those pics are a screenshot of a slide I often use in my talks. I leave the McDonalds pic untagged because I take a vote on how many people think the pic is fake, testing the wisdom-of-the-crowds idea 🙂
On getting people to think twice before retweeting–agreed, this is core to our Verily project:
http://iRevolution.net/2013/05/19/time-critical-crowdsourced-verification
Very cool, thanks. (Sorry for the multiple posts, I couldn’t tell if it took the first time.)
This is all post-hoc analysis though, right? Once you have millions of tweets, run it through our algorithm and we’ll tell you what was fake and what wasn’t.
What about during the event, when it really matters, when you have much smaller datasets to start with?
Hi Paul, thanks for reading and for your follow-up question. Post-hoc analysis is how we develop machine learning classifiers: without training data, you can’t apply methods from artificial intelligence for predictive tagging. These classifiers can then be applied in real time, just like we did during the EF-5 tornado last month in support of the American Red Cross (a minimal sketch of this train-offline, classify-live pattern follows the links below):
http://iRevolution.net/2013/05/29/analyzing-tweets-tornado
http://iRevolution.net/2013/06/01/multimedia-tornado-analysis
To learn more about the application of AI and machine learning to real-time event detection, please see:
http://iRevolution.net/2013/04/01/auto-extracting-disaster-info
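And here is that sketch: a classifier is fit offline on labeled tweets from past events, then applied to each incoming tweet as it arrives. The TF-IDF text features, toy training examples, and the on_new_tweet handler are illustrative assumptions of mine, not the actual AIDR pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Post-hoc step: fit on labeled tweets from past events (toy examples here).
train_texts = [
    "SHARK swimming down the street, unreal!!!",
    "Road closed on 5th Ave, utility crews on site.",
]
train_labels = ["fake", "credible"]

model = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=0))
model.fit(train_texts, train_labels)

# Real-time step: score each tweet the moment it arrives from the stream.
def on_new_tweet(text):
    print(model.predict([text])[0], "<-", text)

for incoming in ["HUGE WAVE hits downtown, unbelievable pic!!!"]:
    on_new_tweet(incoming)
```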
Thanks again,
Patrick