“Predicting Information Credibility in Time-Sensitive Social Media” is one of this year’s most interesting and important studies on “information forensics”. The analysis, co-authored by my QCRI colleague ChaTo Castillo, will be published in Internet Research and should be required reading for anyone interested in the role of social media in emergency management and humanitarian response. The authors study disaster tweets and find measurable differences in the way they propagate. They show that “these differences are related to the news-worthiness and credibility of the information conveyed,” a finding that enabled them to develop an automatic and remarkably accurate way to identify credible information on Twitter.
The new study builds on this previous research, which analyzed the veracity of tweets during a major disaster. That earlier analysis found “a correlation between how information propagates and the credibility that is given by the social network to it. Indeed, the reflection of real-time events on social media reveals propagation patterns that surprisingly has less variability the greater a news value is.” The graphs below depict this information propagation behavior during the 2010 Chile Earthquake.
The graphs depict the re-tweet activity during the first hours following the earthquake. Grey edges depict past retweets. Some of the re-tweet graphs reveal interesting patterns even within 30 minutes of the quake. “In some cases tweet propagation takes the form of a tree. This is the case of direct quoting of information. In other cases the propagation graph presents cycles, which indicates that the information is being commented and replied, as well as passed on.” When studying false rumor propagation, the analysis reveals that “false rumors tend to be questioned much more than confirmed truths […].”
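To make the tree-versus-cycle distinction concrete, here is a minimal sketch (not the authors’ code) of how a retweet propagation graph might be inspected. The edge list and user names are hypothetical; in practice the edges would come from retweet and reply metadata.

```python
import networkx as nx

# Hypothetical propagation edges: (user who posted, user who passed it on)
edges = [
    ("alice", "bob"), ("alice", "carol"),   # direct quoting fans out like a tree
    ("carol", "dave"), ("dave", "alice"),   # a reply back to alice introduces a cycle
]

G = nx.DiGraph(edges)

# A pure "direct quoting" cascade is a tree: no cycles and a single root.
has_cycle = not nx.is_directed_acyclic_graph(G)
print("propagation contains cycles (commented / replied):", has_cycle)

# For acyclic cascades, the longest path from the original poster corresponds
# to the "maximum depth of the propagation tree" feature listed further below.
if not has_cycle:
    root = next(n for n, d in G.in_degree() if d == 0)
    depth = max(nx.shortest_path_length(G, root).values())
    print("maximum propagation depth:", depth)
```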
Building on these insights, the authors studied over 200,000 disaster tweets and identified 16 features that best separate credible from non-credible tweets. For example, users who spread credible tweets tend to have more followers. In addition, “credible tweets tend to include references to URLs which are included on the top-10,000 most visited domains on the Web. In general, credible tweets tend to include more URLs, and are longer than non credible tweets.” Furthermore, credible tweets also tend to express negative feelings, whilst non-credible tweets concentrate more on positive sentiments. Finally, question and exclamation marks tend to be associated with non-credible tweets, as are tweets that use first- and third-person pronouns. All 16 features are listed below; a rough sketch of how a few of them might be computed follows the list.
• Average number of tweets posted by authors of the tweets on the topic in the past.
• Average number of followees of authors posting these tweets.
• Fraction of tweets having a positive sentiment.
• Fraction of tweets having a negative sentiment.
• Fraction of tweets containing a URL that contain the most frequent URL.
• Fraction of tweets containing a URL.
• Fraction of URLs pointing to a domain among top 10,000 most visited ones.
• Fraction of tweets containing a user mention.
• Average length of the tweets.
• Fraction of tweets containing a question mark.
• Fraction of tweets containing an exclamation mark.
• Fraction of tweets containing a question or an exclamation mark.
• Fraction of tweets containing a “smiling” emoticon.
• Fraction of tweets containing a first-person pronoun.
• Fraction of tweets containing a third-person pronoun.
• Maximum depth of the propagation trees.
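As a rough illustration only, the sketch below shows how a handful of the content-based features above might be computed for a set of tweets on one topic. The tweet dictionaries, field names, and the stand-in list of top domains are assumptions, not the authors’ implementation.

```python
import re
from statistics import mean

TOP_DOMAINS = {"bbc.co.uk", "nytimes.com", "cnn.com"}  # stand-in for the top-10,000 list

def topic_features(tweets):
    """tweets: list of dicts with 'text' and 'author_followees' keys (assumed schema)."""
    n = len(tweets)
    texts = [t["text"] for t in tweets]
    urls = [u for text in texts for u in re.findall(r"https?://(\S+)", text)]
    return {
        "avg_followees": mean(t["author_followees"] for t in tweets),
        "frac_with_url": sum("http" in text for text in texts) / n,
        "frac_top_domain_urls": (
            sum(any(u.startswith(d) for d in TOP_DOMAINS) for u in urls) / len(urls)
            if urls else 0.0
        ),
        "avg_length": mean(len(text) for text in texts),
        "frac_question_mark": sum("?" in text for text in texts) / n,
        "frac_exclamation_mark": sum("!" in text for text in texts) / n,
        "frac_user_mention": sum("@" in text for text in texts) / n,
    }

example = [
    {"text": "Bridge collapsed, details: http://bbc.co.uk/news", "author_followees": 300},
    {"text": "Is this real?! @someone", "author_followees": 120},
]
print(topic_features(example))
```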
Drawing on natural language processing (NLP) and machine learning (ML), the authors turned the insights above into an automatic classifier for identifying credible English-language tweets. The classifier achieved an AUC of 0.86; this measure, which ranges from 0 to 1, captures the classifier’s predictive quality. When applied to Spanish-language tweets, the AUC remained relatively high at 0.82, which demonstrates the robustness of the approach.
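For readers who want to see the evaluation step in code, here is a minimal sketch assuming topic-level feature vectors like those produced above and 0/1 credibility labels. The random forest is only a stand-in for whatever supervised learner the authors actually used, and the data is synthetic; the point is how the AUC reported in the study is computed.

```python
# Minimal sketch: train a supervised classifier on topic-level feature
# vectors and evaluate it with AUC. Data is synthetic; the RandomForest
# is a stand-in, not the authors' model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 16))                  # 500 topics x 16 features (placeholder)
y = (X[:, 0] + X[:, 5] > 1.0).astype(int)  # synthetic "credible" labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# AUC ranges from 0 to 1 (0.5 is chance); it captures how well the model
# ranks credible topics above non-credible ones.
scores = clf.predict_proba(X_test)[:, 1]
print("AUC:", round(roc_auc_score(y_test, scores), 2))
```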
Interested in learning more about “information forensics”? See this link and the articles below:
- Automatically Ranking Credibility of Tweets During Major Events
- Six Degrees of Separation: Implications for Verifying Social Media
- How to Verify Crowdsourced Information from Social Media
- Truth in Age of Social Media: Social Computing & Big Data Challenge
- Truthiness as Probability: Moving Beyond the True or False Dichotomy when Verifying Social Media
- How to Verify and Counter Rumors in Social Media
- Crowdsourcing for Human Rights Monitoring: Challenges and Opportunities for Information Collection & Verification
- Rapidly Verifying the Credibility of Sources on Twitter
- Accelerating the Verification of Social Media Content