Update: I have authored a 20+ page paper on verifying social media content based on 5 case studies. Please see this blog post for a copy.
I get this question all the time: “How do you verify social media data?” This question drives many of the conversations on crowdsourcing and crisis mapping these days. It’s high time that we start compiling our tips and tricks into an online how-to guide so that we don’t have to start from square one every time the question comes up. We need to build and accumulate our shared knowledge in information forensics. So here is the Google Doc version of this blog post; please feel free to add your best practices and ask others to contribute. Feel free to also add links to other studies on verifying social media content.
If every source we monitored in the social media space were known and trusted, the need for verification would not be as pronounced. In other words, it is the plethora and virtual anonymity of sources that makes us skeptical of the content they deliver. Verifying social media data thus involves two steps: authenticating the source as reliable and triangulating the content as valid. If we can authenticate the source and find it trustworthy, this may be sufficient to trust the content and mark it as verified, depending on context. If the source is difficult to authenticate, then we need to triangulate the content itself.
Let’s unpack these two processes, authentication and triangulation, and apply them to Twitter, since the most pressing challenges in social media verification concern eyewitness, user-generated content. The first step is to determine whether the source is trustworthy. Here are some tips on how to do this:
- Bio on Twitter: Does the source provide a name, picture, bio and any links to their own blog, identity, professional occupation, etc., on their page? If there’s a name, does searching for this name on Google provide any further clues to the person’s identity? Perhaps a Facebook page, a professional email address, a LinkedIn profile?
- Number of Tweets: Is this a new Twitter handle with only a few tweets? If so, this makes authentication more difficult. Arasmus notes that “the more recent, the less reliable and the more likely it is to be an account intended to spread disinformation.” In general, the longer the Twitter handle has been around and the more tweets linked to it, the better. This provides a digital trace, a history of prior tweets that can be scrutinized for signs of political bias, misinformation, etc. Arasmus specifies: “What are the tweets like? Does the person qualify his/her reports? Are they intelligible? Is the person given to exaggeration and inconsistencies?”
- Number of followers: Does the source have a large following? If there are only a few followers, are any of them known and credible sources? Also, how many lists has this Twitter handle been added to?
- Number following: How many Twitter users does the Twitter handle follow? Are these known and credible sources?
- Retweets: What type of content does the Twitter handle retweet? Does the Twitter handle in question get retweeted by known and credible sources?
- Location: Can the source’s geographic location be ascertained? If so, are they near the unfolding events? One way to find out by proxy is to examine during which hours of the day or night the source tweets the most. This may indicate the person’s time zone.
- Timing: Does the source appear to be tweeting in near real-time? Or are there considerable delays? Does anything appear unusual about the timing of the person’s tweets?
- Social authentication: If you’re still unsure about the source’s reliability, use your own social network (Twitter, Facebook, LinkedIn) to find out whether anyone in your network knows about the source’s reliability.
- Media authentication: Is the source quoted by trusted media outlets, whether in the mainstream or the social media space?
- Engage the source: Tweet them back and ask them for further information. NPR’s Andy Carvin has employed this technique particularly well. For example, you can tweet back and ask for the source of the report and for any available pictures, videos, etc. Place the burden of proof on the source.
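The Location tip above, inferring a time zone from when a source tweets, can be sketched in Python. This is a minimal sketch under loud assumptions: it takes timezone-aware tweet timestamps you have already collected (no Twitter API calls are shown), and it assumes a person’s quietest five hours are centred on about 04:00 local time. The constants `ASSUMED_QUIET_LOCAL_HOUR` and `WINDOW` are hypothetical tuning choices, not established values.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical assumptions, not established values:
ASSUMED_QUIET_LOCAL_HOUR = 4  # centre of a typical person's quietest hours, local time
WINDOW = 5                    # width of the quiet window, in hours


def hourly_histogram(timestamps):
    """Count tweets per UTC hour (0-23). Timestamps must be timezone-aware."""
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def estimate_utc_offset(timestamps):
    """Guess the source's UTC offset in whole hours from when it is quietest."""
    hist = hourly_histogram(timestamps)

    def window_sum(start):
        # Activity in a WINDOW-hour span starting at UTC hour `start`, wrapping at midnight.
        return sum(hist[(start + i) % 24] for i in range(WINDOW))

    quietest_start = min(range(24), key=window_sum)
    quiet_centre_utc = (quietest_start + WINDOW // 2) % 24
    offset = (ASSUMED_QUIET_LOCAL_HOUR - quiet_centre_utc) % 24
    # Map 0..23 onto the conventional range -11..+12.
    return offset - 24 if offset > 12 else offset


if __name__ == "__main__":
    # Synthetic example: an account that never tweets between 07:00 and 11:59 UTC.
    sample = [
        datetime(2011, 3, day, hour, 30, tzinfo=timezone.utc)
        for day in range(1, 8)
        for hour in range(24)
        if hour not in range(7, 12)
    ]
    print(estimate_utc_offset(sample))  # -5
```

Treat the result as one weak clue among many: shift workers, insomniacs and scheduled tweets all break the assumption, and at best the output narrows the source down to a band of time zones, not a location.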
These are some of the tips that come to mind for source authentication. For more thoughts on this process, see my previous blog post “Passing the I’m-Not-Gaddafi-Test: Authenticating Identity During Crisis Mapping Operations.” If you have tips of your own not listed here, please do add them to the Google Doc. They don’t need to be limited to Twitter either.
Now, let’s say that we’ve gone through the list above and find the evidence inconclusive. We then move on to triangulating the content. Here are some tips on how to do this:
- Triangulation: Are other sources on Twitter or elsewhere reporting on the event you are investigating? As Arasmus notes, “remain skeptical about the reports that you receive. Look for multiple reports from different unconnected sources.” The more independent witnesses you can get information from, the better, and the less critical the need for identity authentication.
- Origins: If the user reporting an event is not the original source, can the original source be identified and authenticated? In particular, if the original source is found, does the time and date of the original report make sense given the situation?
- Social authentication: Ask members of your own social network whether the tweet you are investigating is being reported by other sources. Ask them how unusual the reported event is to get a sense of how likely it is to have happened in the first place. Andy Carvin’s followers, for example, “help him translate, triangulate, and track down key information. They enable remarkable acts of crowdsourced verification […] but he must always tell himself to check and challenge what he is told.”
- Language: Andy Carvin notes that tweets that sound too official, using language like “breaking news”, “urgent”, “confirmed”, etc., need to be scrutinized. “When he sees these terms used, Carvin often replies and asks for additional details, for pictures and video. Or he will quote the tweet and add a simple one word question to the front of the message: Source?” The BBC’s UGC (user-generated content) Hub in London also checks whether the vocabulary, slang and accents are correct for the location a source claims to be reporting from.
- Pictures: If the Twitter handle shares photographic “evidence”, does the photo provide any clues about where it was taken, based on buildings, signs, cars, etc., in the background? The BBC’s UGC Hub checks weaponry against what is known for the given country and also looks at shadows to determine the likely time of day a picture was taken. In addition, they examine weather reports to “confirm that the conditions shown fit with the claimed date and time.” These same tips can be applied to tweets that share video footage.
- Follow up: If you have contacts in the geographic area of interest, you could ask them to follow up directly, in person, to confirm the validity of the report. Obviously this is not always possible, particularly in conflict zones. Still, there is increasing anecdotal evidence that various media organizations and human rights groups use this strategy. One particularly striking example comes from Kyrgyzstan, where a Skype group with hundreds of users across the country was able to disprove and counter rumors at a breathtaking pace. For more details, see my blog post “How to Use Technology to Counter Rumors During Crises: Anecdotes from Kyrgyzstan.”
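Two of the triangulation tips above lend themselves to a rough sketch: flagging official-sounding vocabulary, as Andy Carvin does, and counting how many unconnected sources report the same event, as Arasmus recommends. The sketch is hypothetical: the `Report` structure, the term list and the `connected_pairs` input (for example, pairs of accounts known to follow each other) are assumptions, and the independence check is a simple greedy pass, not a rigorous test of independence.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Terms that, per Andy Carvin, call for extra scrutiny; extend as needed.
OFFICIAL_TERMS = ("breaking news", "urgent", "confirmed")


@dataclass
class Report:
    author: str                       # handle of the account posting
    text: str
    retweet_of: Optional[str] = None  # original author, if this is a retweet


def official_terms(text):
    """Return the official-sounding terms that appear as whole words (case-insensitive)."""
    lowered = text.lower()
    return [t for t in OFFICIAL_TERMS
            if re.search(r"\b" + re.escape(t) + r"\b", lowered)]


def independent_sources(reports, connected_pairs):
    """Greedily pick authors reporting the event who are not linked to one another.

    Retweets are credited to their original author, so echoes do not count twice.
    `connected_pairs` holds frozenset({a, b}) for accounts known to be linked.
    """
    authors = {r.retweet_of or r.author for r in reports}
    chosen = []
    for author in sorted(authors):
        if all(frozenset((author, other)) not in connected_pairs for other in chosen):
            chosen.append(author)
    return chosen


if __name__ == "__main__":
    print(official_terms("BREAKING NEWS: curfew confirmed"))  # ['breaking news', 'confirmed']
```

A flagged tweet is not thereby false; as with Carvin’s “Source?” reply, the flag only marks it for follow-up. Likewise, a count of two or three unconnected sources strengthens a report, but only as much as the `connected_pairs` data is complete.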
These are just a handful of the tips and tricks that come to mind. The number of bullet points above shows that we are not completely powerless when verifying social media data; several strategies are available. The main challenge, as the BBC points out, is that this type of information forensics “can take anything from seconds […] to hours, as we hunt for clues and confirmation.” See, for example, my earlier post “The Crowdsourcing Detective: Crisis, Deception and Intrigue in the Twittersphere,” which highlights some challenges but also new opportunities.
One of Storyful’s comparative strengths in real-time news curation is the growing list of authenticated users it follows. This represents a more bounded (but certainly not static) approach. As noted in my previous blog post “Seeking the Trustworthy Tweet,” following a bounded model presents some obvious advantages. This explains why the BBC recommends “maintaining lists of previously verified material [and sources] to act as a reference for colleagues covering the stories.” This strategy is also employed by the Verification Team of the Standby Volunteer Task Force (SBTF).
In sum, I still stand by my earlier blog post “Wag the Dog: How Falsifying Crowdsourced Data can be a Pain.” I also continue to believe that some data, even if not immediately verifiable, is better than no data. It is also important to recognize that on some occasions we have seen social media prove to be self-correcting, as I blogged about here. Finally, we know that information is often perishable in times of crisis. By this I mean that crisis data often has a “use-by date” after which it no longer matters whether the information is true or not. So speed is often vital. This is why semi-automated platforms like SwiftRiver, which aim to filter and triangulate social media content, can be helpful.
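A shared list of previously verified sources can be as simple as a keyed registry. The sketch below is hypothetical: the fields `verified_by` and `notes`, the handle normalisation and the 90-day re-verification window are illustrative assumptions, not how Storyful, the BBC or the SBTF actually store their lists. Expiring old entries reflects the point that a bounded list should not be static.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class VerifiedSource:
    handle: str
    verified_on: date
    verified_by: str   # hypothetical field: colleague who did the check
    notes: str = ""


def _key(handle):
    """Normalise '@Handle' and 'handle' to the same lookup key."""
    return handle.lstrip("@").lower()


class VerifiedList:
    """Shared reference list of previously verified sources, with expiry."""

    def __init__(self, max_age_days=90):  # hypothetical re-verification window
        self._entries = {}
        self.max_age = timedelta(days=max_age_days)

    def add(self, source: VerifiedSource):
        self._entries[_key(source.handle)] = source

    def lookup(self, handle: str, today: date) -> Optional[VerifiedSource]:
        """Return the entry if it exists and is recent enough, else None."""
        entry = self._entries.get(_key(handle))
        if entry is None or today - entry.verified_on > self.max_age:
            return None
        return entry
```

A `None` result does not mean the source is untrustworthy, only that it has to go through the authentication steps again before colleagues rely on it.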
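The idea of a “use-by date” can be made concrete in a verification queue: reports whose window has passed are dropped, and the rest are handled in order of how soon they expire. This is a hypothetical sketch. The per-report `shelf_life` is an assumption that a team would have to estimate case by case, and the code does not reflect how SwiftRiver or any other platform works.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PendingReport:
    text: str
    received: datetime
    shelf_life: timedelta  # hypothetical: how long the report stays actionable

    def use_by(self) -> datetime:
        return self.received + self.shelf_life


def triage(queue, now):
    """Drop reports past their use-by date; return the rest, soonest-expiring first."""
    live = [r for r in queue if r.use_by() > now]
    return sorted(live, key=PendingReport.use_by)
```

Ordering by expiry rather than by arrival puts verification effort where delay costs the most, which is the practical consequence of information being perishable.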
Hey Patrick, great synopsis! I’ll add to the Google Doc, but also here: looking at the content of all tweets, as you mention, for signs of bias can also be done by taking inventory of the diversity of tweet history. Is this a one-issue tweeter? If so, this is likely an account created to further a particular information agenda. Most of us have themes running through our interests, but a tweet history that is completely consumed with one or two issues without the occasional personal reference, humor or reference to Lady Gaga is suspect. Behind every tweeter should be a personality.
Thanks Jess, looking forward to compiling all this for the SBTF. Re one-issue tweeter, yes, that’s exactly the kind of investigation I was doing re the Gaddafi-Test. It’s the long shadow, the digital trace that can give people away.
Great tips, Patrick, although I’d push back a little on your statement ‘it is the plethora and virtual anonymity of sources that makes us skeptical of the content they deliver’. In times of uncertainty there may be many reasons why people want to remain anonymous. And in some cases, anonymity is the only avenue for a report to be exposed. It is certainly true that identity is one helpful way to guide trust, but I think we shouldn’t preclude trust in anonymous sources.
Thanks Heather, the anonymity of sources is what adds uncertainty regardless of whether that anonymity is intentional or not.
Yes, you’re right, Patrick 🙂 The end.
Excellent post, Patrick. Incredibly timely for the situation we are living through in Mexico, where reliable information can be difficult to obtain through traditional media. Social networks are playing a more important role, and learning to distinguish fact from rumor and gossip is essential.
If you don’t mind, I would like to make at least a partial translation into Spanish, as this information would be very helpful to many people. You should see the pingback here.
Thanks very much, Angel. By all means feel free to translate and share. Thanks again
Pingback: Como Verificar Noticias y Contenido en los Medios Sociales | El Ornitorrinco en Linea
Please send me the 20+ page version at MDF@habmalnefrage.de, thanks
A workmate recommended this resource to me. Thanks for the resources.
This write-up has come in very handy. I am presently coding my data for my thesis, an assessment of the use of social media in news gathering. I have had major challenges with the issue of verification and credibility, and I really appreciate this post, as it brings clarity to a clogged field.