# Truthiness as Probability: Moving Beyond the True or False Dichotomy when Verifying Social Media

I asked the following question at the Berkman Center’s recent Symposium on Truthiness in Digital Media: “Should we think of truthiness in terms of probabilities rather than use a True or False dichotomy?” The wording here is important. The word “truthiness” already suggests a subjective fuzziness around the term. Expressing truthiness as probabilities provides more contextual information than does a binary true or false answer.

When we set out to design the SwiftRiver platform some three years ago, it was already clear to me then that the veracity of crowdsourced information ought to be scored in terms of probabilities. For example, what is the probability that the content of a Tweet referring to the Russian elections is actually true? Why use probabilities? Because it is particularly challenging to instantaneously verify crowdsourced information in the real-time social media world we live in.

There is a common tendency to assume that all unverified information is false until proven otherwise. This is too simplistic, however. We need a fuzzy logic approach to truthiness:

“In contrast with traditional logic theory, where binary sets have two-valued logic: true or false, fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. Fuzzy logic has been extended to handle the concept of partial truth, where the truth value may range between completely true and completely false.”
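To make the quoted idea concrete, here is a minimal sketch of fuzzy truth values in Python. It assumes the common min/max/complement operators for fuzzy AND, OR and NOT; the two example statements and their degrees of truth are made up for illustration.

```python
def fuzzy_not(a: float) -> float:
    """Degree to which a statement is NOT true."""
    return 1.0 - a


def fuzzy_and(a: float, b: float) -> float:
    """Both statements hold only as far as the weaker one does."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """At least one statement holds as far as the stronger one does."""
    return max(a, b)


# Hypothetical degrees of truth for two statements about a single tweet.
source_is_local = 0.8
wording_is_specific = 0.6

print(fuzzy_and(source_is_local, wording_is_specific))
print(fuzzy_or(source_is_local, wording_is_specific))
print(round(fuzzy_not(source_is_local), 2))
```

Nothing here is forced to 0 or 1: a partially true statement stays partially true when combined with others.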

The majority of user-generated content is unverified at time of birth. (Does said data deserve the “original sin” of being labeled as false, unworthy, until proven otherwise? To digress further, unverified content could be said to have a distinct wave function that enables said data to be both true and false until observed. The act of observation starts the collapse of said wave function. To the astute observer, yes, I’m riffing off Schrödinger’s Cat, and was also pondering how to weave in Heisenberg’s uncertainty principle as an analogy; think of a piece of information characterized by a “probability cloud” of truthiness.)

I believe the hard sciences have much to offer in this respect. Why don’t we have error margins for truthiness? Why not take a weather forecast approach to information truthiness in social media? What if we had a truthiness forecast understanding full well that weather forecasts are not always correct? The fact that a 70% chance of rain is forecasted doesn’t prevent us from acting and using that forecast to inform our decision-making. If we applied binary logic to weather forecasts, we’d be left with either a 100% chance of rain or 100% chance of sun. Such weather forecasts would be at best suspect if not wrong rather frequently.

In any case, instead of dismissing content generated in real-time because it is not immediately verifiable, we can draw on Information Forensics to begin assessing the potential validity of said content. Tactics from information forensics can help us create a score card of heuristics to express truthiness in terms of probabilities. (I call this advanced media literacy). There are indeed several factors that one can weigh, e.g., the identity of the messenger relaying the content, the source of the content, the wording of said content, the time of day the information was shared, the geographical proximity of the source to the event being reported, etc.
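A score card of this kind could be sketched as a weighted sum of heuristics squashed into the 0 to 1 range. This is only an illustration: the feature names, the weights and the bias are hypothetical, and the output is a score that would still need to be checked against real outcomes before being read as a probability.

```python
import math
from typing import Mapping

# Hypothetical weights for the factors listed above; not taken from any real system.
WEIGHTS = {
    "messenger_known": 1.2,   # identity of the messenger is established
    "source_reliable": 1.5,   # the original source has a good track record
    "wording_specific": 0.6,  # wording is concrete rather than vague
    "time_plausible": 0.4,    # time of day fits the reported event
    "near_event": 1.0,        # source is geographically close to the event
}
BIAS = -2.0  # assumed starting point: unverified content starts out as unlikely


def truthiness_score(features: Mapping[str, float]) -> float:
    """Combine heuristic features (each between 0 and 1) into a score in (0, 1)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


tweet = {"messenger_known": 1, "wording_specific": 1, "time_plausible": 1, "near_event": 1}
print(round(truthiness_score(tweet), 2))
```

Because the weights live in a plain dictionary, they can be revised as circumstances change rather than being fixed once.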

These weights need not be static as they are largely subjective and temporal; after all, truth is socially constructed and dynamic. So while a “wisdom of the crowds” approach alone may not always be well-suited to generating these weights, perhaps integrating the hunch of the expert coupled with machine learning algorithms (based on lessons learned in information forensics) could result in more useful decision-support tools for truthiness forecasting (or rather “backcasting”).

In sum, thinking of truthiness strictly in terms of true and false prevents us from “complexifying” a scalar variable into a vector (a wave function), which in turn limits our ability to develop new intervention strategies. We need new conceptual frameworks to reflect the complexity and ambiguity of user-generated content.

### 25 responses to “Truthiness as Probability: Moving Beyond the True or False Dichotomy when Verifying Social Media”

1. Reminds me of a trust network. http://en.wikipedia.org/wiki/Web_of_trust

Maybe you could use “bounded crowdsourcing” techniques to bootstrap a web of trust based on social networks: use the networks’ ranking of documents/videos/“truths” to offer different “truthiness” scores to each person (i.e. one document should present a different truthiness score to each person individually based on their embedding in the trust network).

• Very interesting, I like the approach you’re proposing.

2. You will find out how to do what you propose in the following paper, and we have some more recent work with a model of a dirty bomb attack on an urban city that will be presented at the ISCRAM.org meeting in Vancouver in April. I will send that to anyone interested. That issue of TFSC is a special issue on the Delphi method, which has lots of guidelines for making social networks more intelligent and trying to produce a true collective intelligent process.
Bañuls, Victor, and Murray Turoff, “Scenario construction via Delphi and cross-impact analysis,” Technological Forecasting and Social Change, Vol. 78, No. 9, Nov. 2011.

• Hi Murray, many thanks for your feedback, really appreciate it. I would indeed be very interested in getting a copy of your ISCRAM paper as well as your piece on Scenario construction. Any chance you could send these to me via email (patrick at iRevolution dot net)?

• murray turoff

Send me an email at turoff@njit.edu with your address so I can send an attachment.

3. Hi Patrick,

Big supporter of this approach. The element I would like to pair with the truthiness of a piece of social media would be the degree of severity. It is similar to a suggestion I made to Penn State before they did their study into the trustworthiness of tweets [but it never made it in] and at one of the sessions at ICCM 2011. If we had a rating of truthiness and of severity for a given element, then it could easily be placed on a matrix and would help responders take (rapid), targeted action. Something that gets rated very trustworthy and very severe would raise an immediate red flag [SMS, email, phone call, display board in a disaster center, etc.] for responding agencies to take action if needed. It would provide a great way for entities responding to major emergencies to funnel the data to a suitable degree for themselves [perhaps including additional filters to see only messages relevant to them, e.g. Urban Search and Rescue will be looking for very different messages than people delivering food aid].

Cheers,
Andrej
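The truthiness-by-severity matrix described in this comment could be sketched roughly as follows. The 0.7 cut-off and the four action labels are assumptions chosen for illustration, not anything proposed in the comment itself.

```python
def triage(truthiness: float, severity: float, threshold: float = 0.7) -> str:
    """Place a report in one cell of a 2x2 truthiness/severity matrix.

    Both inputs are assumed to be scores between 0 and 1; the threshold
    and the action names are hypothetical.
    """
    trusted = truthiness >= threshold
    severe = severity >= threshold
    if trusted and severe:
        return "red flag"        # alert responders immediately
    if severe:
        return "verify urgently" # serious if true, so worth checking first
    if trusted:
        return "log"             # credible but not urgent
    return "monitor"             # neither credible nor urgent yet


reports = {
    "building collapse, eyewitness nearby": (0.9, 0.95),
    "rumour of bridge failure": (0.3, 0.9),
    "road slightly flooded": (0.8, 0.2),
}
for text, (t, s) in reports.items():
    print(f"{text}: {triage(t, s)}")
```

A relevance filter per responding agency could be added as a further step on top of this matrix.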

• Many thanks for your informative feedback, Andrej.

4. I definitely agree that we need representation systems for degrees of trust beyond “true” and “false” (or “true”, “false”, and “unverified”) and probability scores are one natural way to do this.

However, the axioms of probability come with certain semantics. First, there is the problem of model calibration. You seem to be proposing a system (part human, part algorithm) which processes certain information and meta-information about a report (author, time, location, content, source history, etc.) to assign a probability score. So far so good. This is a model, intended to predict whether or not a report is true based on limited information. How do we validate the model? In other words, this scheme only has value if we have a way to distinguish systems that produce solid truth probability numbers from systems that produce junk.

The only answer that makes sense to me is to compare the model’s predictions with reality, e.g. “ground truth.” So one could validate on historical data of (eventually) known truth-value, or use further independent data collection. Of course if you validate every report with further information collection then you aren’t using the model at all. So we have to use “discount” methods of model checking, such as collecting further information only on a random sample of reports. The upshot is that we always need some method to provide a check on the accuracy of the assigned probability scores, or we have no idea if they’re worth anything at all.
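One way to picture this kind of “discount” check is to draw a random sample of reports for follow-up and then score the assigned probabilities against what turned out to be true. The sketch below uses the Brier score (mean squared difference between forecast and outcome); the choice of that metric, the function names and the sample numbers are all assumptions for illustration.

```python
import random
from typing import List, Sequence


def spot_check_sample(report_ids: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Pick k reports at random for independent follow-up verification."""
    rng = random.Random(seed)
    return rng.sample(list(report_ids), k)


def brier_score(predictions: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between assigned probabilities and 0/1 ground truth.

    0.0 is perfect; always answering 0.5 scores 0.25.
    """
    if len(predictions) != len(outcomes) or not predictions:
        raise ValueError("need equally long, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)


# Made-up probabilities for three sampled reports and their eventual truth values.
assigned = [0.9, 0.7, 0.2]
verified = [1, 1, 0]
print(round(brier_score(assigned, verified), 3))
print(brier_score([0.5, 0.5, 0.5], verified))  # uninformative baseline: 0.25
```

A system whose score is no better than the 0.5 baseline on the sampled reports would be producing junk numbers in the sense described above.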

But there are still at least three major problems in doing that. We’re getting into some pretty subtle stuff, but this is my understanding, in order of where I feel I have the best grasp to the least grasp.

1) Models are pretty weak compared to the real world, especially statistical models. Anything based on extrapolating previously observed patterns into the future, without representing the underlying dynamics of a situation, is going to break when the generating process alters. The real world dynamics of many processes are largely unknown, or we’d be able to do things like predict wars and election results. The starkest example of this comes from financial models, which can be good at e.g. predicting stock market prices right up until the moment that they fail catastrophically.

2) In order to make any truth predictions at all, it’s necessary to categorize events. For example, if our historical data says “55% of reports of looting turned out to be accurate” then “report of looting” is the category. But as you well know, categories get fuzzy around the edges. “Report of looting” is maybe not so hard, but what about events like “vote fraud.” We’re not really going to get reports of vote fraud, we’re going to get reports of suspicious ballot boxes, counts that don’t add up, voters turned away at a particular polling station, etc. These will be different in each election, and trying to clump them into categories in some way that is both sensible AND matches previous historical categorizations — so that previously trained models can be applied — may prove difficult. This is related to the “feature selection” problem in machine learning, but has all the additional difficulties and choices of the digitization process, e.g. WHAT information is collected at all, by whom and how.

3) Probabilities can only really be assigned based on prior expectations. This is a deep problem, which the field of “subjective probability” tackles head on. (see e.g. http://www.princeton.edu/~bayesway/Book*.pdf) This is related to the categorization problem but it’s worse than that. The upshot is that there is no such thing as an “objective” probability measure, only probabilities which follow logically from the (often implicit) probabilities assigned to prior beliefs — in other words, how likely you think various events are, before you hear any reports at all. (Do you consider a country “stable” and therefore expect, a-priori, that a revolution is unlikely? Or do you already believe the political situation to be in flux?) Basically you cannot start with zero assumptions about the world. This seems a subtle point, but it’s a very real issue and comes out immediately as unknown prior probability variables in Bayesian estimation, whose values must be chosen before any estimation algorithm can be run.
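The dependence on priors in point 3 can be seen in a few lines of Bayes’ rule. In this sketch the likelihoods (how often such a report appears when the event is real versus when it is not) and the two priors are invented numbers; only the formula itself is standard.

```python
def posterior(prior: float, p_report_if_true: float, p_report_if_false: float) -> float:
    """Probability the event is real given that a report of it arrived (Bayes' rule)."""
    numerator = p_report_if_true * prior
    evidence = numerator + p_report_if_false * (1.0 - prior)
    return numerator / evidence


# Same report, same likelihoods, two different prior beliefs about the country.
stable_country_prior = 0.05   # "a revolution is unlikely"
in_flux_prior = 0.50          # "the situation is already in flux"

for prior in (stable_country_prior, in_flux_prior):
    print(prior, round(posterior(prior, p_report_if_true=0.8, p_report_if_false=0.2), 3))
```

The identical report moves one observer to roughly 17% and the other to 80%, which is exactly why the prior has to be chosen, and defended, before any estimation can run.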

All of this is not to say that calculated probabilities are always useless, but bear in mind that no one really knows how to get this sort of probability assignment exactly right. It would be interesting to collect examples of models that work in practice.

Finally, I will note that at least one major intelligence organization does not use probability values to rate the truth of reported statements (and analysts’ derived statements.) Instead they do the following:

– each analyst who looks at a claim rates it on a five point scale, from +2 to -2, meaning “strongly believed to be true” to “strongly believed to be false.”

– the system records the date, analyst name, analyst rating, and any notes or justification that the analyst makes (including hyperlinks to material which supports their claim)

– instead of probability scores, the system shows the previous analyst notations. So the user doesn’t see that a report is “38% true,” but rather, who thought what of the report, when, and why.

(Recording the date of each notation is particularly useful because it helps to determine what information was available when the notation was made, which in turn lets us know which scores might need to be updated when new information arrives.)

Sara Farmer would doubtless immediately note that it is perfectly reasonable to have analyst “bots,” so any desired algorithmic truth-prediction model could be integrated into this framework as just another agent making notations.
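As a rough illustration of the notation scheme described above, the record might look like the following. The class and field names are hypothetical and not drawn from any actual intelligence system; a bot is simply a notation with a flag set.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class Notation:
    """One analyst's (or bot's) judgement of a claim at a point in time."""
    analyst: str
    rating: int        # -2 (strongly believed false) to +2 (strongly believed true)
    made_on: date
    notes: str = ""
    is_bot: bool = False

    def __post_init__(self) -> None:
        if self.rating not in (-2, -1, 0, 1, 2):
            raise ValueError("rating must be an integer from -2 to +2")


@dataclass
class Claim:
    text: str
    notations: List[Notation] = field(default_factory=list)

    def annotate(self, notation: Notation) -> None:
        self.notations.append(notation)

    def history(self) -> List[Notation]:
        """Who thought what, when, and why, in date order."""
        return sorted(self.notations, key=lambda n: n.made_on)

    def made_before(self, cutoff: date) -> List[Notation]:
        """Notations that predate new information and may need revisiting."""
        return [n for n in self.history() if n.made_on < cutoff]


claim = Claim("Polling station closed early")
claim.annotate(Notation("heuristic-bot", 1, date(2012, 3, 5), "source near event", is_bot=True))
claim.annotate(Notation("analyst A", -1, date(2012, 3, 4), "no corroboration yet"))
for n in claim.history():
    print(n.made_on, n.analyst, n.rating, n.notes)
```

Note that nothing is averaged into a single number: the user sees the full trail of ratings and reasons.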

I know this has all been rather abstract, but I hope this has been helpful nonetheless, or at least interesting,

– Jonathan

• Wow, thanks for your blog-post-length reply, Jonathan! Great stuff. Agreed re the (mis)use of the word truthiness.

I’ve spent several years working in the field of conflict early warning and conflict forecasting. There are several ways to validate such models, and indeed historical data and discount models are commonly used. To respond to your three main points:

1) Have a look at DARPA’s Integrated Conflict Early Warning System (ICEWS) and the work of Didier Sornette at the Technical University of Zurich, as well as Bruce Bueno de Mesquita. Also see my previous research on:

The Mathematics of War: On Earthquakes and Conflicts
http://irevolution.net/2011/10/21/mathematics-of-war

Applying Earthquake Physics to Conflict Analysis
http://irevolution.net/2011/10/24/earthquake-physics-for-conflict-analysis

2) There are several conflict indicator ontologies around, as well as automated event-data coding platforms that can provide consistent coding or categorization of reports. See in particular the Integrated Data for Events Analysis (IDEA) framework that Gary King evaluated a few years ago.

3) Assigning probabilities based on prior expectations is not necessarily a weakness of the model. As Robert Kirkpatrick notes, we want to combine the wisdom of the crowds with machine learning and the hunch of the expert. You could perhaps crowdsource such probabilities or use bounded crowdsourcing to get expert opinions. The idea here is to develop weighted ontologies that can be repeatedly tested and hopefully refined.

“- instead of probability scores, the system shows the previous analyst notations. So the user doesn’t see that a report is “38% true,” but rather, who thought what of the report, when, and why.”

Agreed, reminds me of this which I blogged about four years ago:

Intellipedia for Humanitarian Warning/Response
http://irevolution.net/2008/04/09/intellipedia-for-humanitarian-warningresponse

Thanks again, Jonathan!

5. Also… not sure we should be using “truthiness” here. As Colbert used it, it means an intuitive, non-rational judgement, a belief for which we cannot give a reason. Probability estimates may not give us certainty, but (when done right) we can definitely give concrete reasons for why we believe *that* number and no other, and when we have a validated model we can even say how often we expect to be wrong. Far better than “truthiness” I think.

• jmartiniii1968

What I might say is that there are times when we may be interested in the truth value (or probable truth value) of a statement, if it is in fact the truth or falsity of the statement which is important: “There is a bomb in the building” might be such a statement. But perhaps more often than not we want to know where along a continuum of accuracy, rather than truthfulness, a statement or report lies. What even this does not tell us is what our tolerance levels are for inaccuracy, which very likely also depend upon the nature of what is reported.

6. In building dynamic models, such as in emergency management, there is no problem in stating dynamic elements such as the probability of truth for statements like:
“The citizens trust the leadership of the city in handling a major disaster.”
The value of the probability of truth for this has an impact on many of the other events that might occur in modeling a particular type of disaster.
The probability of an event occurring in the disaster is also a measure of truth in whether the event occurs or does not occur. This is what is used in cross-impact analysis. The theory behind cross-impact analysis is in the Delphi Method book (1975), free on my website. We have modernized it, and the full theory is in a paper in a recent issue of Technological Forecasting and Social Change with about 20 papers on current work and utilization of the Delphi method. The ISCRAM (Information Systems for Crisis Response and Management) paper is an example of modeling a dirty bomb attack on an urban area and creating a dynamic scenario that can show different outcomes by changing the initial probabilities of both physical happenings and human behavior happenings. It would also work well for the merger of companies, where it is impossible to do pure business models without treating the decision behaviors that would occur after the merger. (I have had some requests and I wanted to add more detail to my earlier message.)
Website: http://is.njit.edu/turoff. See http://iscram.org for prior proceedings and the meeting in Vancouver in April.
There is also related material in “The Network Nation: Human Communication via Computer,” Hiltz and Turoff, 1978, reprinted in 1993 by MIT Press. This is on the collective intelligence aspect of online communications that allow real-time structures that go far beyond current social networks.

• Thanks for sharing, Murray

• There is a long, long discussion here on why, in certain cases of subjective probability considerations, the probability calculus will not work. The paper I did in 1972 is in the Delphi Method book and gives examples. I use the Fermi-Dirac distribution from physics since it describes physical phenomena like excitation states where the resulting state is either 0 or 1, which applies to whether something is true or false or whether a “unique” event will occur or not occur. When you take N such events and statements of “fact,” you ask a person for each event or truth statement: suppose item i is certain to occur or certain to be true (or the opposite); how does that influence the probability of ALL the other items? Each of these questions gives you a new probability space. Hence (for other reasons as well) I fit the total model to an approximate macro model that requires n(n-1)/n estimates for N events.
The only exact way to solve this problem is an expansion tree with transition probabilities for every possible sequence of events, which for 10 events would require about 10 million estimates. So exact modelling is out of the question, analogous to the many-body problem in physics. The 1972 paper gives more details. The 2011 paper in TFSC merges my approach with ISM (Interpretive Structural Modeling, via Warfield), which allows one to create influence diagrams to condense large numbers of events into micro scenarios so one can cluster and reduce the complexity of the model to simplify decisions that might influence the outcome. I will stop now!!!!! For those who want to get into academic discussions/disagreements, we could start a separate message group.
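The “about 10 million” figure can be reproduced with a short count. The sketch below assumes one estimate per non-empty ordered sequence of distinct events, which is one plausible reading of the expansion tree; for contrast it also counts the “suppose item i is certain” questions as one per ordered pair of items, again an assumption about how the questions are tallied.

```python
import math


def ordered_sequences(n: int) -> int:
    """Number of non-empty ordered sequences of distinct events drawn from n events."""
    return sum(math.factorial(n) // math.factorial(n - k) for k in range(1, n + 1))


def pairwise_questions(n: int) -> int:
    """One 'if item i is certain, what happens to item j?' question per ordered pair."""
    return n * (n - 1)


n_events = 10
print(ordered_sequences(n_events))   # 9864100, i.e. roughly 10 million
print(pairwise_questions(n_events))  # 90
```

The gap between the two counts is the practical argument for an approximate pairwise model over an exact tree.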

7. I have flu, so I’ll leave thinking about this til later. In the meantime, I’ll just note that, although popular, probabilities are just one way of dealing with uncertainty in human and machine reasoning. Off the top of my head, there are: probabilities (in various flavours, including interval probabilities), possibilities (including fuzzy logic), Dempster-Shafer, certainty factors (mostly seen as a bad idea, but very easy to build accidentally), modal logics and n-state logics (I think Patrick started here, somewhere around the 3-state idea of Kleene logics)… let’s just say that it’s quite a long list, and yes, the differences can be quite subtle, even before we start discussing what subjective probabilities really mean, and whether the probabilities that we’re using are normative or not.

Patrick: The bottom line is that, for most things, probabilities work, but you need to be careful how you mix them with human reasoning, and especially careful that when you say “an orange” you mean the fruit orange and not “a small round thing of kinda reddish colour”.

Andrej: You’re talking about risk, which is classically defined as the combination of probability of an event and the severity of its consequences if it did occur. This comes in several different flavours too.

Jonathan: Yep, you’re right. Welcome to my world.

• Thanks Sara

“The bottom line is that, for most things, probabilities work, but you need to be careful how you mix them with human reasoning, and especially careful that when you say “an orange” you mean the fruit orange and not “a small round thing of kinda reddish colour”.”

We were already doing this kind of deconfliction at VRA 8 years ago with our automated natural language processing of Reuters newswires. The idea here is simply to build an ontology for this type of deconfliction.

8. jmartiniii1968

Perhaps it is the dichotomous terminology of “true” and “false” which is part of the problem, rather than looking at pieces of information as existing along a continuum of “very accurate” to “very inaccurate” or some similar scale. For example, if I say “I saw five people at the intersection” and there were actually four, six, or one, the truth value of the statement is still the same: false. It remains false regardless of how well or poorly the statement approximates the actual state of affairs. What may be the really important question in this example might actually be “What is the importance of the piece of information/intelligence?” Maybe it does not matter exactly how many were at the intersection, only that there were people there? Or maybe it is important for some reason whether there were exactly five people at the intersection, in which case the probable truth or falsity of the report is of interest.

In a similar vein, it is also possible (and this is just me thinking up a hypothesis off the top of my head) that there may be forms of systematic error in reports (or the reporters themselves) that might be discoverable and adjustable. I raise this possibility because in the aftermath of tornadoes, initial reports of damage from responders frequently overestimate both the total number of homes damaged and the number of those destroyed. (Not always, though: sometimes they underestimate, which to me suggests a rule of thumb that things are likely as bad as initially reported, possibly much worse but very likely no better than the early reports.) Among media and responders I have also noticed that the distinction between “destroyed” and “damaged” often gets confused, leading to the total number of homes with any type of damage being reported as having been destroyed.
Of course, if such a quantitative “error of report” (and I use this term because it reminds me of the error of measurement I am familiar with from my days of neuropsych testing) exists and could be determined in some way, it would be a task for those with far more neurons than myself.