Trails of Trustworthiness in Real-Time Streams

Real-time information channels like Twitter, Facebook and Google have created cascades of information that are becoming increasingly challenging to navigate. “Smart-filters” alone are not the solution since they won’t necessarily help us determine the quality and trustworthiness of the information we receive. I’ve been studying this challenge ever since the idea behind SwiftRiver first emerged several years ago now.

I was thus thrilled to come across a short paper on “Trails of Trustworthiness in Real-Time Streams” which describes a start-up project that aims to provide users with a “system that can maintain trails of trustworthiness propagated through real-time information channels,” which will “enable its educated users to evaluate its provenance, its credibility and the independence of the multiple sources that may provide this information.” The authors, Panagiotis Metaxas and Eni Mustafaraj, kindly cite my paper on “Information Forensics” and also reference SwiftRiver in their conclusion.

The paper argues that studying the tactics that propagandists employ in real life can provide insights and even predict the tricks employed by Web spammers.

“To prove the strength of this relationship between propagandistic and spamming techniques, […] we show that one can, in fact, use anti-propagandistic techniques to discover Web spamming networks. In particular, we demonstrate that when starting from an initial untrustworthy site, backwards propagation of distrust (looking at the graph defined by links pointing to an untrustworthy site) is a successful approach to finding clusters of spamming, untrustworthy sites. This approach was inspired by the social behavior associated with distrust: in society, recognition of an untrustworthy entity (person, institution, idea, etc) is reason to question the trustworthiness of those who recommend it. Other entities that are found to strongly support untrustworthy entities become less trustworthy themselves. As in society, distrust is also propagated backwards on the Web graph.”
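The mechanics of this backward propagation are straightforward to sketch. Below is a minimal illustration in Python (my own, not the authors’ code), assuming we already have a map of incoming links; the per-hop decay factor and the cut-off threshold are illustrative parameters, not values from the paper.

from collections import deque

def propagate_distrust(seed, in_links, decay=0.5, threshold=0.1):
    """Walk backwards from a known untrustworthy `seed` over incoming
    links, assigning each linking site a distrust score that decays
    with distance from the seed."""
    distrust = {seed: 1.0}
    queue = deque([seed])
    while queue:
        site = queue.popleft()
        score = distrust[site] * decay
        if score < threshold:
            continue  # too far removed from the seed to matter
        for supporter in in_links.get(site, []):
            if supporter not in distrust:  # visit each site once
                distrust[supporter] = score
                queue.append(supporter)
    return distrust

# Example: two blogs link to the seed site; a third links to one of them.
in_links = {"spam.example": ["blogA.example", "blogB.example"],
            "blogA.example": ["blogC.example"]}
print(propagate_distrust("spam.example", in_links))

Sites that sit close to the seed in the reverse-link graph inherit the most distrust, mirroring the social intuition quoted above.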

The authors document that today’s Web spammers are using increasingly sophisticated tricks.

“In cases where there are high stakes, Web spammers’ influence may have important consequences for a whole country. For example, in the 2006 Congressional elections, activists using Google bombs orchestrated an effort to game search engines so that they present information in the search results that was unfavorable to 50 targeted candidates. While this was an operation conducted in the open, spammers prefer to work in secrecy so that their actions are not revealed. So, [we] revealed and documented the first Twitter bomb, which tried to influence the Massachusetts special elections, showing how an Iowa-based political group, hiding its affiliation and profile, was able to serve misinformation a day before the election to more than 60,000 Twitter users that were following the elections. Very recently we saw an increase in political cybersquatting, a phenomenon we reported in [28]. And even more recently, […] we discovered the existence of Pre-fabricated Twitter factories, an effort to provide collaborators pre-compiled tweets that will attack members of the Media while avoiding detection of automatic spam algorithms from Twitter.”

The theoretical foundations for a trustworthiness system:

“Our concept of trustworthiness comes from the epistemology of knowledge. When we believe that some piece of information is trustworthy (e.g., true, or mostly true), we do so for intrinsic and/or extrinsic reasons. Intrinsic reasons are those that we acknowledge because they agree with our own prior experience or belief. Extrinsic reasons are those that we accept because we trust the conveyor of the information. If we have limited information about the conveyor of information, we look for a combination of independent sources that may support the information we receive (e.g., we employ “triangulation” of the information paths). In the design of our system we aim to automatize as much as possible the process of determining the reasons that support the information we receive.”
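The “triangulation” step lends itself to a toy sketch as well. The snippet below (my illustration, not the authors’ method) counts how many genuinely independent origins stand behind a claim; the origin_of map is a stand-in for whatever provenance tracing the actual system would perform.

def independent_support(claim, reports, origin_of):
    """Count distinct upstream origins among the sources repeating
    `claim`. `reports` maps source -> set of claims it made;
    `origin_of` maps source -> where that source got the story."""
    origins = {origin_of.get(src, src)
               for src, claims in reports.items() if claim in claims}
    return len(origins)

# Three accounts repeat the claim, but two merely echo one wire story,
# so only two independent information paths support it.
reports = {"@alice": {"bridge closed"}, "@bob": {"bridge closed"},
           "@carol": {"bridge closed"}}
origin_of = {"@alice": "@wire", "@bob": "@wire"}
print(independent_support("bridge closed", reports, origin_of))  # -> 2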

“We define as trustworthy, information that is deemed reliable enough (i.e., with some probability) to justify action by the receiver in the future. In other words, trustworthiness is observable through actions.”

“The overall trustworthiness of the information we receive is determined by a linear combination of (a) the reputation R_Z of the original sender Z, (b) the credibility we associate with the contents of the message itself C(m), and (c) characteristics of the path that the message used to reach us.”
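In code, this linear combination might look as follows. This is a toy rendering of the sentence above, not the authors’ implementation; the weights and the path score are placeholders, since the paper does not specify them here.

def trustworthiness(R_Z, C_m, P_path, weights=(0.4, 0.4, 0.2)):
    """T(m) = a*R_Z + b*C(m) + c*P(path). Choosing a + b + c = 1
    keeps T in [0, 1] when each component is in [0, 1]."""
    a, b, c = weights
    return a * R_Z + b * C_m + c * P_path

# Sender reputation 0.9, message credibility 0.6, path score 0.8:
print(trustworthiness(0.9, 0.6, 0.8))  # 0.76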

“To compute the trustworthiness of each message from scratch is clearly a huge task. But the research that has been done so far justifies optimism in creating a semi-automatic, personalized tool that will help its users make sense of the information they receive. Clearly, no such system exists right now, but components of our system do exist in some of the popular [real-time information channels]. For a testing and evaluation of our system we plan to use primarily Twitter, but also real-time Google results and Facebook.”

In order to provide trails of trustworthiness in real-time streams, the authors plan to address the following challenges:

•  “Establishment of new metrics that will help evaluate the trustworthiness of information people receive, especially from real-time sources, which may demand immediate attention and action. […] we show that coverage of a wider range of opinions, along with independence of results’ provenance, can enhance the quality of organic search results. We plan to extend this work in the area of real-time information so that it does not rely on post-processing procedures that evaluate quality, but on real-time algorithms that maintain a trail of trustworthiness for every piece of information the user receives.”

• “Monitor the evolving ways in which information reaches users, in particular citizens near election time.”

• “Establish a personalizable model that captures the parameters involved in the determination of trustworthiness of information in real-time information channels, such as Twitter, extending the work of measuring quality in more static information channels, and by applying machine learning and data mining algorithms. To implement this task, we will design online algorithms that support the determination of quality via the maintenance of trails of trustworthiness that each piece of information carries with it, either explicitly or implicitly. Of particular importance is that these algorithms should help maintain privacy for the user’s trusting network.”

• “Design algorithms that can detect attacks on [real-time information channels]. For example we can automatically detect bursts of activity related to a subject, source, or non-independent sources. We have already made progress in this area. Recently, we advised and provided data to a group of researchers at Indiana University to help them implement “truthy”, a site that monitors bursty activity on Twitter. We plan to advance, fine-tune and automate this process. In particular, we will develop algorithms that calculate the trust in an information trail based on a score that is affected by the influence and trustworthiness of the informants.”
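To make the burst detection mentioned in this last bullet concrete, here is a minimal sketch (mine, not the authors’ or truthy’s algorithm): flag any time window whose message count far exceeds the recent average. The window length and the multiplier are illustrative assumptions.

def detect_bursts(counts_per_window, history=12, multiplier=3.0):
    """Yield indices of windows whose count exceeds `multiplier` times
    the mean of the preceding `history` windows."""
    for i in range(history, len(counts_per_window)):
        baseline = sum(counts_per_window[i - history:i]) / history
        if counts_per_window[i] > multiplier * max(baseline, 1.0):
            yield i

# Hourly tweet counts for a topic; the spike in the last hour is flagged.
counts = [5, 4, 6, 5, 5, 7, 4, 6, 5, 5, 6, 4, 5, 60]
print(list(detect_bursts(counts)))  # -> [13]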

In conclusion, the authors “mention that in a month from this writing, Ushahidi […] plans to release SwiftRiver, a platform that ‘enables the filtering and verification of real-time data from channels like Twitter, SMS, Email and RSS feeds’. Several of the features of Swift River seem similar to what we propose, though a major difference appears to be that our design is personalization at the individual user level.”

Indeed, having been involved in SwiftRiver research since early 2009, and as a current tester of the private beta, I can confirm that there are important similarities and some differences. Personalization, however, is not one of the differences: Swift allows full personalization at the individual user level.

Another difference is that we hope to go beyond text-based information with Swift by pulling in pictures and video footage (in addition to Tweets, RSS feeds, email, SMS, etc.) in order to cross-validate information across media. We expect this will make the falsification of crowdsourced information more challenging, as I argue here. In any case, I very much hope that the system being developed by the authors will be free and open source so that integration might be possible.

A copy of the paper is available here (PDF). I hope to meet the authors at the Berkman Center’s “Truth in Digital Media Symposium” and highly recommend the wiki they’ve put together with additional resources. I’ve added the majority of my research on verification of crowdsourced information to that wiki, such as my 20-page study on “Information Forensics: Five Case Studies on How to Verify Crowdsourced Information from Social Media.”

Cyclones in Cyberspace? How Crowdsourced Cyber Warfare Shaped the Russia-Georgia War

“Cyclones in Cyberspace: Information Shaping and Denial in the 2008 Russia-Georgia War” was just published in Security Dialogue, a respected peer-reviewed journal. The article analyzes “the impact of cyberspace on the conflict between Russia and Georgia over the disputed territory of South Ossetia in August 2008.” The authors, Ron Deibert, Rafal Rohozinski and Masashi Crete-Nishihata, argue that “cyberspace played a significant, if not decisive, role in the conflict–as an object of contestation and as a vector for generating strategic effects and outcomes.”

The purpose of this blog post is to briefly highlight some important insights from the study by sharing a few key excerpts.

Introduction

“Cyberspace is now explicitly recognized in United States strategic doctrine as being equally as important as land, air, sea, and space […]. Dozens of states are actively developing military doctrines for cyberspace operations (Hughes, 2010), while others may be employing unconventional cyberspace strategies. An arms race in cyberspace looms on the horizon (Deibert and Rohozinski, 2011).”

“The US Department of Defense (2010: 86) presently defines cyberspace as ‘a global domain within the information environment consisting of the interdependent network of information technology infrastructures, including the Internet, telecommunications networks, computer systems, and embedded processors and controllers’. This definition acknowledges the interdependence between the physical and informational realm. It also defines cyberspace as the totality of information infrastructures, which includes but is not limited to the Internet. The constitutive elements of cyberspace can be broken down into four levels: physical infrastructure, the code level, the regulatory level, and the level of ideas. These constitutive elements of cyberspace were all present and leveraged during the 2008 conflict between Russia and Georgia.”

“Operations in and through cyberspace were present throughout the conflict and were leveraged by civilian and military actors on both sides. Russian and Georgian forces made use of information operations alongside their conventional military capabilities. Civilian leadership on both sides clearly appreciated the importance of strategic communication, and targeted domestic and international media in order to narrate the intent and desired outcome of the conflict.”

“The Internet played an important role as a redistribution channel for media and communications, including news, influential blogs, and rumors. The impact of this media was so effective in the eyes of the Georgian authorities that they decided to censor Russian television broadcasts in major Georgian cities, and to filter access to Russian Internet sites.”

Information Denial

“Both sides (or their sympathizers) employed computer network operations, consisting of attacks designed to disable or degrade key infrastructure, and exploitation or hijacking of government computer systems. In particular, numerous Georgian websites and a few Russian media sites were subject to large-scale distributed-denial-of-service (DDoS) events. The command-and-control (C&C) servers responsible for the DDoS against Georgian systems and websites, as well as other forms of malicious hacking, originated from networks located within the Russian Federation.”

“The Russian government has never claimed responsibility for these activities, and it remains unclear whether these operations were coordinated, encouraged, or officially tolerated by Russian authorities. This ambiguity is itself an important emergent property of war fighting in the cyber domain.”

“The DDoS surge and SQL injection-based intrusions against Georgian systems beginning on 8 August were later followed by a series of crowd-sourced DDoS activities targeting Georgian government websites and resources, coordinated on Russian hacker forums. It is unclear whether these activities were sanctioned and organized as a component of a broader political strategy, whether they occurred as a result of informal coordination by the Kremlin’s communications staff and its networks of contacts with the Russian IT community (which includes quasi-criminal groups), or whether they occurred as a result of autonomous third-party actions.”

“In an attempt to mitigate the effects of the DDoS events, Georgian authorities sought assistance from the governments of Estonia, Lithuania, and Poland. Reportedly, Estonian officials put Georgia in contact with a community of cyber-security professionals who provided consultations (Stiennon, 2008). Georgia attempted to counter the effectiveness of the DDoS surge by implementing filters to block the Russian IP addresses and protocols used by the attackers. This effort was successfully countered, and the DDoS surge shifted to foreign servers and software to mask the IP addresses (Bumgarner and Borg, 2009). Georgia’s next step was to mirror several government websites, including that of Georgia’s president, on servers located in the countries that came to its assistance, which consequently also became the target of Russian DDoS events.”

“US cyberspace was also affected, as components of the Georgian government such as the Ministry of Foreign Affairs were shifted to Blogspot and the websites of the president and the Ministry of Defense were moved to servers operated by Tulip Systems (TSHost), a private web-hosting company based in Atlanta, Georgia (Swabey, 2008; Svensson, 2008a). The Georgian expatriate CEO of TSHost contacted Georgian officials and offered the company’s services without notifying US authorities. Soon after the Georgian websites were transferred to TSHost, the US-based servers were subject to DDoS. The CEO of TSHost reported these attacks to the FBI, but the company never received US government sanction for migrating the websites (Svensson, 2008b). Moving hosting to US-based TSHost raised the issue of whether the USA had violated its cyber neutrality by permitting Georgia to use its information services during the conflict.”

Deliberate or Emergent?

One of the study’s principal research questions is whether the Russian campaign in cyberspace was deliberate and planned. The authors consider three possible scenarios: (1) the actions were deliberate and planned; (2) the actions were ‘encouraged’ or ‘passively encouraged’ by state agents; or (3) the actions were an unpredictable result and dynamic emergent property of cyberspace itself. The resulting evaluation of each scenario’s probability suggests that “Russian citizens, criminal groups, and hackers independently organized and/or participated in a self-directed cyber riot against Georgia out of patriotic sentiments.”

“Civilians have voluntarily engaged in warfare activities without the approval or direction of states throughout the history of armed conflict. What makes the actions of civilians in cyberspace different are the characteristics of the domain, where effects can be generated with ease and at rapid speed. Quite simply, collective action is easier and faster in cyberspace than it is in any other physical domain. If this scenario was the case during the Russia–Georgia war, it would signal the emergence of a new factor in cyberspace operations – the capacity for a group other than the belligerents to generate significant effects in and through cyberspace. The unpredictable nature of such outside participation–global in scope, random in distribution–can lead to chaotic outcomes, much like the trajectory and phase of a cyclone.”

Conclusion

“There was leverage gained in the conflict by the pursuit of information denial. Even in environments where the communication environment is constrained, societies are heavily dependent on cyberspace and feel its strategic importance most acutely by its absence. Information-denial strategies are more closely associated with countries of Asia, the Middle East, North Africa, and the CIS–as opposed to the West, which is more comfortable with information projection. Information denial also tends to fit more comfortably within semi-authoritarian or competitive authoritarian countries than democratic ones.”

“The tendencies toward information denial also challenge some of the widespread assumptions about the relationships between new information and communication technologies and conflict. In recent years, a conventional wisdom has emerged that links cyberspace with a high degree of transparency around modern wars. Our research suggests that the opposite is more likely to be the case as states and non-state actors aggressively pursue military objectives to shape, control, and suppress the realm of ideas.”

“The tendency toward privateering is very strong in cyber conflict. There is already a large and growing illicit global computer-crime market. This market is attractive to some states because it allows them to execute their missions once removed and clandestinely, thus offering plausible deniability and avoiding responsibilities under international law or the laws of armed conflict. Outsourcing to private actors in cyberspace is an example of what we have elsewhere called ‘next-generation cyberspace controls’ (Deibert and Rohozinski, 2010c). Although we found no direct evidence of cyber-privateering in open sources in this case, it is certainly a possibility. Indeed, some countries may actively cultivate cyber-privateering as a strategy precisely to confuse the battle space and muddy attribution.”

“[…] the scope and scale of contingent effects related to the character of the cyberspace domain present a qualitative difference for international conflicts. An emergent property related to today’s global information and communications environment, inherent in its complexity, dynamism, and dispersed character, is for acts of cyber warfare to be highly unpredictable and volatile.”

“Although states may plan or ‘seed’ campaigns in cyberspace, such campaigns have a tendency to take on lives of their own because of the unavoidable participation of actors swarming from edge locations (see Der Derian, 1996). We refer to this dynamic as ‘cyclones in cyberspace’ – a phenomenon clearly evident in the August 2008 conflict both in terms of the piling-on of outside participants and the confusion and panic sown in Georgia by its own filtering choices.”

“Cyclones in cyberspace invariably internationalize any cyber conflict. […] As cyberspace penetrates those regions of the world where conflict and instability are ripe and authoritarian regimes prevail, the propensity for more cyclones in cyberspace is high and should concern international security researchers and policymakers.”

For more on cyber war, please see my earlier blog post on “Cyberconflict and Global Politics: New Media, War, Digital Activism.”