Tag Archives: Digital

Digital Humanitarians and The Theory of Crowd Capital

An iRevolution reader very kindly pointed me to this excellent conceptual study: “The Theory of Crowd Capital”. The authors’ observations and insights resonate with me deeply given my experience in crowdsourcing digital humanitarian response. Over two years ago, I published this blog post in which I wrote that, “The value of Crisis Mapping may at times have less to do with the actual map and more with the conversations and new collaborative networks catalyzed by launching a Crisis Mapping project. Indeed, this in part explains why the Standby Volunteer Task Force (SBTF) exists in the first place.” I was not very familiar with the concept of social capital at the time, but that’s precisely what I was describing. I’ve since written extensively about the very important role that social capital plays in disaster resilience and digital humanitarian response. But I hadn’t taken the obvious next step: “Crowd Capital.”


John Prpić and Prashant Shukla, the authors of “The Theory of Crowd Capital,” find inspiration in F. A. Hayek, who in 1945 wrote a seminal work titled “The Use of Knowledge in Society.” In this work, Hayek describes dispersed knowledge as follows:

“The knowledge of the circumstances of which we must make use never exists in concentrated or integrated form but solely as the dispersed bits of incomplete and frequently contradictory knowledge which all the separate individuals possess. […] Every individual has some advantage over all others because he possesses unique information of which beneficial use might be made, but of which use can be made only if the decisions depending on it are left to him or are made with his active cooperation.”

“Crowd Capability,” according to John and Prashant, is what enables an organization to tap this dispersed knowledge from individuals. More formally, they define Crowd Capability as an “organizational level capability that is defined by the structure, content, and process of an organization’s engagement with the dispersed knowledge of individuals—the Crowd.” From their perspective, “it is this engagement of dispersed knowledge through Crowd Capability efforts that endows organizations with data, information, and knowledge previously unavailable to them; and the internal processing of this, in turn, results in the generation of Crowd Capital within the organization.”

In other words, “when an organization defines the structure, content, and processes of its engagement with the dispersed knowledge of individuals, it has created a Crowd Capability, which in turn, serves to generate Crowd Capital.” And so, the authors contend, a Crowd Capable organization “puts in place the structure, content, and processes to access Hayek’s dispersed knowledge from individuals, each of whom has some informational advantage over the other, and thus forming a Crowd for the organization.” Note that a crowd can “exist inside of an organization, exist external to the organization, or a combination of the latter and the former.”


The “Structure” component of Crowd Capability connotes “the geographical divisions and functional units within an organization, and the technological means that they employ to engage a Crowd population for the organization.” The structure component of Crowd Capability is always an Information-Systems-mediated phenomenon. The “Content” of Crowd Capability constitutes “the knowledge, information or data goals that the organization seeks from the population,” while the “Processes” of Crowd Capability are defined as “the internal procedures that the organization will use to organize, filter, and integrate the incoming knowledge, information, and/or data.” The authors observe that in each Crowd Capital case they’ve analyzed, “an organization creates the structure, content, and/or process to engage the knowledge of dispersed individuals through Information Systems.”

Like the other forms of capital, “Crowd Capital requires investments (for example in Crowd Capability), and potentially pays literal or figurative dividends, and hence, is endowed with typical ‘capital-like’ qualities.” But the authors are meticulous when they distinguish Crowd Capital from Intellectual Capital, Human Capital, Social Capital, Political Capital, etc. The main distinguishing factor is that Crowd Capability is strictly an Information-Systems-mediated phenomenon. “This is not to say that Crowd Capability could not be leveraged to create Social Capital for an organization. It likely could, however, Crowd Capability does not require Social Capital to function.”

That said, I would argue that Crowd Capability can function better thanks to Social Capital. Indeed, Social Capital can influence the “structure,” “content,” and “processes” integral to Crowd Capability. And so, while the authors argue that “Crowd Capital can be accrued without such relationship and network concerns” typical of Social Capital, I would counter that the presence of Social Capital does not take away from Crowd Capability but, quite the contrary, builds greater capability. Otherwise, Crowd Capability is little more than the cultivation of cognitive surplus in which crowd workers can never unite. The Matrix comes to mind. So this is where my experience in crowdsourcing digital humanitarian response makes me diverge from the authors’ conceptualization of “Crowd Capital.” Take the Blue Pill to stay in the disenfranchised version of Crowd Capital; take the Red Pill if you want to build the social capital required to hack the system.


To be sure, the authors of Crowd Capital Theory point to Google’s ReCaptcha system for book digitization to demonstrate that Crowd Capability “does not require a network of relationships for the accrual of Crowd Capital.” While I understand the return on investment to society, both in the form of less spam and more digitized books, this mediated information system is authoritarian. One has no choice but to comply (unless you’re a hacker, perhaps). This is why I share Jonathan Zittrain’s point in “The Future of the Internet and How to Stop It.” Zittrain promotes the notion of “Generative Technologies,” which he defines as having the ability “to produce unprompted, user-driven change.”

Krisztina Holly makes a related argument in her piece on crowdscaling. “Like crowdsourcing, crowdscaling taps into the energy of people around the world that want to contribute. But while crowdsourcing pulls in ideas and content from outside the organization, crowdscaling grows and scales its impact outward by empowering the success of others.” Crowdscaling is possible when Crowd Capability generates Crowd Capital by the crowd, for the crowd. In contrast, said crowd cannot hack or change a ReCaptcha requirement if they wish to proceed to the page they’re looking for. In The Matrix, Crowd Capital accrues most directly to The Matrix rather than to the human cocoons being farmed for their metrics. In the same vein, Crowd Capital generated by ReCaptcha accrues most directly to Google Inc. In short, ReCaptcha doesn’t even ask the question: “Blue Pill or Red Pill?” So is it only a matter of time until the users that generate the Crowd Capital unite and revolt, as seems to be the case with the lawsuit against CrowdFlower?

I realize that the authors may have intended to take the conversation on Crowd Capital in a different direction. But they do conclude with a number of interesting, open-ended questions that suggest various “flavors” of Crowd Capital are possible, and not just the dark one I’ve just described. I for one will absolutely make use of the term Crowd Capital, but will flavor it based on my experience with digital humanitarians, which suggests a different formula: Social Capital + Social Media + Crowdsourcing = Crowd Capital. In short, I choose the Red Pill.


Summary: Digital Disaster Response to Philippine Typhoon

Update: How the UN Used Social Media in Response to Typhoon Pablo

The United Nations Office for the Coordination of Humanitarian Affairs (OCHA) activated the Digital Humanitarian Network (DHN) on December 5th at 3pm Geneva time (9am New York). The activation request? To collect all relevant tweets about Typhoon Pablo posted on December 4th and 5th; identify pictures and videos of damage/flooding shared in those tweets; geo-locate, time-stamp and categorize this content. The UN requested that this database be shared with them by 5am Geneva time the following day. As per DHN protocol, the activation request was reviewed within an hour. The UN was informed that the request had been granted and that the DHN was formally activated at 4pm Geneva.


The DHN is composed of several members who form Solution Teams when the network is activated. The purpose of Digital Humanitarians is to support humanitarian organizations in their disaster response efforts around the world. Given the nature of the UN’s request, both the Standby Volunteer Task Force (SBTF) and Humanity Road (HR) joined the Solution Team. HR focused on analyzing all tweets posted December 4th while the SBTF worked on tweets posted December 5th. Over 20,000 tweets were analyzed. As HR will have a blog post describing their efforts shortly (please check here), I will focus on the SBTF.


The Task Force first used Geofeedia to identify all relevant pictures/videos that were already geo-tagged by users. About a dozen were identified in this manner. Meanwhile, the SBTF partnered with the Qatar Computing Research Institute’s (QCRI) Crisis Computing Team to collect all tweets posted on December 5th with the hashtags endorsed by the Philippine Government. QCRI ran algorithms on the dataset to remove (1) all retweets and (2) all tweets without links (URLs). Given the very short turn-around time requested by the UN, the SBTF & QCRI Teams elected to take a two-pronged approach in the hopes that one, at least, would be successful.
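
Before turning to the two approaches, here is a rough illustration of what QCRI’s pre-filtering step involves. This is a minimal sketch, not QCRI’s actual code (which I have not seen), and the tweet fields used here are assumptions:

```python
import re

def prefilter(tweets):
    """Keep only original tweets that contain at least one link.

    Illustrative only: assumes each tweet is a dict with 'text',
    'retweeted' and 'urls' fields; QCRI's real pipeline is not shown.
    """
    kept = []
    for tweet in tweets:
        # (1) Drop retweets, including manual "RT @user" retweets.
        if tweet.get("retweeted") or tweet["text"].startswith("RT @"):
            continue
        # (2) Drop tweets without links, since the UN asked for
        # pictures and videos, which are shared as URLs.
        if not tweet.get("urls") and not re.search(r"https?://\S+", tweet["text"]):
            continue
        kept.append(tweet)
    return kept
```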

The first approach used CrowdFlower (CF), introduced here. Workers on CrowdFlower were asked to check each tweet’s URL and determine whether it linked to a picture or video. The purpose was to filter out URLs that linked to news articles. CF workers were also asked to assess whether the tweets (or pictures/videos) provided sufficient geographic information for them to be mapped. This methodology worked for about two-thirds of all the tweets in the database. A review of lessons learned and how to use CrowdFlower for disaster response will be posted in the future.


The second approach was made possible thanks to a partnership with PyBossa, a free, open-source crowdsourcing and micro-tasking platform. This effort is described here in more detail. While we are still reviewing the results of this approach, we expect that this tool will become the standard for future activations of the Digital Humanitarian Network. I will thus continue working closely with the PyBossa team to set up a standby PyBossa platform ready for use at a moment’s notice so that Digital Humanitarians can be fully prepared for the next activation.

Now for the results of the activation. Within 10 hours, over 20,000 tweets were analyzed using a mix of methodologies. By 4.30am Geneva time, the combined efforts of HR and the SBTF resulted in a database of 138 highly annotated tweets. The following meta-data was collected for each tweet:

  • Media Type (Photo or Video)
  • Type of Damage (e.g., large-scale housing damage)
  • Analysis of Damage (e.g., 5 houses flooded, 1 damaged roof)
  • GPS coordinates (latitude/longitude)
  • Province
  • Region
  • Date
  • Link to Photo or Video
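
To make this schema concrete, a single record in the resulting database might look something like this (all values below are invented for illustration):

```python
# Hypothetical record following the meta-data fields listed above;
# every value is invented for illustration purposes.
annotated_tweet = {
    "media_type": "Photo",
    "damage_type": "large-scale housing damage",
    "damage_analysis": "5 houses flooded, 1 damaged roof",
    "latitude": 7.31,            # GPS coordinates
    "longitude": 126.09,
    "province": "Compostela Valley",
    "region": "Davao Region",
    "date": "2012-12-05",
    "media_link": "http://example.com/pablo-photo.jpg",
}
```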

The vast majority of curated tweets had latitude and longitude coordinates. One SBTF volunteer (“Mapster”) created a map to plot the data collected. Another Mapster created a similar map, which is available here.


The completed database was shared with UN OCHA at 4.55am Geneva time. Our humanitarian colleagues are now in the process of analyzing the data collected and writing up a final report, which they will share with OCHA Philippines today by 5pm Geneva time.

Needless to say, we all learned a lot thanks to the deployment of the Digital Humanitarian Network in the Philippines. This was the first time we were activated to carry out a task of this type. We are now actively reviewing our combined efforts with the concerted aim of streamlining our workflows and methodologies to make this type of effort far easier and quicker to complete in the future. If you have suggestions and/or technologies that could facilitate this kind of digital humanitarian work, then please do get in touch either by posting your ideas in the comments section below or by sending me an email.

Lastly, but definitely most importantly, a big HUGE thanks to everyone who volunteered their time to support the UN’s disaster response efforts in the Philippines at such short notice! We want to publicly recognize everyone who came to the rescue, so here’s a list of volunteers who contributed their time (more to be added!). Without you, there would be no database to share with the UN, no learning, no innovating and no demonstration that digital volunteers can and do make a difference. Thank you for caring. Thank you for daring.

Help Tag Tweets from Typhoon Pablo to Support UN Disaster Response!

Update: Summary of digital humanitarian response efforts available here.

The United Nations Office for the Coordination of Humanitarian Affairs (OCHA) has just activated the Digital Humanitarian Network (DHN) to request support in response to Typhoon Pablo. They also need your help! Read on!


The UN has asked for pictures and videos of the damage to be collected from tweets posted over the past 48 hours. These pictures/videos need to be geo-tagged if at all possible, and time-stamped. The Standby Volunteer Task Force (SBTF) and Humanity Road (HR), both members of the Digital Humanitarian Network, are thus collaborating to provide the UN with the requested data, which needs to be submitted by 11pm New York time today (5am Geneva time tomorrow). Given this very short turnaround time (we only have 10 hours!), the Digital Humanitarian Network needs your help!


The SBTF has partnered with colleagues at PyBossa to launch this very useful microtasking platform for you to assist the UN in these efforts. No prior experience necessary. Click here to see just how easy it is to support the disaster relief operations on the ground.

A very big thanks to Daniel Lombraña González from PyBossa for turning this around at such short notice! If you have any questions about this project or with respect to volunteering, please feel free to add a comment to this blog post below. Even if you only have time to tag one tweet, it counts! Please help!

Some background information on this project is available here.

PeopleBrowsr: Next-Generation Social Media Analysis for Humanitarian Response?

As noted in this blog post on “Data Philanthropy for Humanitarian Response,” members of the Digital Humanitarian Network (DHNetwork) are still using manual methods for media monitoring. When the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) activated the Standby Volunteer Task Force (SBTF) to crisis map Libya last year, for example, SBTF volunteers manually monitored hundreds of Twitter handles and news sites for several weeks.

SBTF volunteers (Mapsters) do not have access to a smart microtasking platform that could distribute the task in more efficient ways. Nor do they have access to even semi-automated tools for content monitoring and information retrieval. Instead, they used a Google Spreadsheet to list the sources they were manually monitoring and turned this spreadsheet into a sign-up sheet where each Mapster could sign up for 3-hour shifts every day. The SBTF is basically doing “crowd computing” using the equivalent of a typewriter.

Meanwhile, companies like Crimson Hexagon, NetBase, Recorded Future and several others have each developed sophisticated ways to monitor social and/or mainstream media for various private sector applications such as monitoring brand perception. So my colleague Nazila kindly introduced me to her colleagues at PeopleBrowsr after reading my post on Data Philanthropy. Last week, Marc from PeopleBrowsr gave me a thorough tour of the platform. I was definitely impressed and am excited that Marc wants us to pilot the platform in support of the Digital Humanitarian Network. So what’s the big deal about PeopleBrowsr? To begin with, the platform has access to 1,000 days of social media data and over 3 terabytes of social data per month.

To put this in terms of information velocity, PeopleBrowsr receives 10,000 social media posts per second from a variety of sources including Twitter, Facebook, fora and blogs. On the latter, they monitor posts from over 40 million blogs including all of Tumblr, Posterous, Blogspot and every WordPress-hosted site. They also pull in content from YouTube and Flickr.

Let’s search for the term “tsunami” on Twitter. (One could enter a complex query, e.g., and/or, not, etc., and also search using Twitter handles, word or hashtag clouds, top URLs, videos, pictures, etc.) PeopleBrowsr summarizes the results by Location and Community. Location simply refers to where those generating content referring to a tsunami are located. Of course, many Twitter users may tweet about an event without actually being eye-witnesses (think of diaspora groups, for example). While PeopleBrowsr doesn’t geo-tag the location of reported events, you can very easily and quickly identify which Twitter users are tweeting the most about a given event and where they are located.

As for Community, PeopleBrowsr has indexed millions of social media users and clustered them into different communities based on their profile/bio information. Given our interest in humanitarian response, we could create our own community of social media users from the humanitarian sector and limit our search to those users only. Communities can also be created based on hashtags.

This result can be filtered further by gender, sentiment, number of Twitter followers, urgent words (e.g., alert, help, asap), time period and location, for example. The platform can monitor and display posts in any language. In addition, PeopleBrowsr has its very own Kred score, which quantifies the “credibility” of social media users. The scoring metrics for Kred are completely transparent and community driven: “Kred is a transparent way to measure influence and outreach in social media. Kred generates unique scores for every domain of expertise. Regardless of follower count, a person is influential if their community is actively listening and engaging with their content.”
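
Stripped of the interface, the kind of filtering described above amounts to something like the following sketch. This is a generic illustration over a list of post dictionaries, not PeopleBrowsr’s actual API or data model:

```python
URGENT_WORDS = {"alert", "help", "asap", "urgent", "sos"}

def filter_posts(posts, min_followers=0, sentiment=None):
    """Illustrative filter: urgent words, follower count, sentiment.

    The 'text', 'followers' and 'sentiment' fields are assumptions,
    not PeopleBrowsr's real schema.
    """
    results = []
    for post in posts:
        words = set(post["text"].lower().split())
        if not words & URGENT_WORDS:
            continue  # no urgent word present
        if post.get("followers", 0) < min_followers:
            continue  # below the follower threshold
        if sentiment and post.get("sentiment") != sentiment:
            continue  # wrong sentiment class
        results.append(post)
    return results
```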

Using Kred, PeopleBrowsr can do influence analysis using Twitter across all languages. They’ve also added Facebook to Kred, but only as an opt-in option. PeopleBrowsr also has some great built-in and interactive data analytics tools. In addition, one can download a situation report in PDF and print it off if there’s a need to go offline.

What appeals to me the most is perhaps the full “drill-down” functionality of PeopleBrowsr’s data analytics tools. For example, I can drill down to the number of tweets per month that reference the word “tsunami” and drill down further per week and per day.

Moreover, I can sort through the individual tweets themselves based on specific filters and even access the underlying tweets complete with Twitter handles, time-stamps, Kred scores, etc.

This latter feature would make it possible for the SBTF to copy & paste and map individual tweets on a live crisis map. In fact, the underlying data can be downloaded into a CSV file and added to a Google Spreadsheet for Mapsters to curate. Hopefully the Ushahidi team will also provide an option to upload CSVs to SwiftRiver so users can curate/filter pre-existing datasets as well as content generated live. What if you don’t have time to get on PeopleBrowsr and filter, download, etc? As part of their customer support, PeopleBrowsr will simply provide the data to you directly.

So what’s next? Marc and I are taking the following steps:

  • Schedule an online demo of PeopleBrowsr for the SBTF Core Team (they are for now the only members of the Digital Humanitarian Network with a dedicated and experienced Media Monitoring Team);
  • The SBTF pilots PeopleBrowsr for preparedness purposes;
  • The SBTF deploys PeopleBrowsr during 2-3 official activations of the Digital Humanitarian Network;
  • The SBTF analyzes the added value of PeopleBrowsr for humanitarian response and provides expert feedback to PeopleBrowsr on how to improve the tool for humanitarian response.

State of the Art in Digital Disease Detection

Larry Brilliant’s TED Talk back in 2006 played an important role in catalyzing my own personal interest in humanitarian technology. Larry spoke about the use of natural language processing and computational linguistics for the early detection and early response to epidemics. So it was with tremendous honor and deep gratitude that I delivered the first keynote presentation at Harvard University’s Digital Disease Detection (DDD) conference earlier this year.

The field of digital disease detection has remained way ahead of the curve since 2006 in terms of leveraging natural language processing, computational linguistics and now crowdsourcing for the purposes of early detection of critical events. I thus highly, highly recommend watching the videos of the DDD Ignite Talks and panel presentations, which are all available here. Topics include “Participatory Surveillance,” “Monitoring Rumors,” “Twitter and Disease Detection,” “Search Query Surveillance,” “Open Source Surveillance,” “Mobile Disease Detection,” etc. The presentation on BioCaster is also well worth watching. I blogged about BioCaster here over three years ago and the platform is as impressive as ever.

These public health experts are really operating at the cutting-edge and their insights are proving important to the broader humanitarian technology community. To be sure, the potential added value of cross-fertilization between fields is tremendous. Just take this example of a public health data mining platform (HealthMap) being used by Syrian activists to detect evidence of killings and human rights violations.

The KoBo Platform: Data Collection for Real Practitioners

Update: be sure to check out the excellent points in the comments section below.

I recently visited my alma mater, the Harvard Humanitarian Initiative (HHI), where I learned more about the free and open source KoBo ToolBox project that my colleagues Phuong Pham, Patrick Vinck and John Etherton have been working on. What really attracts me about KoBo, which means “transfer” in Acholi, is that the entire initiative is driven by highly experienced and respected practitioners. Often, software developers are the ones who build these types of platforms in the hopes that they add value to the work of practitioners. In the case of KoBo, a team of seasoned practitioners is fully in the driver’s seat. The result is a highly dedicated, customized and relevant solution.

Phuong and Patrick first piloted handheld digital data collection in 2007 in Northern Uganda. This early experience informed the development of KoBo, which continues to be driven by actual field-based needs and challenges such as limited technical know-how. In short, KoBo provides an integrated suite of applications for handheld data collection that is specifically designed for a non-technical audience, i.e., the vast majority of human rights and humanitarian practitioners out there. This suite of applications enables users to collect and analyze field data in virtually real-time.

KoBoForm allows you to build multimedia surveys for data collection purposes, integrating special datatypes like bar-codes, images and audio. Time stamps and geo-location via GPS let you know exactly where and when the data was collected (important for monitoring and evaluation, for example). KoBoForm’s optional data constraints and skip logic further ensure data accuracy. KoBoCollect is an Android-based app based on ODK. Surveys built with KoBoForm are easily uploaded to any number of Android phones sporting the KoBoCollect app, which can also be used offline and automatically synched when back in range. KoBoSync pushes survey data from the Android(s) to your computer for data analysis while KoBoMap lets you display your results in an interactive map with a user-friendly interface. Importantly, KoBoMap is optimized for low-bandwidth connections.
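
Since KoBoCollect is based on ODK, surveys of this kind are typically described under the hood in the XLSForm spreadsheet format. Here is a rough, invented XLSForm-style fragment to illustrate what the constraints and skip logic mentioned above look like; this is a sketch for illustration, not an actual KoBoForm export:

```
survey sheet (illustrative):

type               name          label                         constraint        relevant
-----------------  ------------  ----------------------------  ----------------  --------------------
geopoint           location      Record your GPS location
integer            hh_size       How many people live here?    . > 0 and . < 50
select_one yes_no  displaced     Is the household displaced?
text               displaced_to  If yes, displaced to where?                     ${displaced} = 'yes'
image              photo         Photo of the dwelling

(The yes_no options would be defined on a separate "choices" sheet.)
```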

The KoBo platform has been used to conduct large-scale population studies in places like the Central African Republic, Northern Uganda and Liberia. In total, Phuong and Patrick have interviewed more than 25,000 individuals in these countries using KoBo, so the tool has certainly been tried and tested. The resulting data, by the way, is available via this data-visualization portal. The team is currently building new features for KoBo to apply the tool in the Democratic Republic of the Congo (DRC). They are also collaborating with UNDP to develop a judicial monitoring project in the DRC using KoBoToolbox, which will help them “think through some of the requirements for longitudinal data collection and tracking of cases.”

In sum, the expert team behind KoBo is building these software solutions first and foremost for their own field work. As Patrick notes here, “the use of these tools was instrumental to the success of many of our projects.” This makes all the difference vis-a-vis the resulting technology.

Behind the Scenes: The Digital Operations Center of the American Red Cross

The Digital Operations Center at the American Red Cross is an important and exciting development. I recently sat down with Wendy Harman to learn more about the initiative and to exchange some lessons learned in this new world of digital humanitarians. One common challenge in emergency response is scaling. The American Red Cross cannot be everywhere at the same time—and that includes being on social media. More than 4,000 tweets reference the Red Cross on an average day, a figure that skyrockets during disasters. And when crises strike, so does Big Data. The Digital Operations Center is one response to this scaling challenge.

Sponsored by Dell, the Center uses customized software produced by Radian6 to monitor and analyze social media in real-time. The Center itself seats three people who have access to six customized screens that relay relevant information drawn from various social media channels. The first screen below depicts some of the key topical areas that the Red Cross monitors, e.g., references to the American Red Cross, Storms in 2012, and Delivery Services.

Circle sizes in the first screen depict the volume of references related to that topic area. The color coding (red, green and beige) relates to sentiment analysis (beige being neutral). The dashboard with the “speed dials” right underneath the first screen provides more details on the sentiment analysis.

Let’s take a closer look at the circles from the first screen. The dots “orbiting” the central icon relate to the categories of key words that the Radian6 platform parses. You can click on these orbiting dots to “drill down” and view the individual key words that make up that specific category. This circles screen gets updated in near real-time and draws on data from Twitter, Facebook, YouTube, Flickr and blogs. (Note that the distance between the orbiting dots and the center does not represent anything.)

An operations center would of course not be complete without a map, so the Red Cross uses two screens to visualize different data on two heat maps. The one below depicts references made on social media platforms vis-a-vis storms that have occurred during the past 3 days.

The screen below the map highlights the bios of 50 individual Twitter users who have made references to the storms. All this data gets generated from the “Engagement Console” pictured below. The purpose of this web-based tool, which looks a lot like TweetDeck, is to enable the Red Cross to customize the specific types of information they’re looking for, and to respond accordingly.

Let’s look at the Console more closely. In the Workflow section on the left, users decide what types of tags they’re looking for and can also filter by priority level. They can also specify the type of sentiment they’re looking for, e.g., negative feelings vis-a-vis a particular issue. In addition, they can take certain actions in response to each information item. For example, they can reply to a tweet, a Facebook status update, or a blog post; and they can do this directly from the Engagement Console. Based on the license that the Red Cross uses, up to 25 of their team members can access the Console and collaborate in real-time when processing the various tweets and Facebook updates.

The Console also allows users to create customized timelines, charts and word-cloud graphics to better understand trends changing over time in the social media space. To fully leverage this social media monitoring platform, Wendy and team are also launching a digital volunteers program. The goal is for these volunteers to eventually become the prime users of the Radian6 platform and to filter the bulk of relevant information in the social media space. This would considerably lighten the load for existing staff. In other words, the volunteer program would help the American Red Cross scale in the social media world we live in.

Wendy plans to set up a dedicated 2-hour training for individuals who want to volunteer online in support of the Digital Operations Center. These trainings will be carried out via Webex and will also be available to existing Red Cross staff.


As argued in this previous blog post, the launch of this Digital Operations Center is further evidence that the humanitarian space is ready for innovation and that some technology companies are starting to think about how their solutions might be applied for humanitarian purposes. Indeed, it was Dell that first approached the Red Cross with an expressed interest in contributing to the organization’s efforts in disaster response. The initiative also demonstrates that combining automated natural language processing solutions with a digital volunteer network seems to be a winning strategy, at least for now.

After listening to Wendy describe the various tools she and her colleagues use as part of the Operations Center, I began to wonder whether these types of tools will eventually become free and easy enough for one person to be her very own operations center. I suppose only time will tell. Until then, I look forward to following the Center’s progress and hope it inspires other emergency response organizations to adopt similar solutions.

On Rumors, Repression and Digital Disruption in China: Opening Pandora’s Inbox of Truthiness?

The Economist recently published a brilliant piece on China entitled: “The Power of Microblogs: Zombie Followers and Fake Re-Tweets.” BBC News followed with an equally excellent article: “Damaging Coup Rumors Ricochet Across China.” Combined, these articles reveal just how profound the digital disruption in China is likely to be now that Pandora’s Inbox has been opened.


The Economist article opens with an insightful historical comparison:

“In the year 15AD, during the short-lived Xin dynasty, a rumor spread that a yellow dragon, a symbol of the emperor, had inauspiciously crashed into a temple in the mountains of central China and died. Ten thousand people rushed to the site. The emperor Wang Mang, aggrieved by such seditious gossip, ordered arrests and interrogations to quash the rumor, but never found the source. He was dethroned and killed eight years later, and Han-dynasty rule was restored.”

“The next ruler, Emperor Guangwu, took a different approach, studying rumors as a barometer of public sentiment, according to a recent book Rumors in the Han Dynasty by Lu Zongli, a historian. Guangwu’s government compiled a ‘Rumors Report’, cataloguing people’s complaints about local officials, and making assessments that were passed to the emperor. The early Eastern Han dynasty became known for officials who were less corrupt and more attuned to the people.”

In present-day China, a popular pastime among the 250+ million Chinese users of microblogging platforms is to “spread news and rumors, both true and false, that challenge the official script of government officials and state-propaganda organs.” In Domination and the Arts of Resistance: Hidden Transcripts, James Scott distinguishes between public and hidden transcripts. The former describes the open, public discourse that takes place between dominators and oppressed, while hidden transcripts relate to the critique of power that “goes on offstage,” which the power elites cannot decode. Scott writes that when the oppressed classes publicize this “hidden transcript” (the truthiness?), they become conscious of its common status. Borrowing from Juergen Habermas (as interpreted by Clay Shirky), those who take on the tools of open expression become a public, and a synchronized public increasingly constrains undemocratic rulers while expanding the rights of that public. The result in China? “It is hard to overestimate how much the arrival of [microblogging platforms] has changed the dynamic between rulers and ruled over the past two years” (The Economist).

Chinese authorities have responded to this threat in two predictable ways, one repeating the ill-fated actions of the Xin Dynasty and the other reflecting the more open spirit of Emperor Guangwu. In the latter case, authorities are turning to microblogs as a “listening post” for public opinion and also as a publishing platform. Indeed, “government agencies, party organs and individual officials have set up more than 50,000 weibo accounts [the Chinese equivalent of Twitter]” (The Economist). In the former case, the regime has sought to “combat rumors harshly and to tighten controls over the microblogs and their users, censoring posts and closely monitoring troublemakers.” The UK Guardian reports that China is now “taking the toughest steps yet against major microblogs and detaining six people for spreading rumors of a coup amid Beijing’s most serious political crisis for years.”

Beijing’s attempt to regulate microblogging companies by requiring users to sign up with their real names is unlikely to be decisive, however. “No matter how it is enforced, user verification seems unlikely to deter the spread of rumors and information that has so concerned authorities” (The Economist). To be sure, companies are already selling fake verification services for a small fee. Besides, verifying accounts for millions of users is simply too time-consuming and hence costly. Even Twitter gave up their verified account service a while back. The task of countering rumors is even more of a Quixotic dream.

Property tycoon Zhang Xin, who has more than 3 million followers, wrote: “What is the best way to stop ‘rumors’? It is transparency and openness. The more speech is discouraged, the more rumors there will be” (UK Guardian).

This may in part explain why Chinese authorities have shifted their approach to one of engagement, as evidenced by those 50,000 new weibo accounts. With this second reaction, however, Beijing is possibly passing the point of no return. “This degree of online engagement can be awkward for authorities used to a comfortable buffer from public opinion,” writes The Economist. This is an understatement; Pandora’s (In)box is now open and the “hidden transcript” is cloaked no longer. The critique of power is decoded and elites are “forced” to devise a public reply as a result of this shared awareness lest they lose legitimacy vis-a-vis the broader population. But the regime doesn’t even have a “customer service” mechanism in place to deal with distributed and potentially high-volume complaints. Censorship is easy compared to engagement.

Recall the “Rumors Report” compiled by Emperor Guangwu’s government to catalogue people’s complaints about local officials. How will these 50,000 new weibo users deal with such complaints now that the report can be crowdsourced, especially given the fact that China’s “Internet users have become increasingly bold in their willingness to discuss current affairs and even sensitive political news […]” (UK Guardian)?

As I have argued in my dissertation, repressive regimes can react to real (or perceived) threats posed by “liberation technologies” by either cracking down and further centralizing control and/or by taking on the same strategies as digital activists, which at times requires less centralization. Either way, they’re taking the first step on a slippery slope. By acknowledging the problem of rumors so publicly, the regime is actually calling more attention to how disruptive these simple speculations can be—the classic Streisand effect.

“By falsely packaging lies and speculation as ‘truth’ and ‘existence’, online rumours undermine the morale of the public, and, if out of control, they will seriously disturb the public order and affect social stability,” said a commentary in the People’s Daily, the official Communist party newspaper. (UK Guardian).

Practically speaking, how will those 50,000 new weibo users coordinate their efforts to counter rumors and spread state propaganda? “We have a saying among us: you only need to move your lips to start a rumor, but you need to run until your legs are broken to refute one,” says an employee of a state media outlet (The Economist). How will these new weibo users synchronize collective action in near real-time to counter rumors when any delay is likely to be interpreted as evidence of further guilt? Will they know how to respond to the myriad questions bombarded at them in real-time by hundreds of thousands of Chinese microbloggers? This may lead to high-pressure situations that are ripe for mistakes and errors, particularly if these government officials are new to microblogging. Indeed, if just one of these state-microbloggers slips, that slip could go viral with a retweet tsunami. Any retreat by authorities from this distributed engagement strategy will only lead to more rumors.

The rumors of the coup d’état continue to ricochet across China, gaining remarkable traction far and wide. Chinese microblogs were also alight last week with talk of corruption and power struggles within the highest ranks of the party, which may have fueled the rumor of an overthrow. This is damaging to China’s Communist Party, which “likes to portray itself as unified and in control,” particularly as it prepares for its once-in-a-decade leadership shuffle. “The problem for China’s Communist Party is that it has no effective way of refuting such talk. There are no official spokesmen who will go on the record, no sources briefing the media on the background. Did it happen? Nobody knows. So the rumors swirl” (BBC News). Even the official media, which is “often found waiting for political guidance, can be slow and unresponsive.”

So if Chinese authorities and state media aren’t even equipped (beyond plain old censorship) to respond to national rumors vis-a-vis an event as important as a coup (can it possibly get more important than that?), then how in the world will they deal with the undercurrent of rumors that continue to fill Chinese microblogs now that these can have 50,000 new targets online? Moreover, “many in China are now so cynical about the level of censorship that they will not believe what comes from the party’s mouthpieces even if it is true. Instead they will give credence to half-truths or fabrications on the web,” which is “corrosive for the party’s authority” (BBC News). This is a serious problem for China’s Communist elite, who are obsessed with the task of projecting an image of total unity and stability.

In contrast, speculators on Chinese microblogging platforms don’t need a highly coordinated strategy to spread conspiracies. They are not handicapped by the centralization and collective action problem that Chinese authorities face; after all, it is clearly far easier to spread a rumor than to debunk one. As noted by The Economist, those spreading rumors have “at their disposal armies of zombie followers and fake re-tweets as well as marketing companies, which help draw attention to rumors until they are spread by a respected user with many real followers, such as a celebrity.” But there’s more at stake here than mere rumors. In fact, as noted by The Economist, the core of the problem has less to do with hunting down rumors of yellow dragons than with “the truth that they reflect: a nervous public. In the age of weibo, it may be that the wisps of truth prove more problematic for authorities than the clouds of falsehood.”

Fascinating epilogues:

China’s censorship can never defeat the internet
China’s censors tested by microbloggers who keep one step ahead of state media

Truthiness as Probability: Moving Beyond the True or False Dichotomy when Verifying Social Media

I asked the following question at the Berkman Center’s recent Symposium on Truthiness in Digital Media: “Should we think of truthiness in terms of probabilities rather than use a True or False dichotomy?” The wording here is important. The word “truthiness” already suggests a subjective fuzziness around the term. Expressing truthiness as probabilities provides more contextual information than does a binary true or false answer.

When we set out to design the SwiftRiver platform some three years ago, it was already clear to me then that the veracity of crowdsourced information ought to be scored in terms of probabilities. For example, what is the probability that the content of a Tweet referring to the Russian elections is actually true? Why use probabilities? Because it is particularly challenging to instantaneously verify crowdsourced information in the real-time social media world we live in.

There is a common tendency to assume that all unverified information is false until proven otherwise. This is too simplistic, however. We need a fuzzy logic approach to truthiness:

“In contrast with traditional logic theory, where binary sets have two-valued logic: true or false, fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. Fuzzy logic has been extended to handle the concept of partial truth, where the truth value may range between completely true and completely false.”

The majority of user-generated content is unverified at time of birth. (Does said data deserve the “original sin” of being labeled as false, unworthy, until proven otherwise? To digress further, unverified content could be said to have a distinct wave function that enables said data to be both true and false until observed. The act of observation starts the collapse of said wave function. To the astute observer, yes, I’m riffing off Schrödinger’s Cat, and was also pondering how to weave in Heisenberg’s uncertainty principle as an analogy; think of a piece of information characterized by a “probability cloud” of truthiness.)

I believe the hard sciences have much to offer in this respect. Why don’t we have error margins for truthiness? Why not take a weather forecast approach to information truthiness in social media? What if we had a truthiness forecast understanding full well that weather forecasts are not always correct? The fact that a 70% chance of rain is forecasted doesn’t prevent us from acting and using that forecast to inform our decision-making. If we applied binary logic to weather forecasts, we’d be left with either a 100% chance of rain or 100% chance of sun. Such weather forecasts would be at best suspect if not wrong rather frequently.

In any case, instead of dismissing content generated in real-time because it is not immediately verifiable, we can draw on Information Forensics to begin assessing the potential validity of said content. Tactics from information forensics can help us create a score card of heuristics to express truthiness in terms of probabilities. (I call this advanced media literacy). There are indeed several factors that one can weigh, e.g., the identity of the messenger relaying the content, the source of the content, the wording of said content, the time of day the information was shared, the geographical proximity of the source to the event being reported, etc.
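
To sketch what such a score card could look like in practice, here is a toy weighted-heuristics model in Python. The features, weights, and example values are all invented for illustration; a real system would calibrate them against ground truth:

```python
# Toy truthiness score card: each heuristic is scored in [0, 1] and
# weighted. All features and weights are invented for illustration.
HEURISTICS = [
    ("known_source",     0.30),  # is the messenger's identity established?
    ("corroborated",     0.30),  # do independent reports agree?
    ("geo_proximity",    0.20),  # is the source near the reported event?
    ("account_history",  0.10),  # age and track record of the account
    ("specific_wording", 0.10),  # concrete details vs. vague claims
]

def truthiness(features):
    """Combine heuristic scores into a probability-like value in [0, 1]."""
    return sum(w * features.get(name, 0.0) for name, w in HEURISTICS)

report = {"known_source": 1.0, "corroborated": 0.5, "geo_proximity": 0.8}
print(f"Truthiness forecast: {truthiness(report):.0%}")  # prints "61%"
```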

These weights need not be static as they are largely subjective and temporal; after all, truth is socially constructed and dynamic. So while a “wisdom of the crowds” approach alone may not always be well-suited to generating these weights, perhaps integrating the hunch of the expert with machine learning algorithms (based on lessons learned in information forensics) could result in more useful decision-support tools for truthiness forecasting (or rather “backcasting”).

In sum, thinking of truthiness strictly in terms of true and false prevents us from “complexifying” a scalar variable into a vector (a wave function), which in turn limits our ability to develop new intervention strategies. We need new conceptual frameworks to reflect the complexity and ambiguity of user-generated content.


Crowdsourcing Satellite Imagery Analysis for Somalia: Results of Trial Run

We’ve just completed our very first trial run of the Standby Volunteer Task Force (SBTF) Satellite Team. As mentioned in this blog post last week, the UN approached us a couple weeks ago to explore whether basic satellite imagery analysis for Somalia could be crowdsourced using a distributed mechanical turk approach. I had actually floated the idea in this blog post during the floods in Pakistan a year earlier. In any case, a colleague at Digital Globe (DG) read my post on Somalia and said: “Let’s do it.”

So I reached out to Luke Barrington at Tomnod to set up a distributed micro-tasking platform for Somalia. To learn more about Tomnod’s neat technology, see this previous blog post. Within just a few days we had high resolution satellite imagery from DG and a dedicated crowdsourcing platform for imagery analysis, courtesy of Tomnod. All that was missing were some willing and able “mapsters” from the SBTF to tag the location of shelters in this imagery. So I sent out an email to the group and some 50 mapsters signed up within 48 hours. We ran our pilot from August 26th to August 30th. The idea here was to see what would go wrong (and right!) and thus learn as much as we could before doing this for real in the coming weeks.

It is worth emphasizing that the purpose of this trial run (and entire exercise) is not to replicate the kind of advanced and highly-skilled satellite imagery analysis that professionals already carry out. This is not just about Somalia over the next few weeks and months. This is about Libya, Syria, Yemen, Afghanistan, Iraq, Pakistan, North Korea, Zimbabwe, Burma, etc. Professional satellite imagery experts who have plenty of time to volunteer their skills are few and far between. Meanwhile, a staggering amount of new satellite imagery is produced every day; millions of square kilometers’ worth according to one knowledgeable colleague.

This is a big data problem that needs mass human intervention until the software can catch up. Moreover, crowdsourcing has proven to be a workable solution in many other projects and sectors. The “crowd” can indeed scan vast volumes of satellite imagery data and tag features of interest. A number of these crowdsourcing platforms also have built-in quality assurance mechanisms that take into account the reliability of the taggers and tags. Tomnod’s CrowdRank algorithm, for example, only validates imagery analysis if a certain number of users have tagged the same image in exactly the same way. In our case, only shelters that get tagged identically by three SBTF mapsters get their locations sent to experts for review. The point here is not to replace the experts but to take some of the easier (but time-consuming) tasks off their shoulders so they can focus on applying their skill set to the harder stuff vis-a-vis imagery interpretation and analysis.
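
To illustrate the consensus rule, here is a rough stand-in sketch; CrowdRank itself is Tomnod’s own algorithm, and this is not its actual implementation:

```python
def confirmed_locations(tags, min_agreement=3):
    """Return tag locations marked identically by >= min_agreement volunteers.

    A stand-in for the consensus idea behind CrowdRank, not the actual
    algorithm. `tags` is an iterable of (volunteer_id, image_id, x, y)
    tuples; pixel coordinates are binned so that near-identical clicks
    on the same image count as agreement.
    """
    voters = {}
    for volunteer, image, x, y in tags:
        key = (image, round(x, -1), round(y, -1))  # ~10-pixel bins
        voters.setdefault(key, set()).add(volunteer)
    return [key for key, who in voters.items() if len(who) >= min_agreement]
```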

The purpose of this initial trial run was simply to give SBTF mapsters the chance to test drive the Tomnod platform and to provide feedback both on the technology and the work flows we put together. They were asked to tag a specific type of shelter in the imagery they received via the web-based Tomnod platform.

There’s much that we would do differently in the future but that was exactly the point of the trial run. We had hoped to receive a “crash course” in satellite imagery analysis from the Satellite Sentinel Project (SSP) team but our colleagues had hardly slept in days because of some very important analysis they were doing on the Sudan. So we did the best we could on our own. We do have several satellite imagery experts on the SBTF team though, so their input throughout the process was very helpful.

Our entire work flow along with comments and feedback on the trial run is available in this open and editable Google Doc. You’ll note the pages (and pages) of comments, questions and answers. This is gold and the entire point of the trial run. We definitely welcome additional feedback on our approach from anyone with experience in satellite imagery interpretation and analysis.

The result? SBTF mapsters analyzed a whopping 3,700+ individual images and tagged more than 9,400 shelters in the green-shaded area below. Known as the “Afgooye corridor,” this area marks the road between Mogadishu and Afgooye which, due to displacement from war and famine in the past year, has become one of the largest urban areas in Somalia. [Note, all screen shots come from Tomnod].

Last year, UNHCR used “satellite imaging both to estimate how many people are living there, and to give the corridor a concrete reality. The images of the camps have led the UN’s refugee agency to estimate that the number of people living in the Afgooye Corridor is a staggering 410,000. Previous estimates, in September 2009, had put the number at 366,000” (1).

The yellow rectangles depict the 3,700+ individual images that SBTF volunteers individually analyzed for shelters. And here’s the output of three days’ worth of shelter tagging: more than 9,400 tags.

Thanks to Tomnod’s CrowdRank algorithm, we were able to analyze consensus between mapsters and pull out the triangulated shelter locations. In total, we got 1,423 confirmed locations for the types of shelters described in our work flows. A first cursory glance at a handful (a “random sample”) of these confirmed locations indicates they are spot on. As a next step, we could crowdsource (or SBTF-source, rather) the analysis of just these 1,423 images to triple-check consensus. Incidentally, these 1,423 locations could easily be added to Google Earth or a password-protected Ushahidi map.

We’ve learned a lot during this trial run and Luke got really good feedback on how to improve their platform moving forward. The data collected should also help us provide targeted feedback to SBTF mapsters in the coming days so they can further refine their skills. On my end, I should have been a lot more specific and detailed on exactly what types of shelters qualified for tagging. As the Q&A section on the Google Doc shows, many mapsters weren’t exactly sure at first because my original guidelines were simply too vague. So moving forward, it’s clear that we’ll need a far more detailed “code book” with many more examples of the features to look for along with features that do not qualify. A colleague of mine suggested that we set up an interactive, online quiz that takes volunteers through a series of examples of what to tag and not to tag. Only when a volunteer answers all questions correctly do they move on to live tagging. I have no doubt whatsoever that this would significantly increase consensus in subsequent imagery analysis.

Please note: the analysis carried out in this trial run is not for humanitarian organizations or to improve situational awareness; it is for testing purposes only. The point was to try something new and in the process work out the kinks, so that when the UN is ready to provide us with official dedicated tasks we don’t have to scramble and climb the steep learning curve there and then.

In related news, the Humanitarian Open Street Map Team (HOT) provided SBTF mapsters with an introductory course on the OSM platform this past weekend. The HOT team has been working hard since the response to Haiti to develop an OSM Tasking Server that would allow them to micro-task the tracing of satellite imagery. They demo’d the platform to me last week and I’m very excited about this new tool in the OSM ecosystem. As soon as the system is ready for prime time, I’ll get access to the backend again and will write up a blog post specifically on the Tasking Server.