Tag Archives: Crowdsourcing

Folksomaps: Gold Standard for Community Mapping

There were a number of mapping-related papers, posters and demos at ICTD2009. One paper in particular caught my attention given the topic’s direct relevance to my ongoing consulting work with the UN’s Threat and Risk Mapping Analysis (TRMA) project in the Sudan and the upcoming ecosystem project in Liberia with Ushahidi and Humanity United.

Introduction

Entitled “Folksomaps – Towards Community Intelligent Maps for Developing Regions,” the paper outlines a community-driven approach for creating maps by drawing on “Web 2.0 principles” and “Semantic Web technologies” but without having to rely entirely on a web-based interface. Indeed, Folksomaps “makes use of web and voice applications to provide access to its services.”

I particularly value the authors’ aim to “provide map-based services that represent user’s intuitive way of finding locations and directions in developing regions.” This is an approach that definitely resonates with me. Indeed, it is our responsibility to adapt and customize our community-based mapping tools to meet the needs, habits and symbology of the end user; not the other way around.

I highly recommend this paper (or summary below) to anyone doing work in the crisis mapping field. In fact, I consider it required reading. The paper is co-authored by Arun Kumar, Dipanjan Chakraborty, Himanshu Chauhan, Sheetal Agarwal and Nitendra Rajput of IBM India Research Lab in New Delhi.

Background

Vast rural areas of developing countries do not have detailed maps or mapping tools. Rural populations are generally semi-literate, low-income and non-tech savvy. They are hardly likely to have access to neogeography platforms like Google Earth. Moreover, the lack of electricity and Internet connectivity further complicates the situation.

We also know that cities, towns and villages in developing countries “typically do not have well structured naming of streets, roads and houses,” which means “key landmarks become very important in specifying locations and directions.”

Drawing on these insights, the authors seek to tap the collective efforts of local communities to populate, maintain and access content for their own benefit—an approach I have described as crowdfeeding.

Surveys of Tech and Non-Tech Users

The study is centered on end-user needs, which is rather refreshing. The authors carried out a series of surveys to better understand the profiles of end-users, e.g., tech and non-tech users.

The first survey sought to identify answers to the following questions:

  • How do people find out about points of interest?
  • How much do people rely on maps versus people on the streets?
  • How do people provide local information to other people?
  • Are people interested in consuming and contributing information to a community-driven map system?

The results are listed in the table below:

[Table 1 from the paper: survey results on how tech and non-tech users find and share location information]

Non-tech savvy users did not use maps to find information about locations and only 36% of these users required precise information. In addition, 75% of non-tech respondents preferred the choice of a phone-based interface, which really drives home the need for what I have coined “Mobile Crisis Mapping” or MCM.

Tech-users also rely primarily on others (as opposed to maps) for location related information. The authors associate this result with the lack of signboards in countries like India. “Many a times, the maps do not contain fine-grained information in the first place.”

Most tech-users said they would want a phone-based location and direction finding system in addition to a web-based interface. Almost 80% expressed interest in “contributing to the service by uploading content either over the phone or through a web-based portal.”

The second survey sought to identify how tech and non-tech users express directions and local information. For example:

  • How do you give directions to people on the road or to friends?
  • How do you describe proximity of a landmark to another one?
  • How do you describe distance? Kilometers or using time-to-travel?

The results are listed in the table below:

[Table 2 from the paper: survey results on how tech and non-tech users express directions and distance]

The majority of non-tech savvy participants said they make use of landmarks when giving directions. “They use names of big roads […] and use ‘near to’, ‘adjacent to’, ‘opposite to’ relations with respect to visible and popular landmarks […].” Almost 40% of respondents said they use time only to describe the distance between any two locations.

Tech-savvy participants almost always use both time and kilometers as a measure to represent distance. Only about 10% of participants used kilometers alone to represent distance.

The Technology

The following characteristics highlight the design choices that differentiate Folksomaps from established notions of map systems:

  • Relies on user generated content rather than data populated by professionals;
  • Strives for spatial integrity in the logical sense and does not consider spatial integrity in the physical sense as essential (which is a defining feature of social maps);
  • Does not consider visual representation as essential, which is important considering the fact that a large segment of users in developing countries do not have access to the Internet (hence my own emphasis on mobile crisis mapping);
  • Is non-static and intelligent in the sense that it infers new information from what is entered by the users;
  • User input is not verified by the system, so pieces of incorrect information may be present in the knowledgebase at different points in time. Folksomaps adopts the Wiki model and allows all users to add, edit and remove content freely while keeping maps up to date.

Conceptual Design

Folksomaps uses “landmark” as the basic unit in the mapping knowledgebase model while “location” represents more coarse-grained geographical areas such as a village, city or country. The model then seeks to capture a few key logical characteristics of locations such as direction, distance, proximity, reachability and layer.

The latter constitutes the granularity of the geographic area that a location represents. “The notion of direction and distance from a location is interpreted with respect to the layer that the location represents. In other words, direction and distance could be viewed as binary operator over locations of the same level. For instance, ‘is towards left of ’ would be appropriate if the location pair being considered is <Libya, Egypt>,” but not if the pair is <Nairobi, India>.

The knowledgebase makes use of two modules, the Web Ontology Language (OWL) and a graph database, to represent and store the above concepts. The Semantic Web language OWL is used to model the categorical characteristics of a landmark (e.g., direction, proximity, etc), and thence infer new relationships not explicitly specified by users of the system. In other words, OWL provides an ontology of locations.

The graph database is used to represent distance (numerical relationships) between landmarks. “The locations are represented by nodes and the edges between two nodes of the graph are labeled with the distance between the corresponding locations.” Given the insights gained from user surveys, precise distances and directions are not integral components of community-based maps.

The two modules are used to generate answers to queries submitted by users.
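To make the two-module design more concrete, here is a minimal sketch (in Python, with networkx standing in for a real graph database) of how a distance graph and a tiny categorical-inference step might work together to answer a query. This is my own illustration rather than the authors’ implementation, which uses OWL and a dedicated graph store; the landmark names, travel times and the “east of” relation below are invented.

```python
import networkx as nx  # stand-in for the paper's graph database

# Distance module: landmarks as nodes, user-reported travel times as weighted edges.
g = nx.Graph()
g.add_edge("bus stand", "market", minutes=10)   # "10 minutes from the bus stand"
g.add_edge("market", "clinic", minutes=5)

def approximate_distance(a, b):
    """Sum reported travel times along the shortest known path."""
    return nx.shortest_path_length(g, a, b, weight="minutes")

# Categorical module: a toy stand-in for the OWL ontology, inferring relations
# (here, transitivity of 'east of') that users never stated explicitly.
east_of = {("market", "bus stand"), ("clinic", "market")}

def infer_east_of(pairs):
    inferred = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(inferred):
            for (c, d) in list(inferred):
                if b == c and (a, d) not in inferred:
                    inferred.add((a, d))
                    changed = True
    return inferred

print(approximate_distance("bus stand", "clinic"))        # 15 minutes
print(("clinic", "bus stand") in infer_east_of(east_of))  # True, inferred, not user-entered
```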

User Interaction

The authors rightly recognize that the user interface design is critical to the success of community-based mapping projects. To be sure, users may be illiterate or semi-literate and not very tech-savvy. Furthermore, users will tend to query the map system when they need it most, e.g., “when they are stuck on the road looking for directions […] and would be pressed for time.” This very much holds true for crisis mapping as well.

Users can perform three main tasks with the system: “find place”, “trace path” and “add info.” In addition, some or all users may be granted the right to edit or remove entries from the knowledgebase. The Folksomaps system can also be bootstrapped from existing databases to populate instances of location types. “Two such sources of data in the absence of a full-fledged Geographical Information System (GIS) come from the Telecom Industry and the Postal Department.”

[Figure 3 from the paper]

How the users interface with the system to carry out these tasks will depend on how tech-savvy or literate they are and what type of access they have to information and communication technologies.

Folksomaps thus provides three types of interface: web-based, voice-based and SMS-based. Each interface allows the user to query and update the database. The web-based interface was developed using Java Server Pages (JSP) while the voice-based interface uses JSPs and VoiceXML.
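The paper does not spell out the SMS grammar, so the following is a purely hypothetical sketch of how an SMS front end might parse a “find place” or “add info” message and route it to the knowledgebase. The keyword format and the find_place/add_info method names are my own assumptions; the real interfaces are built with JSP and VoiceXML.

```python
import re

# Hypothetical SMS formats, e.g.:
#   "FIND clinic NEAR market"
#   "ADD clinic NEAR market"
FIND_RE = re.compile(r"^FIND\s+(.+?)\s+NEAR\s+(.+)$", re.IGNORECASE)
ADD_RE = re.compile(r"^ADD\s+(.+?)\s+NEAR\s+(.+)$", re.IGNORECASE)

def handle_sms(text, knowledgebase):
    """Route an incoming SMS to the query or update path of the knowledgebase."""
    if m := FIND_RE.match(text.strip()):
        landmark, near = m.groups()
        return knowledgebase.find_place(landmark, near)   # assumed query API
    if m := ADD_RE.match(text.strip()):
        landmark, near = m.groups()
        knowledgebase.add_info(landmark, near)            # assumed update API
        return "Thank you, your landmark was added."
    return "Sorry, message not understood. Try: FIND <place> NEAR <landmark>"
```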

[Figure 4 from the paper]

I am particularly interested in the voice-based interface. The authors point to previous studies that suggest a voice-based interaction works well with users who are illiterate or semi-illiterate and who cannot afford to have high-end devices but can use ordinary low-end phones.

[Figure 1 from the paper]

I will share this with the Ushahidi development team with the hope that they will consider adding a voice-based interface for the platform later this year. To be sure, it could be very interesting to integrate Freedom Fone’s work in this area.

Insights from User Studies

The authors conducted user studies to verify the benefit and acceptability of Folksomaps. Tech-savvy participants used the web-based interface while non-tech savvy participants used the voice-based interface. The results are shown in the two tables below.

[Table 3 from the paper: user study results]

Several important insights surfaced from the results of the user studies. For example, an important insight gained from the non-tech user feedback was “the sense of security that they would get with such a system. […] Even though asking for travel directions from strangers on the street is an option, it exposes the enquirer to criminal elements […].”

Another insight gained was that many non-tech savvy participants were willing to pay for the call, “even a small premium over normal charges, as they saw value to having this information available to them at all times.” That said, the majority of participants “preferred the advertisement model where an advertisement played in the beginning of the call pays for the entire call.”

Interestingly, almost all participants preferred the voice-based interface over SMS even though the former led to a number of speech recognition errors. The reason being that “many people are either not comfortable using SMS or not comfortable using a mobile phone itself.”

There were also interesting insights on the issue of accuracy from the perspective of non-tech savvy participants. Most participants asked for full accuracy and only a handful were tolerant of minor mistakes. “In fact, one of the main reasons for preferring a voice call over asking people for directions was to avoid wrong directions.”

This need for high accuracy is driven by the fact that most people use public transportation, walk or use a bicycle to reach their destination, which means the cost of incorrect information is large compared to someone who owns a car.

This is an important insight since the authors had first assumed that tolerance for incorrect information was higher. They also learned that meta information is as important to non-tech savvy users as the landmarks themselves. For instance, low-income participants were more interested in knowing the modes of available transportation, timetables and bus route numbers than the road route from a source to a destination.

[Table 4 from the paper: user study results]

In terms of insights from tech-savvy participants, they did not ask for fine-grained directions all the time. “They were fine with getting high level directions involving major landmarks.” In addition, the need for accuracy was not as strong as for the non-tech savvy respondents, and they preferred to have the results of their queries sent to them via SMS so they could store them for future access, “pointing out that it is easy to forget the directions if you just hear it.”

Some tech-savvy participants also suggested that the directions provided by Folksomaps should “take into consideration the amount of knowledge the subject already has about the area, i.e., it should be personalized based upon user profile.” Other participants mentioned that “frequent changes in road plans due to constructions should be captured by such a system—thus making it more usable than just getting directions.”

Conclusion

In sum, the user interface of Folksomaps needs to be “rich and adaptive to the information needs of the user […].” To be sure, given the user preference for the voice-based interface over SMS, “designing an efficient user-friendly voice-based user interface […]” is a priority. In addition, “dynamic and real-time information augmented with traditional services like finding directions and locations would certainly add value to Folksomaps.” Furthermore, the authors recognize that Folksomaps can “certainly benefit from user interface designs” and “multi-modal front ends.”

Finally, the user surveys suggest “the community is very receptive towards the concept of a community-driven map,” so it is important that the TRMA project in the Sudan and the ecosystem project in Liberia build on the insights and lessons learned provided in this study.

Patrick Philippe Meier

Developing Swift River to Validate Crowdsourcing

Swift River is an Ushahidi initiative to crowdsource the process of data validation. We’re developing a Swift River pilot to complement the VoteReport India crowdsourcing platform we officially launched this week. As part of the Swift River team, I’d like to share with iRevolution readers what I hope the Swift River tool will achieve.

We had an excellent series of brainstorming sessions several weeks ago in Orlando and decided we would combine both natural language processing (NLP) and decentralized human filtering to get one step closer at validating crowdsourced data. Let me expand on how I see both components working individually and together.

Automated Parsing

Double-counting has typically been the bane of traditional NLP or automated event-data extraction algorithms. At Virtual Research Associates (VRA), for example, we would parse headlines of Reuters newswires in quasi real-time, which meant that a breaking story would typically be updated throughout the day or week.

But the natural language parser was specifically developed to automate event-data extraction based on the parameters “Who did what, to whom, where and when?” In other words, the parser could not distinguish whether coded events were actually the same or related. This tedious task was left to VRA analysts to carry out.

Digital Straw

The logic behind eliminating double counting (duplicate event-data) is inevitably reversed given the nature of crowdsourcing. To be sure, the more reports are collected about a specific event, the more likely it is that the event in question actually took place as described by the crowd. Ironically, that is precisely why we want to “drink from the fire hose,” the swift river of data gushing through the wires of social media networks.

We simply need a clever digital straw to filter the torrent of data. This is where our Swift River project comes in and why I first addressed the issue of double counting. One of the central tasks I’d like Swift River to do is to parse the incoming reports from VoteReport India and to cluster them into unique event-clusters. This would be one way to filter the cascading data. Moreover, the parser could potentially help filter fabricated reports.

An Example

For example, if 17 individual reports from different sources are submitted over a two-day period about “forged votes,” then the reports in effect self-triangulate or validate each other. Of course, someone (with too much time on their hands) might decide to send 17 false reports about “forged votes.”

Our digital straw won’t filter all the impurities, but automating this first-level filter is surely better than nothing. Automating this process would require that the digital straw automate the extraction of nouns, verbs and place names from each report, i.e., actor, action and location. Date and time would automatically be coded based on when the report was submitted.

Reports that use similar verbs (synonyms) and refer to the same or similar actors at the same location on the same day can then be clustered into appropriate event-clusters. More on that in the section on crowdsourcing the filter below.
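To illustrate the clustering idea, here is a toy Python sketch. A real parser would need genuine NLP for actor, action and location extraction; the field names, synonym table and sample reports below are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical report structure: (actor, action, location, date) already
# extracted by the parser; this snippet only shows the clustering step.
reports = [
    {"actor": "officials", "action": "forge",  "location": "Pune",  "date": "2009-04-16"},
    {"actor": "officials", "action": "forged", "location": "Pune",  "date": "2009-04-16"},
    {"actor": "police",    "action": "block",  "location": "Delhi", "date": "2009-04-16"},
]

# Map action words to a normalized form so synonyms land in the same cluster.
SYNONYMS = {"forge": "forge", "forged": "forge", "fake": "forge", "block": "block"}

def cluster_key(r):
    """Reports with the same actor, normalized action, location and day
    fall into the same unique event-cluster."""
    return (r["actor"], SYNONYMS.get(r["action"], r["action"]), r["location"], r["date"])

clusters = defaultdict(list)
for r in reports:
    clusters[cluster_key(r)].append(r)

for key, members in clusters.items():
    print(key, "->", len(members), "corroborating report(s)")
```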

More Filters

A second-level filter would compare the content of the reports to determine if they were exact replicas. In other words, if someone were simply copying and pasting the same report, Swift River could flag those identical reports as suspicious. This means someone gaming the system would have to send multiple reports with different wording, thus making it a bit more time consuming to game the system.

A third-level filter or trip-wire could compare the source of the 17 reports. For example, perhaps 10 reports were submitted by email, 5 by SMS and two by Twitter. The greater the diversity of media used to report an event, the more likely that event actually happened. This means that someone wanting to game the system would have to send several emails, text messages and Tweets using different language to describe a particular event.

A fourth-level filter could identify the email addresses, IP addresses and mobile phone numbers in question to determine if they too were different. A crook trying to game the system would now have to send emails from different accounts and IP addresses, different mobile phone numbers, and so on. Anything “looking suspicious” would be flagged for a human to review; more on that soon. The point is to make the gaming of the system as time consuming and frustrating as possible.
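Here is one way the second, third and fourth-level filters might be expressed in code. The thresholds, field names and scoring are illustrative assumptions on my part, not Swift River’s actual design.

```python
import hashlib

def suspicion_flags(cluster):
    """Toy versions of the second, third and fourth-level filters described above."""
    texts    = [r["text"] for r in cluster]
    channels = {r["channel"] for r in cluster}   # email, sms, twitter, web
    senders  = {r["sender"] for r in cluster}    # email address, phone number, IP

    flags = []
    # 2nd level: identical wording suggests copy-and-paste gaming.
    if len({hashlib.sha1(t.lower().encode()).hexdigest() for t in texts}) < len(texts):
        flags.append("duplicate wording")
    # 3rd level: a single reporting channel is weaker corroboration.
    if len(channels) == 1 and len(cluster) > 3:
        flags.append("single channel")
    # 4th level: many reports traced back to too few distinct sender identities.
    if len(senders) < len(cluster) / 2:
        flags.append("few distinct senders")
    return flags

example_cluster = [
    {"text": "Votes forged at booth 12", "channel": "sms",   "sender": "+91xxxxxxxxxx"},
    {"text": "votes forged at booth 12", "channel": "email", "sender": "a@example.com"},
]
print(suspicion_flags(example_cluster))   # ['duplicate wording']
```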

Gaming the System

Of course, if someone is absolutely bent on submitting fabricated data that passes all the filters, then they will.  But those individuals probably constitute a minority of offenders. Perhaps the longer and more often they do this, the more likely someone in the crowd will pick up on the con. As for the less die-hard crooks out there, they may try and game the system only to see that their reports do not get mapped. Hopefully they’ll give up.

I do realize I’m giving away some “secrets” to gaming the system, but I hope this will be more a deterrent than an invitation to crack the system. If you do happen to be someone bent on gaming the platform, I wish you’d get in touch with us instead and help us improve the filters. Either way, we’ll learn from you.

No one on the Swift River team claims that 100% of the dirt will be filtered. What we seek to do is develop a digital filter that makes the data that does come through palatable enough for public consumption.

Crowdsourcing the Filter

Remember the unique event-clusters idea from above? These could be visualized in a simple and intuitive manner for human volunteers (the crowd) to filter. Flag icons, perhaps using three different colors—green, orange and red—could indicate how suspicious a specific series of reports might be based on the results of the individual filters described above.

A green flag would indicate that the report has been automatically mapped on VoteReport upon receipt. An orange flag would indicate the need for review by the crowd while a red flag would send an alert for immediate review.

If a member of the crowd does confirm that a series of reports were indeed fabricated, Swift River would note the associated email address(es), IP address(es) and/or mobile phone number(s) and automatically flag future reports from those sources as red. In other words, Swift River would start rating the credibility of users as well.
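And a minimal sketch of how the flag colors and the source-credibility feedback loop could fit together, again using invented thresholds and data structures rather than anything the Swift River team has specified.

```python
BLACKLIST = set()   # sender identities the crowd has confirmed as fabricators

def assign_flag(cluster, filter_flags):
    """Map filter output and sender history to the green/orange/red scheme."""
    if any(report["sender"] in BLACKLIST for report in cluster):
        return "red"                      # known bad source: immediate review
    if len(filter_flags) >= 2:
        return "red"
    if len(filter_flags) == 1:
        return "orange"                   # queue for review by the crowd
    return "green"                        # auto-map on VoteReport

def record_crowd_verdict(cluster, fabricated):
    """When reviewers confirm fabrication, remember the sources so that
    their future reports are automatically flagged red."""
    if fabricated:
        BLACKLIST.update(report["sender"] for report in cluster)

# Example: an orange-flagged cluster is confirmed as fabricated by a reviewer.
cluster = [{"sender": "+91xxxxxxxxxx", "text": "forged votes at booth 12"}]
print(assign_flag(cluster, ["duplicate wording"]))   # orange
record_crowd_verdict(cluster, fabricated=True)
print(assign_flag(cluster, []))                      # red from now on
```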

If we can pull this off, Swift River may actually start to provide “early warning” signals. To be sure, if we fine tune our unique event-cluster approach, a new event-cluster would be created by a report that describes an event which our parser determines has not yet been reported on.

This should set off a (yellow) flag for immediate review by the crowd. This could either be a legitimate new event or a fabricated report that doesn’t fit into a pre-existing cluster. Of course, we will get a number of false positives, but that’s precisely why we include the human crowdsourcing element.

Simplicity

Either way, as the Swift River team has already agreed, this process of crowdsourcing the filter needs to be rendered as simple and seamless as possible. This means minimizing the number of clicks and “mouse motions” a user has to make and allowing for short-cut keys to be used, just like in Gmail. In addition, a user-friendly version of the interface should be designed specifically for mobile phones (various platforms and brands).

As always, I’d love to get your feedback.

Patrick Philippe Meier

Ushahidi Comes to India for the Elections (Updated)

I’m very pleased to announce that the Ushahidi platform has been deployed at VoteReport.in to crowdsource the monitoring of India’s upcoming elections. The roll out followed our preferred model: an amazing group of Indian partners took the initiative to drive the project forward and are doing a superb job. I’m learning a lot from their strategic thinking.


We’re also excited about developing Swift River as part of VoteReport India to apply a crowdsourcing approach to filter the incoming information for accuracy. This is of course all experimental and we’ll be learning a lot in the process. For a visual introduction to Swift River, please see Erik Hersman’s recent video documentary on our conversations on Swift River, which we had a few weeks ago in Orlando.


As per our latest Ushahidi deployments, VoteReport users can report on the Indian elections by email, SMS, Tweet or by submitting an incident directly online at VoteReport. Users can also subscribe to email alerts—a functionality I’m particularly excited about as this closes the crowdsourcing to crowdfeeding feedback loop; so I’m hoping we can also add SMS alerts, funding permitting. For more on crowdfeeding, please see my previous post on “Ushahidi: From Crowdsourcing to Crowdfeeding.”


You can read more about the project here and about the core team here. It really is an honor to be a part of this amazing group. We also have an official VoteReport blog here. I also highly recommend reading Gaurav Mishra‘s blog post on VoteReport here and Ushahidi’s here.

Next Steps

  • We’re thinking of using a different color to depict “All Categories” since red has cognitive connotations of violence and we don’t want this to be the first impression given by the map.
  • I’m hoping we can add a “download feature” that will allow users to directly download the VoteReport data as a CSV file and as a KML Google Earth Layer. The latter will allow users to dynamically visualize VoteReports over space and time just like I did with the Ushahidi data during the Kenyan elections.
  • We’re also hoping to add a feature that asks those submitting incidents to check off that the information they submit is true. The motivation behind this is inspired by recent lessons learned in behavioral economics as explained in my blog post on “Crowdsourcing Honesty.”

Patrick Philippe Meier

iRevolution One Year On…

I started iRevolution exactly one year ago and it’s been great fun! I owe the Fletcher A/V Club sincere thanks for encouraging me to blog. Little did I know that blogging was so stimulating or that I’d be blogging from the Sudan.

Here are some stats from iRevolution Year One:

  • Total number of blog posts = 212
  • Total number of comments = 453
  • Busiest day ever = December 15, 2008

And the Top 10 posts:

  1. Crisis Mapping Kenya’s Election Violence
  2. The Past and Future of Crisis Mapping
  3. Mobile Banking for the Bottom Billion
  4. Impact of ICTs on Repressive Regimes
  5. Towards an Emergency News Agency
  6. Intellipedia for Humanitarian Warning/Response
  7. Crisis Mapping Africa’s Cross-border Conflicts
  8. 3D Crisis Mapping for Disaster Simulation
  9. Digital Resistance: Digital Activism and Civil Resistance
  10. Neogeography and Crisis Mapping Analytics

I do have a second blog that focuses specifically on Conflict Early Warning, which I started at the same time. I have authored a total of 48 blog posts.

That makes 260 posts in 12 months. Now I know where all the time went!

The Top 10 posts:

  1. Crimson Hexagon: Early Warning 2.0
  2. CSIS PCR: Review of Early Warning Systems
  3. Conflict Prevention: Theory, Policy and Practice
  4. New OECD Report on Early Warning
  5. Crowdsourcing and Data Validation
  6. Sri Lanka: Citizen-based Early Warning/Response
  7. Online Searches as Early Warning Indicators
  8. Conflict Early Warning: Any Successes?
  9. Ushahidi and Conflict Early Response
  10. Detecting Rumors with Web-based Text Mining System

I look forward to a second year of blogging! Thanks to everyone for reading and commenting, I really appreciate it!

Patrick Philippe Meier

Peer Producing Human Rights

Molly Land at New York Law School has written an excellent paper on peer producing human rights, which will appear in the Alberta Law Review, 2009. This is one of the best pieces of research that I have come across on the topic. I highly recommend reading her article when published.

Molly considers Wikipedia, YouTube and Witness.org in her excellent research but somewhat surprisingly does not reference Ushahidi. I thus summarize her main points below and draw on the case study of Ushahidi—particularly Swift River—to compare and contrast her analysis with my own research and experience.

Introduction

Funding for human rights monitoring and advocacy is particularly limited, which is why “amateur involvement in human rights activities has the potential to have a significant impact on the field.” At the same time, Molly recognizes that peer producing human rights may “present as many problems as it solves.”

Human rights reporting is the most professionalized activity of human rights organizations. This professionalization exists “not because of an inherent desire to control the process, but rather as a practical response to the demands of reporting, namely the need to ensure accuracy of the information contained in the report.” The question is whether peer-produced human rights reporting can achieve the same degree of accuracy without a comparable centralized hierarchy.

Accurate documentation of human rights abuses is very important for building up a reputation as a credible human rights organization. Accuracy is also important to counter challenges by repressive regimes that question the validity of certain human rights reports. Moreover, “inaccurate reporting risks injury not only to the organization’s credibility and influence but also to those on whose behalf the organization advocates.”

Control vs Participation

A successful model for peer producing human rights monitoring would represent an important leap forward in the human rights community. Such a model would enable us to process a lot more information in a timelier manner and would also “increase the extent to which ordinary individuals connect to human rights issues, thus fostering the ability of the movement to mobilize broad constituencies and influence public opinion in support of human rights.”

Increased participation is often associated with an increased risk of inaccuracy. In fact, “even the perception of unreliability can be enough to provide […] a basis for critiquing the information as invalid.” Clearly, ensuring the trustworthiness of information in any peer-reviewed project is a continuing challenge.

Wikipedia uses corrective editing as the primary mechanism to evaluate the accuracy of crowdsourced information. Molly argues that this may not work well in the human rights context because direct observation, interviews and interpretation are central to human rights research.

To this end, “if the researcher contributes this information to a collaboratively-edited report, other contributors will be unable to verify the statements because they do not have access to either the witness’s statement or the information that led the researcher to conclude it was reliable.” Even if they were able to verify statements, much of human rights reporting is interpretive, which means that even experienced human rights professionals disagree about interpretive conclusions.

Models for Peer Production

Molly presents three potential models to outline how human rights reporting and advocacy might be democratized. The first two models focus on secondary and primary information respectively, while the third proposes certification by local NGOs. Molly outlines the advantages and challenges that each model presents. Below is a summary with my critiques. I do not address the third model because as noted by Molly it is not entirely participatory.

Model 1. This approach would limit peer-production to collecting, synthesizing and verifying secondary information. Examples include “portals or spin-offs of existing portals, such as Wikipedia,” which could “allow participants to write about human rights issues but require them to rely only on sources that are verifiable […].” Accuracy challenges could be handled in the same way that Wikipedia does; namely through a “combination of collaborative editing and policies; all versions of the page are saved and it is easy for editors who notice gaming or vandalism to revert to the earlier version.”

The two central limitations of this approach are that (1) the model would be limited to a subset of available information restricted to online or print media; and (2) even limiting the subset of information might be insufficient to ensure reliability. To this end, this model might be best used to complement, not substitute, existing fact-finding efforts.

Model 2. This approach would limit the peer-production of human rights report to those with first-hand knowledge. While Molly doesn’t reference Ushahidi in her research, she does mention the possibility of using a website that would allow witnesses to report human rights abuses that they saw or experienced. Molly argues that this first-hand information on human rights violations could be particularly useful for human rights organizations that seek to “augment their capacity to collect primary information.”

This model still presents accuracy problems, however. “There would be no way to verify the information contributed and it would be easy for individuals to manipulate the system.” I don’t agree. The statement “there would be no way to verify the information” is an exaggeration. There are multiple methods that could be employed to determine the probability that the contributed information is reliable, which is the motivation behind our Swift River project at Ushahidi, a project that seeks to use crowdsourcing to filter human rights information.

Since Swift River deserves an entire blog post to itself, I won’t describe the project. I’d just like to mention that the Ushahidi team just spent two days brainstorming creative ways that crowdsourced information could be verified. Stay tuned for more on Swift River.

We can still address Molly’s concerns without reference to Ushahidi’s Swift River.

Individuals who wanted to spread false allegations about a particular government or group, or to falsely refute such allegations, might make multiple entries (which would therefore corroborate each other) regarding a specific incident. Once picked up by other sources, such allegations ‘may take on a life of their own.’ NGOs using such information may feel compelled to verify this information, thus undermining some of the advantages that might otherwise be provided by peer production.

Unlike Molly, I don’t see the challenge of crowdsourced human rights data as first and foremost a problem of accuracy but rather volume. Accuracy, in many instances, is a function of how many data points exist in our dataset.

To be sure, more crowdsourced information can provide an ideal basis for triangulation and validation of peer produced human rights reporting, particularly if we embrace multimedia in addition to simply text. In addition, more information allows us to use probability analysis to determine the potential reliability of incoming reports. This would not undermine the advantages of peer-production.

Of course, this method also faces some challenges since the success of triangulating crowdsourced human rights reports is dependent on volume. I’m not suggesting this is a perfect fix, but I do argue that this method will become increasingly tenable since we are only going to see more user-generated content, not less. For more on crowdsourcing and data validation, please see my previous posts here.

Molly is concerned that a website allowing peer-production based on primary information may “become nothing more than an opinion site.” However, a crowdsourcing platform like Ushahidi is not an efficient platform for interactive opinion sharing. Witnesses simply report on events, when they took place and where. Unlike blogs, the platform does not provide a way for users to comment on individual reports.

Capacity Building

Molly does raise an excellent point vis-à-vis the second model, however. The challenges of accuracy and opinion competition might be resolved by “shifting the purpose for which the information is used from identifying violations to capacity building.” As we all know, “most policy makers and members of the political elite know the facts already; what they want to know is what they should do about them.”

To this end, “the purpose of reporting in the context of capacity building is not to establish what happened, but rather to collect information about particular problems and generate solutions. As a result, the information collected is more often in the form of opinion testimony from key informants rather than the kind of primary material that needs to be verified for accuracy.”

This means that the peer produced reporting does not “purport to represent a kind of verifiable ‘truth’ about the existence or non-existence of a particular set of facts,” so the issue of “accuracy is somewhat less acute.” Molly suggests that accuracy might be further improved by “requiring participants to register and identify themselves when they post information,” which would “help minimize the risk of manipulation of the system.” Moreover, this would allow participants to view each other’s contributions and enable a contributor to build a reputation for credible contributions.

However, Molly points out that these potential solutions don’t change the fact that only those with Internet access would be able to contribute human right reports, which could “introduce significant bias considering that most victims and eyewitnesses of human rights violations are members of vulnerable populations with limited, if any, such access.” I agree with this general observation, but I’m surprised that Molly doesn’t reference the use of mobile phones (and other mobile technologies) as a way to collect testimony from individuals without access to the Internet or in inaccessible areas.

Finally, Molly is concerned that Model 2 by itself “lacks the deep participation that can help mobilize ordinary individuals to become involved in human rights advocacy.” This is increasingly problematic since “traditional ‘naming and shaming’ may, by itself, be increasingly less effective in its ability to achieve changes in state conduct regarding human rights.” So Molly rightly encourages the human rights community to “investigate ways to mobilize the public to become involved in human rights advocacy.”

In my opinion, peer produced advocacy faces the same challenges as traditional human rights advocacy. It is therefore important that the human rights community adopt a more tactical approach to human rights monitoring. At Ushahidi, for example, we’re working to add a “subscribe-to-alerts” feature, which will allow anyone to receive SMS alerts for specific locations.

P2P Human Rights

The point is to improve the situational awareness of those who find themselves at risk so they can get out of harm’s way and not become another human rights statistic. For more on tactical human rights, please see my previous blog post.

Human rights organizations that are engaged in intervening to prevent human rights violations would also benefit from subscribing to Ushahidi. More importantly, the average person on the street would have the option of intervening as well. I, for one, am optimistic about the possibility of P2P human rights protection.

Patrick Philippe Meier

Crowdsourcing in Crisis: A More Critical Reflection

This is a response to Paul’s excellent comments on my recent posts entitled “Internews, Ushahidi and Communication in Crisis” and “Ushahidi: From Crowdsourcing to Crowdfeeding.”

Like Paul, I too find Internews to be a top organization. In fact, of all the participants in New York, the Internews team was actually the most supportive of exploring the crowdsourcing approach further instead of dismissing it entirely. And like Paul, I’m not supportive of the status quo in the humanitarian community either.

Paul’s observations are practical and to the point, which is always appreciated. They encourage me to revisit and test my own assumptions, which I find stimulating. In short, Paul’s comments are conducive to a more critical reflection of crowdsourcing in crisis.

In what follows, I address all his arguments point by point.

Time Still Ignored

Paul firstly notes that,

Both accuracy and timeliness are core Principles of Humanitarian Information management established at the 2002 Symposium on Best Practices in Humanitarian Information Exchange and reiterated at the 2007 Global Symposium +5. Have those principles been incorporated into the institutions sufficiently? Short answer, no. Is accuracy privileged at the expense of timeliness? Not in the field.

The importance of “time” and “timeliness” was ignored during both New York meetings. Most field-based humanitarian organizations dismissed the use of “crowdsourcing” because of their conviction that “crowdsourced information cannot be verified.” In short, participants did not privilege timeliness at the expense of accuracy because they consider verification virtually impossible.

Crowdsourcing is New

Because crowdsourcing is unfamiliar, it’s untested in the field and it makes fairly large claims that are not well backed by substantial evidence. Having said that, I’m willing to be corrected on this criticism, but I think it’s fair to say that the humanitarian community is legitimately cautious in introducing new concepts when lives are at stake.

Humanitarian organizations make claims about crowdsourcing that are not necessarily backed by substantial evidence because crowdsourcing is fairly new and untested in the field. If we use Ushahidi as the benchmark, then crowdsourcing crisis information is 15 months old and the focus of the conversation should be on the two Ushahidi deployments (Kenya & DRC) during that time.

The angst is understandable and we should be legitimately cautious. But angst shouldn’t mean we stand back and accept the status quo, a point that both Paul and I agree on.

Conflict Inflammation

Why don’t those who take the strongest stand against crowdsourcing demonstrate that Ushahidi-Kenya and Ushahidi-DRC have led to conflict inflammation? As far as we know, none of the 500+ crowdsourced crisis events in those countries were manufactured to increase violence. If that is indeed the case, then skeptics like Paul should explain why we did not see Ushahidi being used to propagate violence.

In any event, if we embrace the concept of human development, then the decision vis-à-vis whether or not to crowdsource and crowdfeed information ultimately lies with the crowd sourcers and feeders. If the majority of users feel compelled to generate and share crisis information when a platform exists, then it is because they find value in doing so. Who are we to say they are not entitled to receive public crisis information?

Incidentally, it is striking to note the parallels between this conversation and skeptics during the early days of Wikipedia.

Double Standards

I would also note that I don’t think the community is necessarily holding crowdsourcing to a higher standard, but exactly the same standard as our usual information systems – and if they haven’t managed to get those systems right yet, I can understand still further why they’re cautious about entertaining an entirely new and untested approach.

Cautious and dismissive are two different things. If the community were holding crowdsourcing to an equal standard, then they would consider both the timeliness and accuracy of crowdsourced information. Instead, they dismiss crowdsourcing without recognizing the tradeoff with timeliness.

What is Crisis Info?

In relation to my graphic on the perishable nature of time, Paul asks

What “crisis information” are we talking about here? I would argue that ensuring your data is valid is important at all times, so is this an attack on dissemination strategies rather than data validation?

We’re talking about quasi-real time and geo-tagged incident reporting, i.e., reporting using the parameters of incident type, location and time. Of course it is important that data be as accurate as possible. But as I have already argued, accurate information received late is of little operational value.

On the other hand, information that has not yet been validated but is received early gives those who may need the information the most (1) more time to take precautionary measures, and (2) more time to determine its validity.

Unpleasant Surprises

On this note, I just participated in the Harvard Humanitarian Initiative (HHI)’s Humanitarian Action Summit  (HAS) where the challenge of data validation came up within the context of public health and emergency medicine. The person giving the presentation had this to say:

We prefer wrong information to no information at all since at least we can at least take action in the case of the former to determine the validity of the information.

This reminds me of the known unknowns versus unknown unknowns argument. I’d rather know about a piece of information even though I’m unable to validate it rather than not know and be surprised later in case it turns out to be true.

We should take care not to fall into the classic trap exploited by climate change skeptics. Example: We can’t prove that climate change is really happening since it could simply be that we don’t have enough accurate data to arrive at the correct conclusion. So we need more time and data for the purposes of validation. Meanwhile, skeptics argue, there’s no need to waste resources by taking precautionary measures.

Privileging Time

It also strikes me as odd that Patrick argues that affected communities deserve timely information but not necessarily accurate information. As he notes, it may be a trade-off – but he provides no argument for why he privileges timeliness over accuracy.

I’m not privileging one over the other. I’m simply noting that humanitarian organizations in New York completely ignored the importance of timeliness when communicating with crisis-affected communities, which I still find stunning. It is misleading to talk about accuracy without talking about timeliness and vice versa. So I’m just asking that we take both variables into account.

Obviously the ideal would be to have timely and accurate information. But we’re not dealing with ideal situations when we discuss sudden onset emergencies. Clearly the “right” balance between accuracy and timeliness depends on who the end users are and what context they find themselves in. Ultimately, the end users, not us, should have the right to make that final decision for themselves. While accuracy can save lives, so can timeliness.

Why Obligations?

Does this mean that the government and national media have an obligation to report on absolutely every single violation of human rights taking place in their country?

I don’t understand how this question follows from any of my preceding comments. We need to think about information as an ecosystem with multiple potential sources that may or may not overlap. Obviously governments and national media may not be able to—or compelled to—report accurately and in a timely manner during times of crises. I’m not making an argument about obligation. I’m just making an observation about there being a gap that crowdsourcing can fill, which I showed empirically in this Kenya case study.

Transparency and Cooperation

I’m not sure it’s a constructive approach to accuse NGOs of actively “working against transparency” – it strikes me that there may be some shades of grey in their attitudes towards releasing information about human rights abuses.

You are less pessimistic than I am—didn’t think that was possible. My experience in Africa has been that NGOs (and UN agencies) are reluctant to share information not because of ethical concerns but because of selfish and egotistical reasons. I’d recommend talking with the Ushahidi team who desperately tried to encourage NGOs to share information with each other during the post-election violence.

Ushahidi is Innovation

On my question about why human rights and humanitarian organizations were not the ones to set up a platform like Ushahidi, Paul answers as follows.

I think it might be because the human rights and humanitarian communities were working on their existing projects. The argument that these organisations failed to fulfill an objective when they never actually had that objective in the first place is distinctly shakey – it seems to translate into a protest that they weren’t doing what you wanted them to do.

I think Paul misses the point. I’m surprised he didn’t raise the whole issue of innovation (or rather lack thereof) in the humanitarian community since he has written extensively about this topic.

Perhaps we also have to start thinking in terms of what damage might this information do (whether true or false) if we release it.

I agree. At the same time, I’d like to get the “we” out of the picture and let the “them” (the crowd) do the deciding. This is the rationale behind the Swift River project we’re working on at Ushahidi.

Tech-Savvy Militias

Evidence suggests that armed groups are perfectly happy to use whatever means they can acquire to achieve their goals. I fail to see why Ushahidi would be “tactically inefficient, and would require more co-ordinating” – all they need to do is send a few text messages. The entire point of the platform is that it’s easy to use, isn’t it?

First of all, the technological capacity and sophistication of non-state armed groups varies considerably from conflict to conflict. While I’m no expert, I don’t know of any evidence from Kenya or the DRC—since those are our empirical test cases—that suggest tech-savvy militia members regularly browse the web to identify new Web 2.0 crowdsourcing tools they can use to create more violence.

Al Qaeda is a different story, but we’re not talking about Al Qaeda, we’re talking about Kenya and the DRC. In the case of the former, word about Ushahidi spread through the Kenyan blogosphere. Again, I don’t know of any Kenyan militia groups in the Rift Valley, for example, that monitor the Kenyan blogosphere to exploit it for violence.

Second of all, one needs time to learn how to use a platform like Ushahidi for conflict inflammation. Yes, the entire point of the platform is that it’s easy to use to report human rights violations. But it obviously takes more thinking to determine what, where and when to text an event in order to cause a particular outcome. It requires a degree of coordination and decision-making.

That’s why it would be inefficient. All a militia would need to do is fire a few bullets from one end of a village to have the locals run the other way straight into an ambush. Furthermore, we found no evidence of hate SMS submitted to Ushahidi even though there were some communicated outside of Ushahidi.

Sudan Challenges

The government of Sudan regularly accuses NGOs (well, those NGOs it hasn’t expelled) of misreporting human rights violations. What better tool would the government have for discrediting human rights monitoring than Ushahidi? All it would take would be a few texts a day with false but credible reports, and the government can dismiss the entire system, either by keeping their own involvement covert and claiming that the system is actually being abused, or by revealing their involvement and claiming that the system can be so easily gamed that it isn’t credible.

Good example given that I’m currently in the Sudan. But Paul is mixing human rights reporting for the purposes of advocacy with crisis reporting for the purposes of local operational response.

Of course government officials like those in Khartoum will do, and indeed continue to do, whatever they please. But isn’t this precisely why one might as well make the data open and public so those facing human rights violations can at least have the opportunity to get out of harm’s way?

Contrast this with the typical way that human rights and humanitarian organizations operate—they typically keep the data for themselves, do not share it with other organizations let alone with beneficiaries. How is data triangulation possible at all given such a scenario even if we had all the time in the world? And who loses out as usual? Those local communities who need the information.

Triangulation

While Paul fully agrees that local communities are rarely dependent on a single source of information, which means they can triangulate and validate, he maintains that this “is not an argument for crowdsourcing.” Of course it is, more information allows more triangulation and hence validation. Would Paul argue that my point is an argument against crowdsourcing?

We don’t need less information, we need more information and the time element matters precisely because we want to speed up the collection of information in order to triangulate as quickly as possible.

Ultimately, it will be a question of probability whether or not a given event is true: the larger your sample size, the more confident you can be. The quicker you collect that sample size, the quicker you can validate. Crowdsourcing is a method that facilitates the rapid collection of large quantities of information, which in turn facilitates triangulation.
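As a back-of-the-envelope illustration of that intuition, and under the strong (and unrealistic) assumption that reports err independently with a fixed probability, the chance that every one of n corroborating reports is wrong shrinks quickly as n grows:

```python
# Illustrative only: assumes each report has a fixed, independent 30% chance
# of being wrong, which real, correlated sources will not satisfy.
p_single_wrong = 0.3
for n in (1, 3, 5, 10):
    print(f"{n:2d} corroborating reports -> P(all wrong) = {p_single_wrong ** n:.4f}")
```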

Laughing Off Disclaimers

The idea that people pay attention to disclaimers makes me laugh out loud. I don’t think anybody’s accusing affected individuals of being dumb, but I’d be interested to see evidence that supports this claim. When does the validation take place, incidentally? And what recourse do individuals or communities have if an alert turns out to be false?

Humanitarians often treat beneficiaries as dumb, not necessarily intentionally, but I’ve seen this first hand in East and West Africa. Again, if you haven’t read “Aiding Violence” then I’d recommend it.

Second, the typical scenario that comes up when talking about crowdsourcing and the spreading of rumors has to do with refugee camp settings. The DRC militia story is one that I came up with (and have already used in past blog posts) in order to emphasize the distinction with refugee settings.

The scenario that was brought up by others at the Internews meeting was actually one set in a refugee camp. This scenario is a classic case of individuals being highly restricted in the variety of different information sources they have access to, which makes the spread of rumors difficult to counter or dismiss.

Crowdsourcing Response

When I asked why field-based humanitarian organizations that directly work with beneficiaries in conflict zones don’t take an interest in crowdsourced information and the validation thereof, Paul responds as follows.

Yes, because they don’t have enough to do. They’d like to spend their time running around validating other people’s reports, endangering their lives and alienating the government under which they’re working.

I think Paul may be missing the point—and indeed power—of crowdsourcing. We need to start thinking less in traditional top-down centralized ways. The fact is humanitarian organizations could subscribe to specific alerts of concern to them in a specific and limited geographical area.

If they’re onsite where the action is reportedly unfolding and they don’t see any evidence of rumors being true, surely spending 15 seconds to text this info back to HQ (or to send a picture by camera phone) is not a huge burden. This doesn’t endanger their lives since they’re already there and quelling a rumor is likely to calm things down. If we use secure systems, the government wouldn’t be able to attribute the source.

The entire point behind the Swift River project is to crowdsource the filtering process, i.e., to distribute and decentralize the burden of data validation. Those organizations that happen to be there at the right time and place do the filtering; otherwise they don’t, and get on with their work. This is the whole point behind my post last year on crowdsourcing response.

Yes, We Can

Is there any evidence at all that the US Embassy’s Twitter feed had any impact at all on the course of events? I mean, I know it made a good headline in external media, but I don’t see how it’s a good example if there’s no actual evidence that it had any impact.

Yes, the rumors didn’t spread. But we’re fencing with one anecdote after the other. All I’m arguing is that two-way communication and broadcasting should be used to counter misinformation;  meaning that it is irresponsible for humanitarian organizations to revert to one-way communication mindsets and wash their hands clean of an unfolding situation without trying to use information and communication technology to do something about it.

Many still don’t understand that the power of P2P meshed communication can go both ways. Unfortunately, as soon as we see new communication technology used for ill, we often react even more negatively by pulling the plug on any communication, which is what the Kenyan government wanted to do during the election violence.

When officials requested that the CEO of Safaricom switch off the SMS network to prevent the spread of hate SMS, he chose instead to broadcast text messages calling for peace and restraint, and warning that those found to be creating hate SMS would be tracked and prosecuted (which the Kenyan Parliament subsequently did).

Again, the whole point is that new communication technologies present a real potential for countering rumors and unless we try using them to maximize positive communication we will never get sufficient evidence to determine whether using SMS and Twitter to counter rumors can work effectively.

Ushahidi Models

In terms of Ushahidi’s new deployment model being localized with the crowdsourcing limited to members of a given organization, Paul has a point when he suggests this “doesn’t sound like crowdsourcing.” Indeed, the Gaza deployment of Ushahidi is more an example of “bounded crowdsourcing” or “Al Jazeera sourcing” since the crowd is not the entire global population but strictly Al Jazeera journalists.

Perhaps crowdsourcing is not applicable within those contexts since “bounded crowdsourcing” may in effect be an oxymoron. At the same time, however, his conclusion that Ushahidi is more like classic situation reporting is not entirely accurate either.

First of all, the Ushahidi platform provides a way to map incident reports, not situation reports. In other words, Ushahidi focuses on the minimum essential indicators for reporting an event. Second, Ushahidi also focuses on the minimum essential technology to communicate and visualize those events. Third, unlike traditional approaches, the information collected is openly shared.

I’m not sure if this is an issue of language and terminology or if there is a deeper point here. In other words, are we seeing Ushahidi evolve in such a way that new iterations of the platform are becoming increasingly similar to traditional information collection systems?

I don’t think so. The Gaza platform is only one genre of local deployment. Another organization might seek to deploy a customized version of Ushahidi and not impose any restrictions on who can report. This would resemble the Kenya and DRC deployments of Ushahidi. At the moment, I don’t find this problematic because we haven’t found signs that this has led to conflict inflammation. I have given a number of reasons in this blog post why that might be.

In any case, it is still our responsibility to think through some scenarios and to start offering potential solutions. Hence the Swift River project, and hence my appreciation for Paul’s feedback on my two blog posts.

Patrick Philippe Meier

Ushahidi: From Crowdsourcing to Crowdfeeding

Humanitarian organizations at the Internews meetings today made it clear that information during crises is as important as water, food and medicine. There is now a clear consensus on this in the humanitarian community.

This is why I have strongly encouraged Ushahidi developers (as recently as this past weekend) to include a subscription feature that allows crisis-affected communities to subscribe to SMS alerts. In other words, we are not only crowdsourcing crisis information; we are also crowdfeeding it.
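
A rough illustration of the crowdfeeding idea is sketched below: residents subscribe by city and receive every new report mapped to that city. The class, the method names and the message format are hypothetical, not part of the Ushahidi codebase.

```python
from collections import defaultdict

class AlertSubscriptions:
    """Minimal crowdfeeding sketch: city-level SMS alert subscriptions."""

    def __init__(self, send_sms):
        self.subscribers = defaultdict(set)  # city -> set of phone numbers
        self.send_sms = send_sms             # injected SMS gateway function

    def subscribe(self, phone_number, city):
        self.subscribers[city.lower()].add(phone_number)

    def publish_report(self, city, summary):
        # Alerts are explicitly labelled as unverified, crowdsourced information.
        message = f"[UNVERIFIED CROWD REPORT] {city}: {summary}"
        for number in self.subscribers[city.lower()]:
            self.send_sms(number, message)

# Usage with a stand-in gateway that just prints the outgoing message:
alerts = AlertSubscriptions(send_sms=lambda number, text: print(number, text))
alerts.subscribe("+254700000001", "Eldoret")
alerts.publish_report("Eldoret", "Roadblock reported near the main market")
```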

Mentioning this during the Internews meeting set off several red flags, since crowdsourcing typically raises concerns about data validation, or the lack thereof. Participants began painting scenarios whereby militias in the DRC would submit false reports to Ushahidi in order to scare villagers (who would receive the alerts by SMS) into fleeing straight into an ambush.

Here’s why I think humanitarian organizations may in part be wrong.

First of all, militias do not need Ushahidi to scare or ambush at-risk communities. In fact, using a platform like Ushahidi would be tactically inefficient and would require more coordination on their part.

Second, local communities are rarely dependent on a single source of information. They have their own trusted social and kinship networks, which they can draw on to validate information. There are local community radio stations, some of which allow listeners to call or text in with information and questions. Ushahidi doesn’t exist in an information vacuum. We need to understand information communication as an ecosystem.

Third, Ushahidi makes it clear that the information is crowdsourced and hence not automatically validated. Beneficiaries are not dumb; they can perfectly well understand that SMS alerts are simply alerts and not confirmed reports. I must admit that the conversation that ensued at the meeting reminded me of Peter Uvin’s “Aiding Violence” in which he lays bare our “infantilizing” attitude towards “our beneficiaries.”

Fourth, many of the humanitarian organizations participating in today’s meetings work directly with beneficiaries in conflict zones. Shouldn’t they take an interest in the crowdsourced information and take advantage of being in the field to validate said information?

Fifth, all the humanitarian organizations present during today’s meetings embraced the need for two-way, community-generated information and social media. Yet these same organizations fold their arms and revert to a one-way communication mindset when the issue of crowdsourcing comes up. They forget that they too can generate information in response to rumors and thus counteract misinformation as soon as it spreads. If the US Embassy can do this in Madagascar using Twitter, why can’t humanitarian organizations do the equivalent?

Sixth, Ushahidi-Kenya and Ushahidi-DRC were the first deployments of Ushahidi. The model that Ushahidi has since adopted involves humanitarian organizations like UNICEF in Zimbabwe or Carolina for Kibera in Nairobi, and international media groups like Al-Jazeera in Gaza, using the free, open-source platform for their own projects. In other words, Ushahidi deployments are localized, and the crowdsourcing is limited to trusted members of those organizations, or journalists in the case of Al-Jazeera.

Patrick Philippe Meier

Internews, Ushahidi and Communication in Crises

I had the pleasure of participating in two Internews-sponsored meetings in New York today. Fellow participants included OCHA, Oxfam, the Red Cross, Save the Children, World Vision, the BBC World Service Trust, the Thomson Reuters Foundation, the Humanitarian Media Foundation, International Media Support and several others.


The first meeting was a three-hour brainstorming session on “Improving Humanitarian Information for Affected Communities” organized in preparation for the second meeting on “The Unmet Need for Communication in Humanitarian Response,” which was held at the UN General Assembly.


The meetings presented an ideal opportunity for participants to share information on current initiatives that focus on communications with crisis-affected populations. Ushahidi naturally came to mind, so I introduced the concept of crowdsourcing crisis information. I should have expected the immediate pushback on the issue of data validation.

Crowdsourcing and Data Validation

While I have already blogged about overcoming some of the challenges of data validation in the context of crowdsourcing here, there is clearly more to add since the demand for “fully accurate information” a.k.a. “facts and only facts” was echoed during the second meeting in the General Assembly. I’m hoping this blog post will help move the discourse beyond the black and white concepts that characterize current discussions on data accuracy.

Having worked in the field of conflict early warning and rapid response for the past seven years, I fully understand the critical importance of accurate information. Indeed, a substantial component of my consulting work on CEWARN in the Horn of Africa specifically focused on the data validation process.

To be sure, no one in the humanitarian and human rights community is asking for inaccurate information. We all subscribe to the notion of “Do No Harm.”

Does Time Matter?

What was completely missing from today’s meetings, however, was any reference to time. Nobody noted the importance of timely information during crises, which is rather ironic since both meetings focused on sudden-onset emergencies. I suspect that our demand for (and partly Western obsession with) fully accurate information has clouded some of our thinking on this issue.

This is particularly ironic given that evidence-based policy-making and data-driven analysis are still the exception rather than the rule in the humanitarian community. Field-based organizations frequently make decisions on coordination, humanitarian relief and logistics without complete and fully accurate, real-time information, especially right after a crisis strikes.

So why is this same community holding crowdsourcing to a higher standard?

Time versus Accuracy

Timely information when a crisis strikes is a critical element for many of us in the humanitarian and human rights communities. Surely then we must recognize the tradeoff between accuracy and timeliness of information. Crisis information is perishable!

The more we demand fully accurate information, the longer the data validation process typically takes and the more likely the information is to become useless. Our public health colleagues who work in emergency medicine know this only too well.

The figure below represents the perishable nature of crisis information. Data validation makes sense during time-periods A and B. Continuing to carry out data validation beyond time B may be beneficial to us, but hardly to crisis affected communities. We may very well have the luxury of time. Not so for at-risk communities.

[Figure: the relevance of crisis information declines over time; data validation pays off during time-periods A and B, but adds little for affected communities beyond B.]
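
One way to make the tradeoff explicit is to model a report’s value as decaying over time and to ask whether it will still be useful once validation is complete. The sketch below assumes an exponential decay with an arbitrary half-life and value floor; both parameters are illustrative, not empirical.

```python
import math

def information_value(hours_since_event, half_life_hours=12.0):
    """Hypothetical decay curve: a report's usefulness halves every
    half_life_hours (the half-life is an assumption, not a measured value)."""
    return math.exp(-math.log(2) * hours_since_event / half_life_hours)

def worth_validating(hours_since_event, validation_hours, value_floor=0.25):
    """Validation pays off only if the report will still be useful (above an
    arbitrary floor) by the time the validation step is finished."""
    return information_value(hours_since_event + validation_hours) >= value_floor

print(worth_validating(4, 6))   # True under these assumed parameters
print(worth_validating(4, 30))  # False: by then the report has gone stale
```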

This point often gets overlooked when anxieties around inaccurate information surface. Of course we need to ensure that the information we produce or relay is as accurate as possible. Of course we want to prevent dangerous rumors from spreading. To this end, the Thomson Reuters Foundation clearly spelled out that their new Emergency Information Service (EIS) would focus on disseminating facts and only facts. (See my previous post on EIS here).

Yes, we can focus all our efforts on disseminating facts, but are facts communicated after time-period B above really useful to crisis-affected communities? (Incidentally, since EIS will be based on verifiable facts, their approach may well be likened to Wikipedia’s rules for corrective editing. In any event, I wonder how EIS might define the term “fact”).

Why Ushahidi?

Ushahidi was created within days of the Kenyan elections in 2007 because both the government and the national media were seriously under-reporting widespread human rights violations. I was in Nairobi visiting my parents at the time, and it was frustrating to see the majority of international and national NGOs on the ground suffering from “data hugging disorder,” i.e., they had no interest whatsoever in sharing information with each other, or with the public for that matter.

This left the Ushahidi team with few options, which is why they decided to develop a transparent platform that would allow Kenyans to report directly, thereby circumventing the government, media and NGOs, who were working against transparency.

Note that the Ushahidi team is made up entirely of tech experts. Here’s a question: why didn’t the human rights or humanitarian community set up a platform like Ushahidi? Why were a few tech-savvy Kenyans without a humanitarian background able to set up and deploy the platform within a week, and not the humanitarian community? Where were we? Shouldn’t we be the ones pushing for better information collection and sharing?

In a recent study for the Harvard Humanitarian Initiative (HHI), I mapped and time-stamped reports on the post-election violence reported by the mainstream media, citizen journalists and Ushahidi. I then created a Google Earth layer of this data and animated the reports over time and space. I recommend reading the conclusions.

Accuracy is a Luxury

Having worked in humanitarian settings, we all know that accuracy is more often a luxury than a reality, particularly right after a crisis strikes. Accuracy is not black and white, yes or no. Rather, we need to start thinking in terms of likelihood, i.e., how likely is this piece of information to be accurate? All of us already do this every day, albeit subjectively. Why not think of ways to complement or triangulate our personal subjectivities to determine the accuracy of information?

At CEWARN, we included a “Source of Information” field for each incident report. A field reporter could select one of three choices: (1) direct observation; (2) media; or (3) rumor. This gave us a three-point weighted scale that could be used in subsequent analysis.
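
A minimal sketch of how such a weighted scale might feed into analysis is shown below; the three source categories come from the CEWARN form described above, but the numeric weights and the aggregation are my own illustrative assumptions.

```python
# Hypothetical weights for the "Source of Information" categories.
SOURCE_WEIGHTS = {
    "direct observation": 3,
    "media": 2,
    "rumor": 1,
}

def weighted_incident_count(reports):
    """Aggregate incident reports, counting better-sourced reports more heavily."""
    return sum(SOURCE_WEIGHTS.get(report["source"], 0) for report in reports)

reports = [
    {"incident": "cattle raid", "source": "direct observation"},
    {"incident": "cattle raid", "source": "rumor"},
    {"incident": "roadblock", "source": "media"},
]
print(weighted_incident_count(reports))  # 6
```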

At Ushahidi, we are working on Swift River, a platform that applies human crowdsourcing and machine analysis (natural language parsing) to filter crisis information produced in real time, i.e., during time-periods A and B above. Colleagues at WikiMapAid are developing similar solutions for data on disease outbreaks. See my recent post on WikiMapAid and data validation here.
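
To give a flavor of the general approach, and emphatically not the actual Swift River implementation, here is a toy filter in which keyword matching stands in for natural language parsing and corroboration by independent sources stands in for the crowd; the keyword list, weights and threshold are all made up.

```python
from collections import Counter

CRISIS_KEYWORDS = {"fire", "shooting", "flood", "riot", "looting"}  # illustrative only

def triage_score(report_text, corroborating_sources):
    """Toy score combining keyword hits with the number of independent sources."""
    words = Counter(report_text.lower().split())
    keyword_hits = sum(words[keyword] for keyword in CRISIS_KEYWORDS)
    return keyword_hits + 2 * len(corroborating_sources)

def filter_reports(reports, threshold=3):
    """Surface only reports whose combined score clears an (arbitrary) threshold."""
    return [r for r in reports if triage_score(r["text"], r["sources"]) >= threshold]

reports = [
    {"text": "Looting and fire near the market", "sources": {"sms", "twitter"}},
    {"text": "Quiet afternoon downtown", "sources": set()},
]
print(filter_reports(reports))  # only the corroborated crisis report survives
```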

Conclusion

In sum, there are various ways to rate the likelihood that a reported event is true. But again, we are not looking to develop a platform that ensures 100% reliability. If full accuracy were the gold standard of humanitarian response (or military action for that matter), the entire enterprise would come to a grinding halt. The intelligence community has also recognized this, as I have blogged about here.

The purpose of today’s meetings was for us to think more concretely about communication in crises from the perspective of at-risk communities. Yet, as soon as I mentioned crowdsourcing, the discussion turned to our own demand for fully accurate information, with no concern raised about the importance of timely information for crisis-affected communities.

Ironic, isn’t it?

Patrick Philippe Meier

HURIDOCS09: From Wikipedia to Ushahidi

The Panel

I just participated in a panel on “Communicating Human Rights Information Through Technology” at the HURIDOCS conference in Geneva and presented Ushahidi as an alternative model. My fellow panelists included Florence Devouard, Chair of the Wikimedia Foundation, Sam Gregory from Witness.org, Lars Bromley from AAAS and Dan Brickley, a researcher, advocate and developer of Semantic Web technologies.


Out of the hundred or so participants in the plenary, only a handful (five or so) had heard of the Kenyan initiative. So this was a great opportunity to share the Ushahidi story with a diverse coalition of committed human rights workers. There were at least 40 countries or territories represented, ranging from Armenia and Ecuador to Palestine and Zimbabwe.

Since I’ve blogged about Ushahidi extensively already, I will only add a few observations here (see Slideshare for the slides). My presentation followed Florence’s talk on the latest developments at Wikipedia and I really hope to get more of her thoughts on applying lessons learned to the Ushahidi project. Both projects entail crowdsourcing and data validation processes.

Crowdsourcing

“Nobody Knows Everything, but Everyone Knows Something.” I borrowed this line from Florence’s talk to explain the rationale behind Ushahidi. Applied to human rights reporting, “nobody knows about every human rights violation taking place, but everyone may know of some incidents.” The latter is the local knowledge that Ushahidi seeks to render more visible by taking a crowdsourcing approach.

Recognizing the powerful convergence of communication technologies and information ecosystems is key to Ushahidi’s platform. Various deployments of Ushahidi have allowed individuals to report human rights violations online, by SMS and/or via Twitter. Unlike the majority of human rights monitoring platforms, Ushahidi seeks to “close the feedback loop” by allowing individuals to subscribe to alerts in their cities. As we know only too well, monitoring human rights violations is not equivalent to preventing them.

Validation

Given the importance of data validation vis-a-vis human rights reporting, I outlined Ushahidi’s approach and introduced the Swift River initiative which uses crowdsourcing to filter crisis information reported via Twitter, Ushahidi, Flickr, YouTube, local mobile and web social networks. When Ushahidi published their first blog post on Swift River, I commented that Wikipedia was most likely the best at crowdsourcing the filter.

This explains why I’m eager to learn more from Florence regarding her experience with Wikipedia. She mentioned that one new way they track online vandalism of Wikipedia entries is by detecting “sudden changes” in the flow of edits by anonymous users. Edits of this nature must be validated by a third party before being officially published—a new rule being considered by Wikipedia.
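
As a rough illustration of the “sudden change” heuristic, and not Wikipedia’s actual vandalism-detection machinery, the sketch below flags an hour in which anonymous edits jump well above an article’s recent baseline.

```python
from statistics import mean, pstdev

def is_edit_spike(prior_hourly_anon_edits, current_count, z_threshold=3.0):
    """Flag the current hour if anonymous edits far exceed the recent baseline."""
    baseline = mean(prior_hourly_anon_edits)
    spread = pstdev(prior_hourly_anon_edits) or 1.0  # avoid division by zero
    return (current_count - baseline) / spread > z_threshold

# A normally quiet article suddenly receives a burst of anonymous edits:
print(is_edit_spike([1, 0, 2, 1, 0, 1], 14))  # True
print(is_edit_spike([1, 0, 2, 1, 0, 1], 2))   # False
```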

One other point worth noting, and which I’ve blogged about before, is that Wikipedia continues to be used for real-time reporting of unfolding crises. We saw this during the London bombings back in 2005 and more recently with the Mumbai attacks. The pages were being edited at least a hundred times a day and as far as I know were as accurate as mainstream media reports and more up-to-date.

The point is, if Wikipedia can serve as a platform for accurate, real-time reporting of political crises, then so can Ushahidi. The challenge is to get enough contributors to Ushahidi to constitute “the crowd” and sufficient alerts to constitute a river. The power here is in the numbers. Perhaps in time the Ushahidi platform may become more like a public sphere where different perspectives on alerts can be exchanged. In other words, we may see a shift away from data “deconfliction,” which is reductionist.

The Q&A

The Questions and Answers session was productive and lively. Concerns about data validation and the security of those reporting in repressive environments were raised. The point to keep in mind is that Ushahidi does not exist in a vacuum, which is why I showed HHI’s Google Earth Layer of Kenya’s post-election violence. To be sure, Ushahidi does not replace but rather complements traditional sources of reporting like the national media or alternative sources like citizen journalism. Think of a collage as opposed to a painting.

Human rights incidents mapped on the Ushahidi platform may not be fully validated, but the purpose of Ushahidi is not to provide information on human rights violations that meet ICC standards. The point is to document instances of violations so they (1) can be investigated by interested parties, and (2) serve as potential early warnings for communities caught in conflict. In terms of the security of those engaged in reporting alerts using the Ushahidi platform, the team is adding a feature that allows users to report anonymously.

As expected, there were also concerns about “bad guys” gaming the Ushahidi platform. This is a tricky point to respond to because (1) to the best of my knowledge this hasn’t happened; (2) I’m not sure what the “bad guys” would stand to gain tactically and strategically; (3) Ushahidi has a fraction of the audience—and hence political influence—that television and radio stations have; (4) I doubt “bad guys” are immune to the digital “fog of war“; (5)  the point of Swift River is to make gaming difficult by filtering it out.

In any event, it would behoove Ushahidi to consider potential scenarios in which the platform could be used to promote disinformation and violence. At this point, however, I’m really not convinced that “bad guys” will see the Ushahidi platform as a useful tool to further their own ends.

Patrick Philippe Meier

Crowdsourcing Honesty?

I set an all-time personal record this past week: my MacBook was dormant for five consecutive days. I dedicate this triumph to the delightful friends with whom I spent New Year’s. Indeed, I had the pleasure of celebrating with friends from Digital Democracy, The Fletcher School and The Global Justice Center on a Caribbean island for some much needed time off.


We all brought some good reading along and I was finally able to enjoy a number of books on my list. One of these, Dan Ariely’s “Predictably Irrational,” was recommended to me by Erik Hersman, and I’m really glad it was. MIT Professor Ariely specializes in behavioral economics. His book gently discredits mainstream economics: far from being rational agents, we are remarkably irrational in our decision-making, and predictably so.

Ariely draws on a number of social experiments to explicate his thesis.

For social scientists, experiments are like microscopes or strobe lights. They help us slow human behavior to a frame-by-frame narration of events, isolate individual forces, and examine those forces carefully and in more detail. They let us test directly and unambiguously what makes us tick.

In a series of fascinating experiments, Ariely seeks to understand what factors influence our decisions to be honest, especially when we can get away with dishonesty. In one experiment, participants complete a very simple math exercise. When done, the first set of participants (the control group) are asked to hand in their answers for independent grading, while the second set are given the answer key and asked to report their own scores. At no point do the latter hand in their answers; hence the temptation to cheat.

Before the math exercise, some students in this experiment are asked to list the names of 10 books they read in high school, while others are asked to write down as many of the Ten Commandments as they can recall. Ariely wanted to know whether this would have any effect on the honesty of the participants reporting their own scores. The statistically significant results surprised even him: “The students who had been asked to recall the Ten Commandments had not cheated at all.”

In fact, they averaged the same score as the control group that could not cheat. In contrast, participants who were asked to list their 10 high school books and then self-report their scores did cheat: they claimed scores 33% higher than the control group’s.

What especially impressed me about the experiment […] was that the students who could remember only one or two commandments were as affected by them as the students who remembered nearly all ten. This indicated that it was not the Commandments themselves that encouraged honesty, but the mere contemplation of a moral benchmark of some kind.

Ariely carried out a follow-up experiment in which he asked some of his MIT students to sign an honor code instead of listing the Commandments. The results were identical. What’s more, “the effect of signing a statement about an honor code is particularly amazing when we take into account that MIT doesn’t even have an honor code.”

In short, we are far more likely to be honest when reminded of morality, especially when temptation strikes. Ariely thus concludes that the act of taking an oath can make all the difference.

I’m intrigued by this finding and its potential application to crowdsourcing crisis information, e.g., Ushahidi‘s work in the DRC. Could some version of an honor code be introduced in the self-reporting process? Could the Ushahidi team create a control group to determine the impact on data quality? Even if impact were difficult to establish, would introducing an honor code still make sense given Ariely’s findings on basic behavioral psychology?
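
One hypothetical way to test this would be a simple A/B design: show a short honor-code pledge to a random subset of reporters and compare the share of reports later flagged as inaccurate across the two groups. The sketch below assumes a made-up report schema with "arm" and "flagged" fields.

```python
import random

def assign_arm(reporter_id, honor_code_fraction=0.5):
    """Stable random assignment: some reporters see an honor-code pledge
    before submitting; the rest form the control group. reporter_id should
    be an int or a string so the assignment is reproducible."""
    rng = random.Random(reporter_id)
    return "honor_code" if rng.random() < honor_code_fraction else "control"

def flagged_rate_by_arm(reports):
    """Share of reports later flagged as inaccurate, per experimental arm."""
    rates = {}
    for arm in ("honor_code", "control"):
        arm_reports = [r for r in reports if r["arm"] == arm]
        flagged = sum(1 for r in arm_reports if r["flagged"])
        rates[arm] = flagged / len(arm_reports) if arm_reports else None
    return rates
```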

Patrick Philippe Meier