I coined the term “bounded crowdsourcing” a couple of years back to distinguish the approach from other methodologies for information collection. As tends to happen, some Muggles (in the humanitarian community) ridiculed the term. They freaked out about the semantics instead of trying to understand the underlying concept. It’s not their fault, though; they’ve never been to Hogwarts and have never taken Crowdsourcery 101 (joke!).
Open crowdsourcing, or “unbounded crowdsourcing,” refers to the collection of information with no intentional constraints: anyone who hears about an effort to crowdsource information can participate. This definition is in line with the original description put forward by Jeff Howe: outsourcing a task to a generally large group of people in the form of an open call.
In contrast, the point of “bounded crowdsourcing” is to start with a small number of trusted individuals and to have each of them invite, say, 3 additional individuals to join the project: individuals whom they fully trust and can vouch for. After joining and working on the project, these individuals in turn invite 3 additional people they fully trust, and so on, at an exponential rate if desired. Just as crowdsourcing is nothing new in the field of statistics, neither is “bounded crowdsourcing”; its analog is snowball sampling.
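The invitation dynamic described above can be sketched in a few lines of code. This is a minimal illustration, not anything from the original post: the seed size, the invites-per-person figure, and the number of rounds are all hypothetical numbers chosen to show how quickly an invite-only “boundary” can grow.

```python
def bounded_crowd_size(seed: int, invites_per_person: int, rounds: int) -> int:
    """Total participants after a given number of invitation rounds.

    Starts with a trusted seed group; in each round, every participant
    who joined in the previous round invites a fixed number of contacts
    they can personally vouch for.
    """
    total = seed
    newcomers = seed
    for _ in range(rounds):
        newcomers *= invites_per_person  # each newcomer invites this many
        total += newcomers
    return total

# Hypothetical example: 5 trusted seeds, each inviting 3 people, 4 rounds:
# 5 + 15 + 45 + 135 + 405 = 605 participants.
print(bounded_crowd_size(seed=5, invites_per_person=3, rounds=4))
```

Even with these modest assumptions, the group reaches several hundred participants within a few rounds, which is the exponential growth the post refers to.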
In snowball sampling, a number of individuals who meet certain criteria are identified and, unlike in purposive sampling, are asked to recommend others who meet the same criteria, thus expanding the network of participants. Although these “bounded” methods are unlikely to produce representative samples, they are more likely to produce trustworthy information. In addition, there are times when such a method may be the best, or indeed the only, one available. Incidentally, a recent study that analyzed various field research methodologies for conflict environments concluded that snowball sampling was the most effective method (Cohen and Arieli 2011).
I introduced the concept of bounded crowdsourcing to the field of crisis mapping in response to concerns over the reliability of crowdsourced information. One excellent real-world case study of bounded crowdsourcing for crisis response is this remarkable example from Kyrgyzstan. The “boundary” in bounded crowdsourcing is dynamic and can grow exponentially very quickly. Participants may not all know each other (just as in open crowdsourcing), so in some ways they become a crowd, but one bounded by an invite-only criterion.
I have since recommended this approach to several groups using the Ushahidi platform, like the #OWS movement. The statistical method known as snowball sampling is decades old, so I’m not introducing a new technique, simply applying a conventional approach from statistics to the field of crisis mapping and calling it “bounded” to distinguish the methodology from regular crowdsourcing efforts. What is different and exciting about combining snowball sampling with crowdsourcing is that a far larger group can be sampled, far more quickly and more cost-effectively, given today’s real-time, free social networking platforms.
You are correct that snowball sampling is not a new idea in the field of social statistics. Unfortunately, your comments fail to acknowledge the central limitation of snowball sampling (and crowdsourcing or “bounded crowdsourcing”): bias. Snowball sampling methods are not characterized by an underlying probability model; the probability of selection of a given individual is unknown. Further, social networks, by their very nature, suffer from homophily bias (i.e., the bias that results from people tending to recruit into the sample other people who are very similar to themselves). There is a voluminous literature on the considerable inferential/generalizability limitations of snowball sampling (e.g., Erickson, 1979; van Meter, 1990). This literature notes that snowball samples are cost-effective for collecting data on hidden populations, but the data collected suffer from systematic biases (in that they are rarely representative of the hidden population the researcher is interested in). So snowball sampling does lead to more data in a more efficient manner, but unfortunately those data are still a convenience sample, and probability theory cannot easily be employed to make inferences about the total size and nature of the hidden population. Thus, your analogy of “bounded crowdsourcing” does not adequately respond to concerns over the reliability of crowdsourced information. Instead, your analogy just reinforces that crowdsourcing (like snowball sampling) helps us collect information about hidden populations faster, but just what that information is representative of remains unclear. The ability of sampling methods like snowball sampling and “bounded crowdsourcing” to generate large samples quickly does not necessarily mean that such methods generate representative samples.
Thanks for your comments, Rajani.
It is perfectly obvious that snowball sampling is non-representative and non-random, i.e., purely convenience sampling. That is not the argument being made here, at all. I’m not arguing that non-random sampling can lead to representative sampling! And besides, we all know full well that random sampling is also very much subject to bias. It is precisely the homophily bias you note that makes the approach I’m describing valuable, i.e., the bias that people will recruit those they trust. Honestly, re-reading your comments, I think you’ve completely misunderstood my blog post. Perhaps (or not) the following prequel may help:
You argued in your initial posting in this thread that bounded crowdsourcing and snowball sampling are exciting because they lead to more data in a relatively short period of time. Throughout your writings, you imply that large amounts of data in a small amount of time provide a reliable empirical basis for action. This argument is misguided.
For such sampling methods to be a reliable empirical basis for action, they need to yield valid representations of the affected population or phenomenon of interest. Yet, you readily admit that these methods are hampered by systematic biases (in your recent reply to my earlier comment).
That leaves your main argument, then, to be that “more data, collected cheaply and quickly” will result in insightful humanitarian action. This, unfortunately, ignores the real possibility that crowdsourcing (or bounded crowdsourcing) may generate large amounts of information about a small, visible, and well-connected population (e.g., the subpopulation with cell phones who happen to be within range of undamaged cell phone towers). Such populations are unlikely to be the most vulnerable populations that humanitarian agencies worry most about, nor are they necessarily representative of the entire affected population. While hand-held technologies and improved telecommunication networks are indeed powerful and exciting, they do not automatically provide reliable information. The challenge for the crisis mapping community is to understand what crowdsourced data are representative of. Solving that puzzle will certainly help to ensure responsible resource allocation and interventions during humanitarian crises. Yet, sadly, this remains an area where the crisis mapping community has been unable to match its excitement with verifiable evidence.
Again, you are misconstruing my words. Why?
“Throughout your writings, you imply that large amounts of data in a small amount of time provide a reliable empirical basis for action.”
I have never written this, and if you continue choosing to misrepresent my writings then I see no point in continuing this conversation. I argue that more data is better than less data, because more data raises the possibility of doing more triangulation. Surely you know this if you have actually read my writings. I have never argued that crowdsourced data collected via snowball sampling should be the only source of information on which one bases a decision. That, at the end of the day, is up to the decision makers themselves. The majority of health data is based on convenience sampling, and yet plenty of decisions are made on this basis. The same is true of emergency 911 calls.
“The challenge for the crisis mapping community is to understand what crowd-sourced data is representative of.”
Contradiction: crowdsourced data is not randomly sampled and thus not representative. You also imply that crisis mapping is all about crowdsourcing. Again, wrong.
Data does not have to be representative to be actionable and valuable. Humanitarian decisions, and environmental and public health decisions for that matter, are rarely based on representative data.
You are forcing the argument towards one of representativeness, which was never the focus of my blog post. Rather, the focus was the idea of trying to improve the reliability of information collection over what is possible via open crowdsourcing. You are derailing the conversation to serve your own purpose.
“That leaves your main argument then to be ‘more data, collected cheaply and quickly’ will result in insightful humanitarian action.”
Alright, this additional false representation is exactly where I stop arguing with you.