The use of crowdsourcing may be relatively new to the technology, business and humanitarian sectors but when it comes to statistics, crowdsourcing is a well known and established sampling method. Crowdsourcing is just non-probability sampling. The crowdsourcing of crisis information is simply an application of non-probability sampling.
Lets first review probability sampling in which every unit in the population being sampled has a known probability (greater than zero) of being selected. This approach makes it possible to “produce unbiased estimates of population totals, by weighting sampled units according to their probability selection.”
Non-probability sampling, on the other hand, describes an approach in which some units of the population have no chance of being selected or where the probability of selection cannot be accurately determined. An example is convenience sampling. The main drawback of non-probability sampling techniques is that “information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population.”
There are several advantages, however. First, non-probability sampling is a quick way to collect way to collect and analyze data in range of settings with diverse populations. The approach is also a “cost-efficient means of greatly increasing the sample, thus enabling more frequent measurement.” In some cases, the non-probability sampling may actually be the only approach available—a common constrain in a lot of research, including many medical studies, not to mention Ushahidi Haiti. The method is also used in exploratory research, e.g., for hypothesis generation, especially when attempting to determine whether a problem exists or not.
The point is that non-probability sampling can save lives, many lives. Much of the data used for medical research is the product of convenience sampling. When you see your doctor, or you’re hospitalized, that is not a representative sample. Should the medical field throw away all this data based on the fact that it constitutes non-probability sampling. Of course not, that would be ludicrous.
The notion of bounded crowdsourcing, which I blogged about here, is also a known sampling technique called purposive sampling. This approach involves targeting experts or key informants. Snowball sampling is another type of non-probability sampling, which may also be applied to crowdsource of crisis information.
In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others who they may know who also meet the criteria. Although this method would hardly lead to representative samples, there are times when it may be the best method available. Snowball sampling is especially useful when you are trying to reach populations that are inaccessible or hard to find.
A project like Mission 4636 and Ushahidi-Haiti could take advantage of this approach by using two-way SMS communication to ask respondents to spread the word. Individuals who sent in text messages about persons trapped under the rubble could (later) be sent an SMS asking them to share the 4636 short code with people who may know of other trapped individuals. When the humanitarian response began to scale during the search and rescue operations, purposive sampling using UN personnel could also have been implemented.
In contrast to non-probability sampling techniques, probability sampling often requires considerable time and extensive resources. Furthermore, non-response effects can easily turn any probability design into non-probability sampling if the “characteristics of non-response are not well understood” since these modify each unit’s probability of being sampled.
This is not to suggest that one approach is better than the other since this depends entirely on the context and research question.