Derrick Harris puts it best:
“It might be provocative to call into question one of the hottest tech movements in generations, but it’s not really fair. That’s because how companies and people benefit from Big Data, Data Science or whatever else they choose to call the movement toward a data-centric world is directly related to what they expect going in. Arguing that big data isn’t all it’s cracked up to be is a strawman, pure and simple—because no one should think it’s magic to begin with.”
So here is a list of misplaced assumptions about the relevance of Big Data for disaster response and emergency management:
• “Big Data will improve decision-making for disaster response”
This recent groundbreaking study by the UN confirms that many decisions made by humanitarian professionals during disasters are not based on any kind of empirical data—regardless of how large or small a dataset may be and even when the data is fully trustworthy. In fact, humanitarians often use anecdotal information or mainstream news to inform their decision-making. So no, Big Data will not magically fix these decision-making deficiencies in humanitarian organizations, all of which pre-date the era of Big (Crisis) Data.
• “Big Data suffers from extreme sample bias.”
This is often true of any dataset collected using non-random sampling methods. The statement also seems to suggest that representative sampling methods can actually be carried out just as easily, quickly and cheaply. This is very rarely the case, hence the use of non-random sampling. In other words, sample bias is not some strange disease that only affects Big Data or social media. And even though Big Data is biased and not necessarily objective, Big Data such as social media still offers “new, large, and arguably unfiltered insights into attitudes and behaviors that were previously difficult to track in the wild.”
Statistical correlations in Big Data do not imply causation; they simply suggest that there may be something worth exploring further. Moreover, collecting data via non-random, non-representative sampling does not invalidate or devalue that data. Much of the data used for medical research, digital disease detection and police work is the product of convenience sampling. Should researchers and police dismiss or ignore the resulting data because it is not representative? Of course not.
While the 911 system was set up in 1968, the service and number were not widely known until the 1970s, and some municipalities did not have the crowdsourcing service until the 1980s. So it was hardly a representative way to collect emergency calls. Does this mean that the millions of 911 calls made before the more widespread adoption of the service in the 1990s were all invalid or useless? Of course not, despite the tens of millions of false 911 calls and hoaxes that are made every year. Point is, there has never been a moment in history in which everyone has had access to the same communication technology at the same time. This is unlikely to change for a while even though mobile phones are by far the most rapidly distributed and widespread communication technology in the history of our species.
There were over 20 million tweets posted during Hurricane Sandy last year. While “only” 16% of Americans are on Twitter and while this demographic is younger, more urban and affluent than the norm, as Kate Crawford rightly notes, this does not render the informative and actionable tweets shared during the hurricane useless to emergency managers. After Typhoon Pablo devastated the Philippines last year, the UN used images and videos shared on social media as a preliminary way to assess the disaster damage. According to one Senior UN Official I recently spoke with, their relief efforts would have overlooked certain disaster-affected areas had it not been for this map.
Was the data representative? No. Were the underlying images and videos objective? No, they captured the perspective of those taking the pictures. Note that “only” 3% of the world’s population are active Twitter users and fewer still post images and videos online. But the damage captured by this data was not virtual; it was real damage. And it only takes one person to take a picture of a washed-out bridge to reveal the infrastructure damage caused by a typhoon, even if all other onlookers have never heard of social media. Moreover, this recent statistical study reveals that tweets are evenly geographically distributed according to the availability of electricity. This is striking given that Twitter has only been around for 7 years compared to the light bulb, which was invented 134 years ago.
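To put a rough number on the washed-out bridge point, here is a back-of-the-envelope sketch in Python. It is an illustration under loudly stated assumptions, not a model of real behavior: the function name `prob_at_least_one_report` is hypothetical, the 3% share rate simply borrows the Twitter figure quoted above as a stand-in, and witnesses are assumed to post independently of one another, which will rarely hold exactly in a real disaster.

```python
def prob_at_least_one_report(witnesses: int, share_rate: float) -> float:
    """Chance that at least one witness posts about the damage.

    Assumes (for illustration only) that each witness posts independently
    with the same probability `share_rate`.
    """
    if witnesses < 0:
        raise ValueError("witnesses must be non-negative")
    if not 0.0 <= share_rate <= 1.0:
        raise ValueError("share_rate must be between 0 and 1")
    # P(at least one post) = 1 - P(nobody posts)
    return 1.0 - (1.0 - share_rate) ** witnesses


if __name__ == "__main__":
    # Hypothetical crowd sizes; 0.03 is an assumed per-person share rate.
    for n in (1, 10, 100, 500):
        print(n, round(prob_at_least_one_report(n, 0.03), 3))
```

Under these assumptions, a small share rate still makes at least one report very likely once enough people witness the same damage. This says nothing about whether the resulting data is representative; it only illustrates why a non-representative source can still surface real events.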
• “Big Data enthusiasts suggest doing away with traditional sources of information for disaster response.”
I have yet to meet anyone who earnestly believes this. As Derrick writes, “social media shouldn’t usurp traditional customer service or market research data that’s still useful, nor should the Centers for Disease Control start relying on Google Flu Trends at the expense of traditional flu-tracking methodologies. Web and social data are just one more source of data to factor into decisions, albeit a potentially voluminous and high-velocity one.” In other words, the situation is not either/or but rather both/and. Big (Crisis) Data from social media can complement rather than replace traditional information sources and methods.
• “Big Data will make us forget the human faces behind the data.”
Big (Crisis) Data typically refers to user-generated content shared on social media such as Twitter, Instagram and YouTube. Anyone who follows social media during a disaster would be hard-pressed to forget where this data is coming from, in my opinion. Social media, after all, is social and increasingly visually social, as witnessed by the tremendous popularity of Instagram and YouTube during disasters. These help us capture, connect and feel real emotions.