Derrick Harris puts it best:
“It might be provocative to call into question one of the hottest tech movements in generations, but it’s not really fair. That’s because how companies and people benefit from Big Data, Data Science or whatever else they choose to call the movement toward a data-centric world is directly related to what they expect going in. Arguing that big data isn’t all it’s cracked up to be is a strawman, pure and simple—because no one should think it’s magic to begin with.”
So here is a list of misplaced assumptions about the relevance of Big Data for disaster response and emergency management:
• “Big Data will improve decision-making for disaster response”
This recent groundbreaking study by the UN confirms that many decisions made by humanitarian professionals during disasters are not based on any kind of empirical data—regardless of how large or small a dataset may be and even when the data is fully trustworthy. In fact, humanitarians often use anecdotal information or mainstream news to inform their decision-making. So no, Big Data will not magically fix these decision-making deficiencies in humanitarian organizations, all of which pre-date the era of Big (Crisis) Data.
• “Big Data suffers from extreme sample bias.”
This is often true of any dataset collected using non-random sampling methods. The statement also seems to suggest that representative sampling can be carried out just as easily, quickly and cheaply. That is very rarely the case, hence the use of non-random sampling. In other words, sample bias is not some strange disease that only affects Big Data or social media. And even though Big Data is biased and not necessarily objective, Big Data such as social media still offers “new, large, and arguably unfiltered insights into attitudes and behaviors that were previously difficult to track in the wild.”
Statistical correlations in Big Data do not imply causation; they simply suggest that there may be something worth exploring further. Moreover, data that is collected via non-random, non-representative sampling does not invalidate or devalue the data collected. Much of the data used for medical research, digital disease detection and police work is the product of convenience sampling. Should they dismiss or ignore the resulting data because it is not representative? Of course not.
While the 911 system was set up in 1968, the service and number were not widely known until the 1970s, and some municipalities did not have the crowdsourcing service until the 1980s. So it was hardly a representative way to collect emergency calls. Does this mean that the millions of 911 calls made before the more widespread adoption of the service in the 1990s were all invalid or useless? Of course not, despite the tens of millions of false 911 calls and hoaxes that are made every year. The point is that there has never been a moment in history in which everyone has had access to the same communication technology at the same time. This is unlikely to change for a while, even though mobile phones are by far the most rapidly distributed and widespread communication technology in the history of our species.
There were over 20 million tweets posted during Hurricane Sandy last year. While “only” 16% of Americans are on Twitter, and while this demographic is younger, more urban and more affluent than the norm, as Kate Crawford rightly notes, this does not render the informative and actionable tweets shared during the hurricane useless to emergency managers. After Typhoon Pablo devastated the Philippines last year, the UN used images and videos shared on social media as a preliminary way to assess the disaster damage. According to one senior UN official I recently spoke with, their relief efforts would have overlooked certain disaster-affected areas had it not been for the resulting damage map.
Was the data representative? No. Were the underlying images and videos objective? No, they captured the perspective of those taking the pictures. Note that “only” 3% of the world’s population are active Twitter users, and fewer still post images and videos online. But the damage captured by this data was not virtual; it was real damage. And it only takes one person to take a picture of a washed-out bridge to reveal the infrastructure damage caused by a typhoon, even if all the other onlookers have never heard of social media. Moreover, this recent statistical study reveals that tweets are geographically distributed in line with the availability of electricity. This is striking given that Twitter has only been around for 7 years, compared to the light bulb, which was invented 134 years ago.
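The “it only takes one person” point lends itself to a quick back-of-the-envelope simulation. This is a toy sketch, not an analysis from the post: the function name is mine, and the 3% posting rate simply echoes the active-Twitter-user figure cited above.

```python
import random

def detection_probability(witnesses, posting_rate, trials=10000, seed=42):
    """Estimate the chance that at least one onlooker posts evidence of
    real damage, even when only a small, non-random fraction of the
    population uses social media at all."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The event is "detected" if any single witness posts about it.
        if any(rng.random() < posting_rate for _ in range(witnesses)):
            hits += 1
    return hits / trials

# With 50 onlookers at a washed-out bridge and only a 3% posting rate,
# the analytic probability of at least one report is
# 1 - (1 - 0.03)**50, or roughly 0.78 -- a heavily biased sample can
# still surface a real, physical signal.
```

The point of the sketch is that representativeness and detection are different questions: a sample can be wildly unrepresentative of the population while still being very likely to capture the existence of an event.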
• “Big Data enthusiasts suggest doing away with traditional sources of information for disaster response.”
I have yet to meet anyone who earnestly believes this. As Derrick writes, “social media shouldn’t usurp traditional customer service or market research data that’s still useful, nor should the Centers for Disease Control start relying on Google Flu Trends at the expense of traditional flu-tracking methodologies. Web and social data are just one more source of data to factor into decisions, albeit a potentially voluminous and high-velocity one.” In other words, the situation is not either/or, but rather a both/and. Big (Crisis) Data from social media can complement rather than replace traditional information sources and methods.
• “Big Data will make us forget the human faces behind the data.”
Big (Crisis) Data typically refers to user-generated content shared on social media, such as Twitter, Instagram, YouTube, etc. Anyone who follows social media during a disaster would be hard-pressed to forget where this data is coming from, in my opinion. Social media, after all, is social, and increasingly visually social, as witnessed by the tremendous popularity of Instagram and YouTube during disasters. These platforms help us capture, connect and feel real emotions.
WONDERFUL commentary Patrick!! Thank you! One of the key issues with “big data” is that the assumptions are grounded in the idea that “data” is “objective”—which is what statistical analyses require, as they want to show patterns and trends that are “universal”…which to a sociologist such as myself is a fundamentally erroneous assumption when analyzing social reality and people. This idea of “truth” (and the epistemological assumptions it is informed by) grew out of 18th century debates on what is “science” and the need for objective “truth”.
In the study of the history of science and the “scientific method” we now see this as a form of “ideology” and that the “objective” aspects of science are culturally and historically defined.
What this means today is that “big data” is simply that—“big” as in the ability to gather the “stuff” (data/indicators) of human lives—and that “stuff”—tweets, emails, blogs, etc.—should be looked at and analyzed simply for what it is: cultural artifacts that have been created in a specific place at a specific time. That is what makes them interesting—as specific (NOT universal or “objective”) forms of cultural communication.
If we see this kind of “data” as SUBJECTIVE, it can then be made meaningful from a completely different theoretical and methodological paradigm…one used by a “qualitative” or subjective model of “truth” and data. In short, analyzing the vast amounts of data that are now available from a “quantitative” perspective actually loses the depth and power and revelatory characteristics that make this amount of “data” so exciting.
Clearly it is time to change the dominant analytic lens—it is time to change the assumptions that have dominated social analyses for the past 20 years and once again introduce analytic methods that allow us to see more deeply what can be revealed in the new masses of “big data” that are now available…
Thank you for bringing the conversation to the big table…
Wow, your comment is one of the best I’ve received on iRevolution in a long time, many thanks for taking the time to read and to share your very insightful thoughts, Jerri! I’ll tweet your comment.
Thank YOU Patrick…..hmmm, what about a workshop to dig into this? Might be time for a public forum to explore the analytic—and evidential—implications….
• “Big Data suffers from extreme sample bias.”
This is an imprecise statement. “Big Data” has many applications for crisis response and analysis, and I don’t think the sample bias criticism applies to every case; however, it is a valid criticism in some cases. This post is correct that non-random samples do not devalue the data collected; however, convenience sampling may compromise subsequent analysis of, and inferences drawn from, that data.
For instance, “Big Data” algorithms can reveal trends in consumer behavior (e.g., at Walmart) when populations are threatened by natural disaster. In this case, the store has an inventory of all items stocked and a clear record of every transaction, so data analysts have access to the entire population of sales events, not just a sample. Most of the popular conflict-event data sets I am aware of consist of a very small sample of a MUCH larger population of actual conflict events, where the spatial and temporal location of events is often quite uncertain (for Walmart data there is no uncertainty about where, when, or how much of an item was sold).
Therefore, I would caution against applying the same type of (marketing?) algorithm for analysis or forecasting of political or ethnic violence across space and/or time. The data sets are indeed quite valuable; I just think that analysts need to be careful about inferences based on non-probability samples of this type.
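To make that caution concrete, here is a toy sketch with entirely made-up numbers (not drawn from any real conflict data set): two regions with identical underlying event counts can look very different in reported data when reporting coverage is uneven.

```python
def observed_counts(true_counts, coverage):
    """Expected reported events per region, given each region's
    reporting coverage. A convenience sample (only the events that
    happen to get reported) can manufacture an apparent disparity
    between regions that are in fact identical."""
    return {region: true_counts[region] * coverage[region]
            for region in true_counts}

true_counts = {"urban": 100, "rural": 100}   # same underlying event rate
coverage = {"urban": 0.9, "rural": 0.1}      # reporters cluster in cities

reported = observed_counts(true_counts, coverage)
# reported shows roughly 90 urban vs 10 rural events -- a 9:1 disparity
# that does not exist in the underlying population.
```

An algorithm trained or validated on the reported counts alone would “learn” that rural areas are far more peaceful, which is exactly the kind of inference that non-probability samples cannot support without a model of the reporting process itself.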
Thanks for reading, John, and for sharing your thoughts.
Dr. Meier, this is a really interesting article. I’m about to start working on a quantitative research project regarding EM and was curious whether you knew of any data sets on EM and social media that are available to the public.
Any advice you could give or direction you could point me in would be terrific! Thanks so much.
Thanks for reading, Edwin. This public dataset has 20 million tweets from Hurricane Sandy:
Hope this helps!