Category Archives: Crowdsourcing

The World at Night Through the Eyes of the Crowd

Ushahidi has just uploaded the locations of all CrowdMap reports to DevSeed’s awesome MapBox and the result looks gorgeous. Click this link to view the map below in an interactive, full-browser window. Ushahidi doesn’t disclose the actual number of reports depicted, only the number of maps that said reports have been posted to and the number of countries that CrowdMaps have been launched for. But I’m hoping they’ll reveal that figure soon as well. (Update from Ushahidi: This map shows the 246,323 unique locations used for reports from the launch of Crowdmap on Aug 9, 2010 to Jan 18, 2013).

[Map: CrowdMap report locations visualized on MapBox]

In any event, I’ve just emailed my colleagues at Ushahidi to congratulate them and ask when their geo-dataset will be made public, since they didn’t include a link to said dataset in their recent blog post. I’ll be sure to let readers know in the comments section as soon as I get a reply. There is a plethora of fascinating research questions that this dataset could potentially help us answer. I’m really excited and can’t wait for my team and me at QCRI to start playing with the data. I’d also love to see this static map turned into a live map; one that allows users to actually click on individual reports as they get posted to a CrowdMap and to display the category (or categories) they’ve been tagged with. Now that would just be so totally über cool—especially if/when Ushahidi opens up that data to the public, even if at a spatially & temporally aggregated level.
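On the live-map idea: here is a rough sketch of how one might poll a CrowdMap deployment for new reports via the Ushahidi Classic REST API and push them onward for display. The deployment URL is a placeholder and the field names follow the 2.x API but may vary across platform versions, so treat this as an assumption-laden illustration rather than Ushahidi’s actual pipeline:

```python
# Sketch: poll a (hypothetical) CrowdMap deployment for new reports via
# the Ushahidi Classic REST API. Field names follow the 2.x API docs but
# may differ by deployment version.
import time
import requests

DEPLOYMENT = "https://example.crowdmap.com"  # placeholder deployment URL

def fetch_reports(since_id=0):
    """Return approved reports newer than since_id."""
    resp = requests.get(
        f"{DEPLOYMENT}/api",
        params={"task": "incidents", "by": "sinceid",
                "id": since_id, "resp": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("payload", {}).get("incidents", [])

last_id = 0
while True:
    for item in fetch_reports(last_id):
        inc = item["incident"]
        cats = [c["category"]["title"] for c in item.get("categories", [])]
        # A live map would push these straight to the browser; we just print.
        print(inc["incidenttitle"], inc["locationlatitude"],
              inc["locationlongitude"], cats)
        last_id = max(last_id, int(inc["incidentid"]))
    time.sleep(60)  # poll once a minute
```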

For more mesmerizing visualizations like this one, see my recent blog post entitled “Social Media: Pulse of the Planet?” which is also cross-posted on the National Geographic blog here. In the meantime, I’m keeping my fingers crossed that Ushahidi will embrace an Open Data policy from here on out and highly recommend the CrowdGlobe Report to readers interested in learning more about CrowdMap and Ushahidi.


Map or Be Mapped: Otherwise You Don’t Exist

“There are hardly any street signs here. There are no official zip codes. No addresses. Just word of mouth” (1). Such is the fate of Brazil’s Mare shantytown and that of most shantytowns around the world where the spoken word is king (and not necessarily benevolent). “The sprawling complex of slums, along with the rest of Rio de Janeiro’s favelas, has hung in a sort of ‘legal invisibility’ since 1937, when a city ordinance ruled that however unsightly, favelas should be kept off maps because they were merely ‘temporary'” (2).


The socio-economic consequences were far-reaching. For decades, this informality meant that “entire neighborhoods did not receive mail. It had also blocked people from giving required information on job applications, getting a bank account or telling the police or fire department where to go in an emergency call. Favela residents had to pick up their mail from their neighborhood associations, and entire slums housing a small town’s worth of residents had to use the zip code of the closest officially recognized street” (3).

All this is starting to change thanks to a grassroots initiative that is surveying Mare’s 16 favelas, home to some 130,000 people. This community-driven project has appropriated the same survey methodology used by the Brazilian government’s Institute of Geography and Statistics. The collected data includes “not only street names but the history of the original smaller favelas that make up the community” (4). This data is then “formatted into pocket guides and distributed gratis to residents. These guides also offer background on certain streets’ namesakes, but leave some blank so that residents can fill them in as Mare […] continues shifting out from the shadows of liminal space to a city with distinct identities” (5). And so, “residents of Rio’s famed favelas are undergoing their first real and ‘fundamental step toward citizenship'” (6).

These bottom-up, counter-mapping efforts are inherently political—call it guerrilla mapping. Traditionally, maps have represented “not just the perspective of the cartographer herself, but of much larger institutions—of corporations, organizations, and governments” (7). Maps were fixed at one and only one scale: that of the State. Today, informal communities can take matters into their own hands and put themselves on the map; at the scale of their choosing. But companies like Google still have the power to make these communities vanish. In Brazil, Google said it “would tweak the site’s [Google Maps’] design, namely its text size and district labeling to show favela names only after users zoomed in on those areas.”


Meanwhile, Google is making North Korea’s capital city more visible. But I had an uncomfortable feeling after reading National Geographic’s take on Google’s citizen mapping expedition to North Korea. The Director for National Geographic Maps, Juan José Valdés, cautions that, “In many parts of the world such citizen mapping has proven challenging, if not downright dangerous. In many places, little can be achieved without the approval of local and or national authorities—especially in North Korea.” Yes, but in many parts of the world citizen mapping is safe and possible. More importantly, citizen mapping can be a powerful tool for digital activism. My entire doctoral dissertation focuses on exactly this issue.

Yes, Valdés is absolutely correct when he writes that “In many countries, place-names, let alone the alignment of boundaries, remain a powerful symbol of independence and national pride, and not merely indicators of location. This is where citizen cartographers need to understand the often subtle nuances and potential pitfalls of mapping.” As the New Yorker notes, “Maps are so closely associated with power that dictatorships regard information on geography as a state secret.” But map-savvy digital activists already know this better than most, and they deliberately seek to exploit this to their advantage in their struggles for democracy.

National Geographic’s mandate is of course very different. “From National Geographic’s perspective, all a map should accomplish is the actual portrayal of national sovereignty, as it currently exists. It should also reflect the names as closely as possible to those recognized by the political entities of the geographic areas being mapped. To do otherwise would give map readers an unrealistic picture of what is occurring on the ground.”


This makes perfect sense for National Geographic. But as James Scott reminds us in his latest book, “A great deal of the symbolic work of official power is precisely to obscure the confusion, disorder, spontaneity, error, and improvisation of political power as it is in fact exercised, beneath a billiard-ball-smooth surface of order, deliberation, rationality, and control. I think of this as the ‘miniaturization of order.'” Scott adds that, “The order, rationality, abstractness and synoptic legibility of certain kinds of schemes of naming, landscape, architecture, and work processes lend themselves to hierarchical power […] ‘landscapes of control and appropriation.'”

Citizen mapping, especially in repressive environments, often seeks to change that balance of power by redirecting the compass of political power with the use of subversive digital maps. Take last year’s example of Syrian pro-democracy activists changing place & street names depicted on the Google Map of Syria. They did this intentionally as an act of resistance and defiance. Again, I fully understand and respect that National Geographic’s mandate is completely different to that of pro-democracy activists fighting for freedom. I just wish that Valdés had at least added one sentence to acknowledge the importance of maps for the purposes of resistance and pro-democracy movements. After all, he is himself a refugee from Cuba’s political repression.

There is of course a flip side to all this. While empowering, visibility and legibility can also undermine a community’s autonomy. As Pierre-Joseph Proudhon famously put it, “To be governed is to be watched, inspected, spied upon, directed, law-driven, numbered, regulated, enrolled, indoctrinated, preached at, controlled, checked, estimated, valued, censured, commanded, by creatures who have neither the right nor the wisdom nor the virtue to do so.” To be digitally mapped is to be governed, but perhaps at multiple scales including the preferred scale of self-governance and self-determination.

And so, we find ourselves repeating the words of Shakespeare’s famous character Hamlet: “To be, or not to be,” to map, or not to map.

 

See also:

  • Spying with Maps [Link]
  • How to Lie With Maps [Link]
  • Folksomaps for Community Mapping [Link]
  • From Social Mapping to Crisis Mapping [Link]
  • Crisis Mapping Somalia with the Diaspora [Link]
  • Perils of Crisis Mapping: Lessons from Gun Map [Link]
  • Crisis Mapping the End of Sudan’s Dictatorship? [Link]
  • Threat and Risk Mapping Analysis in the Sudan [Link]
  • Rise of Amateur Professionals & Future of Crisis Mapping [Link]
  • Google Inc + World Bank = Empowering Citizen Cartographers? [Link]

Note: Readers interested in the topics discussed above may also be interested in a forthcoming book to be published by Oxford University Press entitled “Information and Communication Technologies in Areas of Limited Statehood.” I have contributed a chapter to this book entitled “Crisis Mapping in Areas of Limited Statehood,” which analyzes how the rise of citizen-generated crisis mapping replaces governance in areas of limited statehood. The chapter distills the conditions for the success of these crisis mapping efforts in these non-permissive and resource-restricted environments.

Why Ushahidi Should Embrace Open Data

“This is the report that Ushahidi did not want you to see.” Or so the rumors in certain circles would have it. Some go as far as suggesting that Ushahidi tried to bury or delay the publication. On the other hand, some rumors claim that the report was a conspiracy to malign and discredit Ushahidi. Either way, what is clear is this: Ushahidi is an NGO that prides itself on promoting transparency & accountability; an organization prepared to take risks—and yes fail—in the pursuit of this mission.

The report in question is CrowdGlobe: Mapping the Maps. A Meta-level Analysis of Ushahidi & Crowdmap. Astute observers will discover that I am indeed one of the co-authors. Published by Internews in collaboration with George Washington University, the report (PDF) reveals that 93% of 12,000+ Crowdmaps analyzed had fewer than 10 reports while a full 61% of Crowdmaps had no reports at all. The rest of the findings are depicted in the infographic below (click to enlarge) and were eloquently summarized in a 5-minute presentation delivered at the 2012 Crisis Mappers Conference (ICCM 2012).

[Infographic: CrowdGlobe findings at a glance]

Back in 2011, when my colleague Rob Baker (now with Ushahidi) generated the preliminary results of the quantitative analysis that underpins much of the report, we were thrilled to finally have a baseline against which to measure and guide the future progress of Ushahidi & Crowdmap. But when these findings were first publicly shared (August 2012), they were dismissed by critics who argued that the underlying data was obsolete. Indeed, much of the data we used in the analysis dates back to 2010 and 2011. Far from being obsolete, however, this data provides a baseline from which the use of the platform can be measured over time. We are now in 2013 and there are apparently 36,000+ Crowdmaps today rather than just 12,000+.

To this end, and as a member of Ushahidi’s Advisory Board, I have recommended that my Ushahidi colleagues run the same analysis on the most recent Crowdmap data in order to demonstrate the progress made vis-a-vis the now-outdated public baseline. (This analysis takes no more than a few days to carry out.) I also strongly recommend that all this anonymized meta-data be made public on a live dashboard in the spirit of open data and transparency. Ushahidi, after all, is a public NGO funded by some of the biggest proponents of open data and transparency in the world.

Embracing open data is one of the best ways for Ushahidi to dispel the harmful rumors and conspiracy theories that continue to swirl as a result of the CrowdGlobe report. So I hope that my friends at Ushahidi will share their updated analysis and live dashboard in the coming weeks. If they do, then their bold support of this report and commitment to open data will serve as a model for other organizations to emulate. If they’ve just recently resolved to make this a priority, then even better.

In the meantime, I look forward to collaborating with the entire Ushahidi team on making the upcoming Kenyan elections the most transparent to date. As referenced in this blog post, the Standby Volunteer Task Force (SBTF) is partnering with the good people at PyBossa to customize an awesome micro-tasking platform that will significantly facilitate and accelerate the categorization and geo-location of reports submitted to the Ushahidi platform. So I’m working hard with both of these outstanding teams to make this the most successful, large-scale microtasking effort for election monitoring yet. Now let’s hope for everyone’s sake that the elections remain peaceful. Onwards!

Social Media: Pulse of the Planet?

In 2010, Hillary Clinton described social media as a new nervous system for our planet (1). So can the pulse of the planet be captured with social media? There are many who are skeptical, not least because of the digital divide. “You mean the pulse of the Data Haves? The pulse of the affluent?” These rhetorical questions are perfectly justified, which is why social media should not be the sole source of information that feeds into decision-making for policy purposes. But millions are joining the social media ecosystem every day. So the selection bias is not increasing but decreasing. We may not be able to capture the pulse of the planet comprehensively and at a very high resolution yet, but the pulse of the majority world is certainly growing louder by the day.

[Map: the world at night, based on 2011 satellite data]

This map of the world at night (based on 2011 data) reveals areas powered by electricity. Yes, Africa has far less electricity consumption. This is not misleading; it is an accurate proxy for industrial development (among other indicators). Does this data suffer from selection bias? Yes, the data is biased towards larger cities rather than the long tail. Does this render the data and map useless? Hardly. It all depends on what the question is.

[Map: TweetPing’s real-time visualization of tweets worldwide]

What if our world was lit up by information instead of lightbulbs? The map above from TweetPing does just that. The website displays tweets in real-time as they’re posted across the world. Strictly speaking, the platform displays 10% of the ~340 million tweets posted each day (i.e., the “Decahose” rather than the “Firehose”). But the volume and velocity of the pulsing ten percent is already breathtaking.
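To get a feel for that volume and velocity, here is a quick back-of-the-envelope calculation using the figures cited above:

```python
# Rough arithmetic on the figures above: ~340 million tweets/day,
# of which the Decahose delivers a 10% sample.
TWEETS_PER_DAY = 340_000_000
decahose_per_day = TWEETS_PER_DAY * 0.10
per_second = decahose_per_day / (24 * 60 * 60)
print(f"Decahose: {decahose_per_day:,.0f} tweets/day, "
      f"about {per_second:,.0f} per second")
# -> Decahose: 34,000,000 tweets/day, about 394 per second
```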

[Map: geo-located tweets (blue) and Flickr photos (red) across Europe]

One may think this picture depicts electricity use in Europe. Instead, this is a map of geo-located tweets (blue dots) and Flickr pictures (red dots). “White dots are locations that have been posted to both” (2). The number of active Twitter users grew an astounding 40% in 2012, making Twitter the fastest growing social network on the planet. Over 20% of the world’s internet population is now on Twitter (3). The Sightsmap below is a heat map based on the number of photographs submitted to Panoramio at different locations.

[Map: Sightsmap heat map of Panoramio photo density]

The map below depicts friendship ties on Facebook. It was generated from data collected when there were “only” 500 million users, compared to today’s 1 billion+.

[Map: Facebook friendship ties worldwide]

The following map does not depict electricity use in the US or the distribution of the population based on the most recent census data. Instead, this is a map of check-ins on Foursquare. What makes this map so powerful is not only that it was generated using 500 million check-ins but that “all those check-ins you see aren’t just single points—they’re links between all the other places people have been.”

[Map: Foursquare check-ins across the US]

TwitterBeat takes the (emotional) pulse of the planet by visualizing the Twitter Decahose in real-time using sentiment analysis. The crisis map in the YouTube video below comprises all tweets about Hurricane Sandy over time. “[Y]ou can see how the whole country lights up and how tweets don’t just move linearly up the coast as the storm progresses, capturing the advance impact of such a large storm and its peripheral effects across the country” (4).


These social media maps don’t only “work” at the country level or for Western industrialized states. Take the following map of Jakarta made almost exclusively from geo-tagged tweets. You can see the individual roads and arteries (nervous system). Granted, this map works so well because of the horrendous traffic, but nevertheless a pattern emerges, one that is strongly correlated with Jakarta’s road network. And unlike the map of the world at night, we can capture this pulse in real time and at a fraction of the cost.

[Map: geo-tagged tweets tracing Jakarta’s road network]

Like any young nervous system, our social media system is still growing and evolving. But it is already adding value. The analysis of tweets predicts the flu better than the crunching of traditional data used by public health institutions, for example. And the analysis of tweets from Indonesia also revealed that Twitter data can be used to monitor food security in real-time.

The main problem I see with all this has much less to do with issues of selection bias and unrepresentative samples, etc. Far more problematic is the centralization of this data and the fact that it is closed data. Yes, the above maps are public, but don’t be fooled, the underlying data is not. In their new study, “The Politics of Twitter Data,” Cornelius Puschmann and Jean Burgess argue that the “owners” of social media data are the platform providers, not the end users. Yes, access to Twitter.com and Twitter’s API is free but end users are limited to downloading just a few thousand tweets per day. (For comparative purposes, more than 20 million tweets were posted during Hurricane Sandy). Getting access to more data can cost hundreds of thousands of dollars. In other words, as Puschmann and Burgess note, “only corporate actors and regulators—who possess both the intellectual and financial resources to succeed in this race—can afford to participate,” which means “that the emerging data market will be shaped according to their interests.”

“Social Media: Pulse of the Planet?” Getting there, but only a few elite Doctors can take the full pulse in real-time.

Using #Mythbuster Tweets to Tackle Rumors During Disasters

The massive floods that swept through Queensland, Australia in 2010/2011 put an area almost twice the size of the United Kingdom under water. And now, a year later, Queensland braces itself for even worse flooding:

[Screenshot: coverage of the flooding forecast for Queensland]

More than 35,000 tweets with the hashtag #qldfloods were posted during the height of the flooding (January 10-16, 2011). One of the most active Twitter accounts belonged to the Queensland Police Service Media Unit: @QPSMedia. Tweets from (and to) the Unit were “overwhelmingly focussed on providing situational information and advice” (1). Moreover, tweets between @QPSMedia and followers were “topical and to the point, significantly involving directly affected local residents” (2). @QPSMedia also “introduced innovations such as the #Mythbuster series of tweets, which aimed to intervene in the spread of rumor and disinformation” (3).

[Photo: Rockhampton floods, 2011]

On the evening of January 11, @QPSMedia began to post a series of tweets with #Mythbuster in direct response to rumors and misinformation circulating on Twitter. Along with official notices to evacuate, these #Mythbuster tweets were the most widely retweeted @QPSMedia messages. They were especially successful. Here is a sample: “#mythbuster: Wivenhoe Dam is NOT about to collapse! #qldfloods”; “#mythbuster: There is currently NO fuel shortage in Brisbane. #qldfloods.”

[Screenshot: @QLDonline tweet]

This kind of pro-active intervention reminds me of the #fakesandy hashtag and FEMA’s rumor-control initiative during Hurricane Sandy. I expect to see greater use of this approach by professional emergency responders in future disasters. There’s no doubt that @QPSMedia will provide this service again with the coming floods, and it appears that @QLDonline is already doing so (above tweet). Brisbane’s City Council has also launched this Crowdmap marking the latest road closures, flood areas and sandbag locations. Hoping everyone in Queensland stays safe!

In the meantime, here are some relevant statistics on the crisis tweets posted during the 2010/2011 floods in Queensland:

  • 50-60% of #qldfloods messages were retweets (passing along existing messages, and thereby making them more visible); 30-40% of messages contained links to further information elsewhere on the Web.
  • During the crisis, a number of Twitter users dedicated themselves almost exclusively to retweeting #qldfloods messages, acting as amplifiers of emergency information and thereby increasing its reach.
  • #qldfloods tweets largely managed to stay on topic and focussed predominantly on sharing directly relevant situational information, advice, news media and multimedia reports.
  • Emergency services and media organisations were amongst the most visible participants in #qldfloods, especially also because of the widespread retweeting of their messages.
  • More than one in every five shared links in the #qldfloods dataset was to an image hosted on one of several image-sharing services; and users overwhelmingly depended on Twitpic and other Twitter-centric image-sharing services to upload and distribute the photographs taken on their smartphones and digital cameras.
  • The tenor of tweets during the latter days of the immediate crisis shifted more strongly towards organising volunteering and fundraising efforts: tweets containing situational information and advice, and news media and multimedia links were retweeted disproportionately often.
  • Less topical tweets were far less likely to be retweeted.

Social Network Analysis for Digital Humanitarian Response

Monitoring social media for digital humanitarian response can be a massive undertaking. The sheer volume and velocity of tweets generated during a disaster makes real-time social media monitoring particularly challenging if not near impossible. However, two new studies argue that there is “a better way to track the spread of information on Twitter that is much more powerful.”


Manuel Garcia-Herranz and his team at the Autonomous University of Madrid in Spain use small groups of “highly connected Twitter users as ‘sensors’ to detect the emergence of new ideas. They point out that this works because highly connected individuals are more likely to receive new ideas before ordinary users.” To test their hypothesis, the team studied 40 million Twitter users who “together totted up 1.5 billion ‘follows’ and sent nearly half a billion tweets, including 67 million containing hashtags.”

They found that small groups of highly connected Twitter users detect “new hashtags about seven days earlier than the control group. In fact, the lead time varied between nothing at all and as much as 20 days.” Manuel and his team thus argue that “there’s no point in crunching these huge data sets. You’re far better off picking a decent sensor group and watching them instead.” In other words, “your friends could act as an early warning system, not just for gossip, but for civil unrest and even outbreaks of disease.”
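As a toy illustration of the sensor-group idea (not the Madrid team’s actual code), here is a sketch using the networkx library on a synthetic follower graph; the graph, the 1% cutoff and the timestamps are all assumptions of mine:

```python
# Toy sketch: pick the most-followed users in a synthetic follower graph
# as "sensors" and measure how much earlier they see a new hashtag.
from statistics import median
import networkx as nx

# Synthetic directed graph: an edge u -> v means u follows v,
# so a node's in-degree stands in for its follower count.
G = nx.scale_free_graph(10_000, seed=42)

# Sensors = the 1% of users with the most followers.
k = len(G) // 100
sensors = set(sorted(G.nodes, key=lambda n: G.in_degree(n), reverse=True)[:k])

def sensor_lead_time(first_seen):
    """first_seen: dict mapping user -> day that user first tweeted the
    hashtag. Returns how many days the median sensor leads the median
    user overall (positive = sensors see the hashtag earlier)."""
    overall = median(first_seen.values())
    sensor_days = [day for user, day in first_seen.items() if user in sensors]
    return overall - median(sensor_days)
```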

The second study, “Identifying and Characterizing User Communities on Twitter during Crisis Events” (PDF), is authored by Aditi Gupta et al. Aditi and her colleagues analyzed three major crisis events (Hurricane Irene, Riots in England and Earthquake in Virginia) to “identify the different user communities, and characterize them by the top central users.” Their findings are in line with those shared by the team in Madrid. “[T]he top users represent the topics and opinions of all the users in the community with 81% accuracy on an average.” In sum, “to understand a community, we need to monitor and analyze only these top users rather than all the users in a community.”

How could these findings be used to prioritize the monitoring of social media during disasters? See this blog post for more on the use of social network analysis (SNA) for humanitarian response.

Digital Humanitarian Response: Moving from Crowdsourcing to Microtasking

A central component of digital humanitarian response is the real-time monitoring, tagging and geo-location of relevant reports published on mainstream and social media. This has typically been a highly manual and time-consuming process, which explains why dozens if not hundreds of digital volunteers are often needed to power digital humanitarian response efforts. To coordinate these efforts, volunteers typically work off Google Spreadsheets which, needless to say, is hardly the most efficient, scalable or enjoyable interface to work on for digital humanitarian response.


The challenge here is one of design. Google Spreadsheets was simply not designed to facilitate real-time monitoring, tagging and geo-location tasks by hundreds of digital volunteers collaborating synchronously and asynchronously across multiple time zones. The use of Google Spreadsheets not only requires up-front training of volunteers but also oversight and management. Perhaps the most problematic feature of Google Spreadsheets is the interface. Who wants to spend hours staring at cells, rows and columns? It is high time we take a more volunteer-centered design approach to digital humanitarian response. It is our responsibility to reduce the “friction” and make it as easy, pleasant and rewarding as possible for digital volunteers to share their time for the greater good. While some deride the rise of “single-click activism,” we have to make it as easy as a double-click of the mouse to support digital humanitarian efforts.

This explains why I have been actively collaborating with my colleagues behind the free & open-source micro-tasking platform, PyBossa. I often describe micro-tasking as “smart crowdsourcing”. Micro-tasking is simply the process of taking a large task and breaking it down into a series of smaller tasks. Take the tagging and geo-location of disaster tweets, for example. Instead of using Google Spreadsheets, tweets with designated hashtags can be imported directly into PyBossa where digital volunteers can tag and geo-locate said tweets as needed. As soon as they are processed, these tweets can be pushed to a live map or database right away for further analysis.
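As a rough sketch of that import step, here is what pushing tweets into PyBossa as micro-tasks might look like with the pybossa-client library. The endpoint, API key and project ID are placeholders, and exact function signatures may differ between client versions:

```python
# Sketch: turn hashtagged tweets into PyBossa micro-tasks.
# Endpoint, API key and project ID below are placeholders.
import pbclient

pbclient.set('endpoint', 'http://example-pybossa-server.org')
pbclient.set('api_key', 'YOUR-API-KEY')

PROJECT_ID = 42  # hypothetical PyBossa project

tweets = [
    {"id": 1, "text": "Surigao del Sur: relief good infant needs #pabloPH"},
    # ...more tweets pulled from Twitter with the designated hashtags
]

for tweet in tweets:
    # Each tweet becomes one small task that a volunteer tags and
    # geo-locates; n_answers=3 requests three independent judgements.
    pbclient.create_task(PROJECT_ID, info=tweet, n_answers=3)
```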

[Screenshot: the PyBossa micro-tasking interface used for Typhoon Pablo]

The Standby Volunteer Task Force (SBTF) used PyBossa in the digital disaster response to Typhoon Pablo in the Philippines. In the above example, a volunteer goes to the PyBossa website and is presented with the next tweet. In this case: “Surigao del Sur: relief good infant needs #pabloPH [Link] #ReliefPH.” If a tweet includes location information, e.g., “Surigao del Sur,” a digital volunteer can simply copy & paste that information into the search box or pinpoint the location in question directly on the map to generate the GPS coordinates. Click on the screenshot above to zoom in.

The PyBossa platform presents a number of important advantages when it comes to digital humanitarian response. One advantage is the user-friendly tutorial feature that introduces new volunteers to the task at hand. Furthermore, no prior experience or additional training is required and the interface itself can be made available in multiple languages. Another advantage is the built-in quality control mechanism. For example, one can very easily customize the platform such that every tweet is processed by 2 or 3 different volunteers. Why would we want to do this? To ensure consensus on what the right answers are when processing a tweet. For example, if three individual volunteers each tag a tweet as having a link that points to a picture of the damage caused by Typhoon Pablo, then we may find this to be more reliable than if only one volunteer tags a tweet as such. One additional advantage of PyBossa is that having 100 or 10,000 volunteers use the platform doesn’t require additional management and oversight—unlike the use of Google Spreadsheets.
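A minimal sketch of that consensus mechanism, with illustrative tags of my own rather than the SBTF’s actual categories:

```python
# Sketch: accept a tag for a tweet only when a majority of the (say)
# three volunteers who processed it agree. Tags here are illustrative.
from collections import Counter

def consensus(answers, threshold=2):
    """answers: tags submitted by different volunteers for one tweet.
    Returns the majority tag, or None if no tag reaches the threshold."""
    tag, votes = Counter(answers).most_common(1)[0]
    return tag if votes >= threshold else None

print(consensus(["damage_photo", "damage_photo", "request_for_help"]))
# -> damage_photo
print(consensus(["damage_photo", "request_for_help", "other"]))
# -> None
```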

There are many more advantages of using PyBossa, which is why my SBTF colleagues and I are collaborating with the PyBossa team with the ultimate aim of customizing a standby platform specifically for digital humanitarian response purposes. As a first step, however, we are working together to customize a PyBossa instance for the upcoming elections in Kenya since the SBTF was activated by Ushahidi to support the election monitoring efforts. The plan is to microtask the processing of reports submitted to Ushahidi in order to significantly accelerate and scale the live mapping process. Stay tuned to iRevolution for updates on this very novel initiative.


The SBTF also made use of CrowdFlower during the response to Typhoon Pablo. Like PyBossa, CrowdFlower is a micro-tasking platform but one developed by a for-profit company and hence primarily geared towards paying workers to complete tasks. While my focus vis-a-vis digital humanitarian response has chiefly been on (integrating) automated and volunteer-driven micro-tasking solutions, I believe that paid micro-tasking platforms also have a critical role to play in our evolving digital humanitarian ecosystem. Why? CrowdFlower has an unrivaled global workforce of more than 2 million contributors along with rigorous quality control mechanisms.

While this solution may not scale significantly given the costs, I’m hoping that CrowdFlower will offer the Digital Humanitarian Network (DHN) generous discounts moving forward. Either way, identifying what kinds of tasks are best completed by paid workers versus motivated volunteers is a question we must answer to improve our digital humanitarian workflows. This explains why I plan to collaborate with CrowdFlower directly to set up a standby platform for use by members of the Digital Humanitarian Network.

There’s one major catch with all microtasking platforms, however. Without well-designed gamification features, these tools are likely to have a short shelf-life. This is true of any citizen-science project and certainly relevant to digital humanitarian response as well, which explains why I’m a big, big fan of Zooniverse. If there’s a model to follow, a holy grail to seek out, then this is it. Until we master gamification or, better yet, partner with the talented folks at Zooniverse, we’ll be playing catch-up for years to come. I will do my very best to make sure that doesn’t happen.

The Problem with Crisis Informatics Research

My colleague ChaTo at QCRI recently shared some interesting thoughts on the challenges of crisis informatics research vis-a-vis Twitter as a source of real-time data. The way he drew out the issue was clear, concise and informative. So I’ve replicated his diagram below.

[Diagram: ChaTo’s Venn diagram of the three circles described below]

  • What Emergency Managers Need: those actionable tweets that provide situational awareness relevant to decision-making.
  • What People Tweet: those tweets posted during a crisis that are freely available via Twitter’s API (a very small fraction of the Twitter Firehose).
  • What Computers Can Do: the computational ability of today’s algorithms to parse and analyze natural language at a large scale.

The overlaps between these three circles define four regions (expressed as set operations in the sketch after the list below):

  • A: The small fraction of tweets containing valuable information for emergency responders that computer systems are able to extract automatically.
  • B: Tweets that are relevant to disaster response but cannot be analyzed in real time by existing algorithms due to computational challenges (e.g., the data processing is too intensive, or requires artificial intelligence systems that do not exist yet).
  • C: Tweets that can be analyzed by current computing systems but do not meet the needs of emergency managers.
  • D: Tweets that, if they existed, could be analyzed by current computing systems and would be very valuable for emergency responders—but people do not write such tweets.
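Here is that decomposition as a few lines of Python over three hypothetical sets of tweet IDs (purely illustrative data):

```python
# Toy sketch: ChaTo's regions as set operations over hypothetical tweets.
needed    = {"t1", "t2", "t3", "t7"}   # what emergency managers need
tweeted   = {"t1", "t2", "t4", "t5"}   # what people tweet (via the API)
parseable = {"t1", "t4", "t6", "t7"}   # what computers can extract

A = needed & tweeted & parseable    # actionable, posted, and extractable
B = (needed & tweeted) - parseable  # posted, but beyond today's algorithms
C = (tweeted & parseable) - needed  # extractable, but not what responders need
D = (needed & parseable) - tweeted  # extractable in principle, never posted
print(A, B, C, D)  # {'t1'} {'t2'} {'t4'} {'t7'}
```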

These limitations are not just academic. They make it more challenging to develop next-generation humanitarian technologies. So one question that naturally arises is this: How can we expand the size of A? One way is for governments to implement policies that expand access to mobile phones and the Internet, for example.

Area C is where the vast majority of social media companies operate today: collecting business intelligence and running sentiment analysis for private-sector companies by combining natural language processing and machine learning methodologies. But this analysis rarely focuses on tweets posted during a major humanitarian crisis. Reaching out to these companies to let them know they could make a difference during disasters would help to expand the size of A + C.

Finally, Area D is composed of information that would be very valuable for emergency responders and could be automatically extracted from tweets, but that Twitter users are simply not posting during emergencies (for now). Here, government and humanitarian organizations can develop policies to incentivise disaster-affected communities to tweet about the impact of a hazard and resulting needs in a way that is actionable, for example. This is what the Philippine Government did during Typhoon Pablo.

Now recall that the circle “What People Tweet About” is actually a very small fraction of all posted tweets. The advantage of this small sample of tweets is that they are freely available via Twitter’s API. But said API limits the number of downloadable tweets to just a few thousand per day. (For comparative purposes, there were over 20 million tweets posted during Hurricane Sandy). Hence the need for data philanthropy for humanitarian response.

I would be grateful for your feedback on these ideas and the conceptual framework proposed by ChaTo. The point to remember, as noted in this earlier post, is that today’s challenges are not static; they can be addressed and overcome to various degrees. In other words, the sizes of the circles can and will change.

 

 

Social Network Analysis of Tweets During Australia Floods

This study (PDF) analyzes the community of Twitter users who disseminated information during the crisis caused by the Australian floods in 2010-2011. “In times of mass emergencies, a phenomenon known as collective behavior becomes apparent. It consists of socio-behaviors that include intensified information search and information contagion.” The purpose of the Australian floods analysis is to reveal interesting patterns and features of this online community using social network analysis (SNA).

The authors analyzed 7,500 flood-related tweets to understand which users did the tweeting and retweeting. This was done to create nodes and links for SNA, which was able to “identify influential members of the online communities that emerged during the Queensland, NSW and Victorian floods as well as identify important resources being referred to. The most active community was in Queensland, possibly induced by the fact that the floods were orders of magnitude greater than in NSW and Victoria.”

The analysis also confirmed “the active part taken by local authorities, namely Queensland Police, government officials and volunteers. On the other hand, there was not much activity from local authorities in the NSW and Victorian floods prompting for the greater use of social media by the authorities concerned. As far as the online resources suggested by users are concerned, no sensible conclusion can be drawn as important ones identified were more of a general nature rather than critical information. This might be comprehensible as it was past the impact stage in the Queensland floods and participation was at much lower levels in the NSW and Victorian floods.”

Social Network Analysis is an under-utilized methodology for the analysis of communication flows during humanitarian crises. Understanding the topology of a social network is key to information diffusion. Think of this as a virus infecting a network. If we want to “infect” a social network with important crisis information as quickly and fully as possible, then understanding the network’s topology is a prerequisite, and so, therefore, is social network analysis.
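As a toy illustration of why topology matters (the synthetic network, and my choice of betweenness centrality, are assumptions for the sake of example), here is a short networkx sketch that ranks the nodes one might “infect” first:

```python
# Toy sketch: rank nodes in a synthetic social network by betweenness
# centrality to decide whom to seed with crisis information first.
import networkx as nx

G = nx.barabasi_albert_graph(1000, 3, seed=42)  # synthetic scale-free network

# Betweenness centrality highlights the brokers that bridge communities;
# k=100 samples nodes to approximate the measure quickly.
centrality = nx.betweenness_centrality(G, k=100, seed=42)
seeds = sorted(centrality, key=centrality.get, reverse=True)[:10]
print("Seed these 10 nodes first:", seeds)
```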

Why the Public Does (and Doesn’t) Use Social Media During Disasters

The University of Maryland has just published an important report on “Social Media Use During Disasters: A Review of the Knowledge Base and Gaps” (PDF). The report summarizes what is empirically known and yet to be determined about social media use pertaining to disasters. The research found that members of the public use social media for many different reasons during disasters:

  • Because of convenience
  • Based on social norms
  • Based on personal recommendations
  • For humor & levity
  • For information seeking
  • For timely information
  • For unfiltered information
  • To determine disaster magnitude
  • To check in with family & friends
  • To self-mobilize
  • To maintain a sense of community
  • To seek emotional support & healing

Conversely, the research also identified reasons why some hesitate to use social media during disasters: (1) privacy and security fears, (2) accuracy concerns, (3) access issues, and (4) knowledge deficiencies. By the latter they mean the lack of knowledge on how to use social media prior to disasters. While these hurdles present important challenges, they are far from insurmountable. Education, awareness-raising, improving technology access, etc., are all policies that can address the stated constraints. In terms of accuracy, a number of advanced computing research centers such as QCRI are developing methodologies and processes to quantify credibility on social media. Seasoned journalists have also been developing strategies to verify crowdsourced information on social media.

Perhaps the biggest challenge is privacy, security and ethics. The new mathematical technique known as “differential privacy” may provide the necessary breakthrough to tackle the privacy/security challenge. Scientific American writes that differential privacy “allows for the release of data while meeting a high standard for privacy protection. A differentially private data release algorithm allows researchers to ask practically any question about a database of sensitive information and provides answers that have been ‘blurred’ so that they reveal virtually nothing about any individual’s data—not even whether the individual was in the database in the first place.”
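For the technically curious, here is a bare-bones sketch of the Laplace mechanism that underlies differential privacy; it is illustrative only, not a production-grade implementation:

```python
# Sketch of the Laplace mechanism: answer a counting query with noise
# scaled to sensitivity/epsilon, so the output reveals almost nothing
# about any single record. Illustrative only.
import random

def private_count(records, predicate, epsilon=0.1):
    """A counting query has sensitivity 1 (one person changes the count
    by at most 1), so Laplace noise with scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    # Difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon) noise.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

ages = [23, 37, 41, 29, 65, 52]
print(private_count(ages, lambda a: a > 40))  # e.g. 3.7 instead of exactly 3
```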

The approach has already been used in a real-world application: a Census Bureau project called OnTheMap, “which gives researchers access to agency data. Also, differential privacy researchers have fielded preliminary inquiries from Facebook and the federally funded iDASH center at the University of California, San Diego, whose mandate in large part is to find ways for researchers to share biomedical data without compromising privacy.” So potential solutions are already on the horizon and more research is on the way. This doesn’t mean there are no challenges left. There will absolutely be more. But the point I want to drive home is that we are not completely helpless in the face of these challenges.

The Report concludes with the following questions, which are yet to be answered:

  • What, if any, unique roles do various social media play for communication during disasters?
  • Are some functions that social media perform during disasters more important than others?
  • To what extent can the current body of research be generalized to the U.S. population?
  • To what extent can the research on social media use during a specific disaster type, such as hurricanes, be generalized to another disaster type, such as terrorism?

Have any thoughts on what the answers might be and why? If so, feel free to add them in the comments section below. Incidentally, some of these questions could make for strong graduate theses and doctoral dissertations. To learn more about what people actually tweet during disasters, see these findings here.