iRevolutions

Humanitarian Crisis Computing 101

Posted on October 15, 2013 | Leave a comment

Disaster-affected communities are increasingly becoming “digital” communities. That is, they increasingly use mobile technology & social media to communicate during crises. I often refer to this user-generated content as Big (Crisis) Data. Humanitarian crisis computing seeks to rapidly identify informative, actionable and credible content in this growing stack of real-time information. The challenge is akin to finding the proverbial needle in the haystack since the vast majority of reports posted on social media is often not relevant for humanitarian response. This is largely a result of the demand versus supply problem described here.

In any event, the few “needles” of information that are relevant, can relay information that is vital and indeed-life saving for relief efforts—both traditional top-down efforts and more bottom-up grassroots efforts. When disaster strikes, we increasingly see social media traffic explode. We know there are important “pins” of relevant information hidden in this growing stack of information but how do we find them in real-time?

Humanitarian organizations are ill-equipped to managing the deluge of Big Crisis Data. They tend to sift through the stack of information manually, which means they aren’t able to process more than a small volume of information. This is represented by the dotted green line in the picture below. Big Data is often described as filter failure. Our manual filters cannot manage the large volume, velocity and variety of information posted on social media during disasters. So all the information above the dotted line, Big Data, is completely ignored.

This is where Advanced Computing comes in. Advanced Computing uses Human and Machine Computing to manage Big Data and reduce filter failure, thus allowing humanitarian organizations to process a larger volume, velocity and variety of crisis information in less time. In other words, Advanced Computing helps us push the dotted green line up the information stack.

In the early days of digital humanitarian response, we used crowdsourcing to search through the haystack of user-generated content posted during disasters. Note that said content can also include text messages (SMS), like in Haiti. Crowd-sourcing crisis information is not as much fun as the picture below would suggest, however. In fact, crowdsourcing crisis information was (and can still be) quite a mess and a big pain in the haystack. Needless to say, crowdsourcing is not the best filter to make sense of Big Crisis Data.

Recently, digital humanitarians have turned to microtasking crisis information as described here and here. The UK Guardian and Wired have also written about this novel shift from crowdsourcing to microtasking.

Microtasking basically turns a haystack into little blocks of stacks. Each micro-stack is then processed by one ore more digital humanitarian volunteers. Unlike crowdsourcing, a microtasking approach to filtering crisis information is highly scalable, which is why we recently launched MicroMappers.

The smaller the micro-stack, the easier the tasks and the faster that they can be carried out by a greater number of volunteers. For example, instead of having 10 people classify 10,000 tweets based on the Cluster System, microtasking makes it very easy for 1,000 people to classify 10 tweets each. The former would take hours while the latter mere minutes. In response to the recent earthquake in Pakistan, some 100 volunteers used MicroMappers to classify 30,000+ tweets in about 30 hours, for example.

Machine Computing, in contrast, uses natural language processing (NLP) and machine learning (ML) to “quantify” the haystack of user-generated content posted on social media during disasters. This enable us to automatically identify relevant “needles” of information.

An example of a Machine Learning approach to crisis computing is the Artificial Intelligence for Disaster Response (AIDR) platform. Using AIDR, users can teach the platform to automatically identify relevant information from Twitter during disasters. For example, AIDR can be used to automatically identify individual tweets that relay urgent needs from a haystack of millions of tweets.

The pictures above are taken from the slide deck I put together for a keynote address I recently gave at the Canadian Ministry of Foreign Affairs.

Hashtag Analysis of #Westgate Crisis Tweets

Posted on October 9, 2013 | Leave a comment

In July 2013, my team and I at QCRI launched this dashboard to analyze hashtags used by Twitter users during crises. Our first case study, which is available here, focused on Hurricane Sandy. Since then, both the UN and Greenpeace have also made use of the dashboard to analyze crisis tweets.

We just uploaded 700,000+ Westgate related tweets to the dashboard. The results are available here and also displayed above. The dashboard is still under development, so we very much welcome feedback on how to improve it for future analysis. You can upload your own tweets to the dashboard if you’d like to test drive the platform.

See also: Forensics Analysis of #Westgate Tweets (Link)

Forensics Analysis of #Westgate Tweets (Updated)

Posted on October 3, 2013 | 16 comments

Update 1: Our original Twitter collection of Westgate-related tweets included the following hashtags: #Kenya, #Nairobi #WestgateAttack, #WestagateMall, #WestgatemallAttack, #Westgateshootout & #Westgate. While we overlooked #Westlands and Westlands, we have just fixed the oversight. This explains why the original results below differed from the iHub’s analysis which was based on tweets with the keywords Westgate and Westlands.

Update 2: The list below of first tweets to report the attack has been updated to include tweets referring to Westlands. These are denoted by an asterisk (*).

I’m carrying out some preliminary “information forensics” research on the 740,000+ tweets posted during the Westgate attack. More specifically, I’m looking for any clues in the hours leading up to the attack that may reveal something out of the ordinary prior to the siege. Other questions I’m hoping to answer: Were any tweets posted during the crisis actionable? Did they add situational awareness? What kind of multimedia content was shared? Which tweets were posted by eyewitnesses? Were any tweets posted by the attackers or their supporters? If so, did these carry tactical information?

If you have additional suggestions on what else to search for, please feel free to post them in the comments section below, thank you very much. I’ll be working with QCRI research assistants over the next few weeks to dive deeper into the first 24 hours of the attack as reported on Twitter. This research would not be possible where it not for my colleagues at GNIP who very kindly granted me access their platform to download all the tweets. I’ve just reviewed the first hour of tweets (which proved to be highly emotional, as expected). Below are the very first tweets posted about the attack.

[12:38:20 local time]*
gun shots in westlands? wtf??

[12:41:49]*
Weird gunshot like sounds in westlands : (

[12:42:35]
Explosions and gunfight ongoing in #nairobi

[12:42:38]
Something really bad goin on at #Westgate. Gunshots!!!! Everyone’s fled.

[12:43:17] *
Somewhere behind Westlands? What’s up RT @[username]: Explosions and gunfight ongoing in #nairobi

[12:44:03]
Are these gunshots at #Westgate? Just heard shooting from the road behind sarit, sounded like it was coming from westgate

[12:44:37]*
@[username] shoot out at westgate westlands mall. going on for the last 10 min

[12:44:38]
Heavily armed thugs have taken over #WestGate shopping mall. Al occupants and shoppers are on the floor. Few gunshots heard…more to follow

[12:44:51]*
did anyone else in westlands hear that? #KOT #Nairobi

[12:45:04]
Seems like explosions and small arms fire are coming from Westlands or Gigiri #nairobi

[12:46:12]
Gun fight #westgate… @ntvkenya @KTNKenya @citizentvkenya any news…

[12:46:44]*
Several explosions followed by 10 minutes of running gunfight in Nairobi westlands

[12:46:59]
Small arms fire is continuing to be exchanged intermittently. #nairobi

[12:46:59]
Something’s going on around #Westgate #UkayCentre area. Keep away if you can

[12:47:54]
Gunshots and explosions heard around #Westgate anybody nearby? #Westlands

[12:48:33]*
@KenyaRedCross explosions and gunshots heard near Westgate Mall in Westlands. Fierce shoot out..casualties probable

[12:48:36]
Shoot to kill order #westgate

—

See also:

We Are Kenya: Global Map of #Westgate Tweets [Link]
Did Terrorists Use Twitter to Increase Situational Awareness? [Link]
Analyzing Tweets Posted During Mumbai Terrorist Attacks [Link]
Web 2.0 Tracks Attacks on Mumbai [Link]

16 Comments

Posted in Crowdsourcing, Information Forensics, Social Media

Tagged Attack, kenya, Nairobi, Twitter, Westgate

We Are Kenya: Global Map of #Westgate Tweets

Posted on October 3, 2013 | 5 comments

I spent over an hour trying to write this first paragraph last week and still don’t know where to start. I grew up in Nairobi, my parents lived in Kenya for more than 15 years, their house was 5 minutes from Westgate, my brother’s partner is Kenyan and I previously worked for Ushahidi, a Kenyan not-for-profit group. Witnessing the tragedy online as it unfolded in real-time, graphic pictures and all, was traumatic; I did not know the fate of several friends right away. This raw anxiety brought back memories from the devastating Haiti Earthquake of 2010; it took 12 long hours until I got word that my wife and friends had just made it out of a crumbling building.

What to do with this most recent experience and the pain that lingers? Amongst the graphic Westgate horror unfolding via Twitter, I also witnessed the outpouring of love, support and care; the offers of help from Kenyans and Somalis alike; collective grieving, disbelief and deep sadness; the will to remain strong, to overcome, to be united in support of the victims, their families and friends. So I reached out to several friends in Nairobi to ask them if aggregating and surfacing these tweets publicly could serve as a positive testament. They all said yes.

I therefore contacted colleagues at GNIP who kindly let me use their platform to collect more than 740,000 tweets related to the tragedy, starting from several hours before the horror began until the end of the siege. I then reached out to friends Claudia Perlich (data scientist) and Jer Throp (data artist) for their help on this personal project. They both kindly agreed to lend their expertise. Claudia quickly put together the map above based on the location of Twitter users responding to the events in Nairobi (click map to enlarge). The graph below depicts where Twitter users covering the Westgate tragedy were tweeting from during the first 35 hours or so.

We also did some preliminary content analysis of some keywords. The graph below displays the frequency of the terms “We Are One,” “Blood Appeal / Blood Donations,” and “Pray / Prayers” during the four day siege (click to enlarge).

Jer suggested (thankfully) a more compelling and elegant data visualization approach, which we are exploring this week. So we hope to share some initial visuals in the coming days. If you have any specific suggestions on other ways to analyze and visualize the data, please do share them in the comments section below, thank you.

See also: Forensics Analysis of #Westgate Tweets [Link]

5 Comments

Posted in Crowdsourcing, Social Media

Tagged Attack, Nairobi, Twitter, Westgate

AIDR: Artificial Intelligence for Disaster Response

Posted on October 1, 2013 | 22 comments

Social media platforms are increasingly used to communicate crisis information when major disasters strike. Hence the rise of Big (Crisis) Data. Humanitarian organizations, digital humanitarians and disaster-affected communities know that some of this user-generated content can increase situational awareness. The challenge is to identify relevant and actionable content in near real-time to triangulate with other sources and make more informed decisions on the spot. Finding potentially life-saving information in this growing stack of Big Crisis Data, however, is like looking for the proverbial needle in a giant haystack. This is why my team and I at QCRI are developing AIDR.

The free and open source Artificial Intelligence for Disaster Response platform leverages machine learning to automatically identify informative content on Twitter during disasters. Unlike the vast majority of related platforms out there, we go beyond simple keyword search to filter for informative content. Why? Because recent research shows that keyword searches can miss over 50% of relevant content posted on Twitter. This is very far from optimal for emergency response. Furthermore, tweets captured via keyword search may not be relevant since words can have multiple meanings depending on context. Finally, keywords are restricted to one language only. Machine learning overcomes all these limitations, which is why we’re developing AIDR.

So how does AIDR work? There are three components of AIDR: the Collector, Trainer and Tagger. The Collector simply allows you to collect and save a collection of tweets posted during a disaster. You can download these tweets for analysis at any time and also use them to create an automated filter using machine learning, which is where the Trainer and Tagger come in. The Trainer allows one or more users to train the AIDR platform to automatically tag tweets of interest in a given collection of tweets. Tweets of interest could include those that refer to “Needs”, “Infrastructure Damage” or “Rumors” for example.

A user creates a Trainer for tweets-of-interest by: 1) Creating a name for their Trainer, e.g., “My Trainer”; 2) Identifying topics of interest such as “Needs”, “Infrastructure Damage”, “Rumors” etc. (as many topics as the user wants); and 3) Classifying tweets by topic of interest. This last step simply involves reading collected tweets and classifying them as “Needs”, “Infrastructure Damage”, “Rumor” or “Other,” for example. Any number of users can participate in classifying these tweets. That is, once a user creates a Trainer, she can classify the tweets herself, or invite her organization to help her classify, or ask the crowd to help classify the tweets, or all of the above. She simply shares a link to her training page with whoever she likes. If she choses to crowdsource the classification of tweets, AIDR includes a built-in quality control mechanism to ensure that the crowdsourced classification is accurate.

As noted here, we tested AIDR in response to the Pakistan Earthquake last week. We quickly hacked together the user interface displayed below, so functionality rather than design was our immediate priority. In any event, digital humanitarian volunteers from the Standby Volunteer Task Force (SBTF) tagged over 1,000 tweets based on the different topics (labels) listed below. As far as we know, this was the first time that a machine learning classifier was crowdsourced in the context of a humanitarian disaster. Click here for more on this early test.

The Tagger component of AIDR analyzes the human-classified tweets from the Trainer to automatically tag new tweets coming in from the Collector. This is where the machine learning kicks in. The Tagger uses the classified tweets to learn what kinds of tweets the user is interested in. When enough tweets have been classified (20 minimum), the Tagger automatically begins to tag new tweets by topic of interest. How many classified tweets is “enough”? This will vary but the more tweets a user classifies, the more accurate the Tagger will be. Note that each automatically tagged tweet includes an accuracy score—i.e., the probability that the tweet was correctly tagged by the automatic Tagger.

The Tagger thus displays a list of automatically tagged tweets updated in real-time. The user can filter this list by topic and/or accuracy score—display all tweets tagged as “Needs” with an accuracy of 90% or more, for example. She can also download the tagged tweets for further analysis. In addition, she can share the data link of her Tagger with developers so the latter can import the tagged tweets directly into to their own platforms, e.g., MicroMappers, Ushahidi, CrisisTracker, etc. (Note that AIDR already powers CrisisTracker by automating the classification of tweets). In addition, the user can share a display link with individuals who wish to embed the live feed into their websites, blogs, etc.

In sum, AIDR is an artificial intelligence engine developed to power consumer applications like MicroMappers. Any number of other tools can also be added to the AIDR platform, like the Credibility Plugin for Twitter that we’re collaborating on with partners in India. Added to AIDR, this plugin will score individual tweets based on the probability that they convey credible information. To this end, we hope AIDR will become a key node in the nascent ecosystem of next-generation humanitarian technologies. We plan to launch a beta version of AIDR at the 2013 CrisisMappers Conference (ICCM 2013) in Nairobi, Kenya this November.

In the meantime, we welcome any feedback you may have on the above. And if you want to help as an alpha tester, please get in touch so I can point you to the Collector tool, which you can start using right away. The other AIDR tools will be open to the same group of alpha tester in the coming weeks. For more on AIDR, see also this article in Wired.

The AIDR project is a joint collaboration with the United Nations Office for the Coordination of Humanitarian Affairs (OCHA). Other organizations that have expressed an interest in AIDR include the International Committee of the Red Cross (ICRC), American Red Cross (ARC), Federal Emergency Management Agency (FEMA), New York City’s Office for Emergency Management and their counterpart in the City of San Francisco.

Note: In the future, AIDR could also be adapted to take in Facebook status updates and text messages (SMS).

22 Comments

Posted in Big Data, Crowdsourcing, Humanitarian Technologies, Social Computing, Social Media

Tagged AIDR, Disaster, Response, Twitter

Developing MicroFilters for Digital Humanitarian Response

Posted on September 30, 2013 | 2 comments

Filtering—or the lack thereof—presented the single biggest challenge when we tested MicroMappers last week in response to the Pakistan Earthquake. As my colleague Clay Shirky notes, the challenge with “Big Data” is not information overload but rather filter failure. We need to make damned sure that we don’t experience filter failure again in future deployments. To ensure this, I’ve decided to launch a stand-alone and fully interoperable platform called MicroFilters. My colleague Andrew Ilyas will lead the technical development of the platform with support from Ji Lucas. Our plan is to launch the first version of MicroFilters before the CrisisMappers conference (ICCM 2013) in November.

A web-based solution, MicroFilters will allow users to upload their own Twitter data for automatic filtering purposes. Users will have the option of uploading this data using three different formats: text, CSV and JSON. Once uploaded, users can elect to perform one or more automatic filtering tasks from this menu of options:

Note that “unique image and video links” refer to the long URLs not shortened URLs like bit.ly. After selecting the desired filtering option(s), the user simply clicks on the “Filter” button. Once the filtering is completed (a countdown clock is displayed to inform the user of the expected processing time), MicroFilters provides the user with a download link for the filtered results. The link remains live for 10 minutes after which the data is automatically deleted. If a CSV file was uploaded for filtering, the file format for download is also in CSV format; likewise for text and JSON files. Note that filtered tweets will appear in reverse chronological order (assuming time-stamp data was included in the uploaded file) when downloaded. The resulting file of filtered tweets can then be uploaded to MicroMappers within seconds.

In sum, MicroFilters will be invaluable for future deployments of MicroMappers. Solving the “filter failure” problem will enable digital humanitarians to process far more relevant data and in a more timely manner. Since MicroFilters will be a standalone platform, anyone else will also have access to these free and automatic filtering services. In the meantime, however, we very much welcome feedback, suggestions and offers of help, thank you!

2 Comments

Posted in Big Data, Crowdsourcing, Humanitarian Technologies, Social Computing, Social Media

Tagged Disaster, MicroFilters, MicroMappers, Response, Twitter

Results of MicroMappers Response to Pakistan Earthquake (Updated)

Posted on September 27, 2013 | 21 comments

Update: We’re developing & launching MicroFilters to improve MicroMappers.

About 47 hours ago, the UN Office for the Coordination of Humanitarian Affairs (OCHA) activated the Digital Humanitarian Network (DHN) in response to the Pakistan Earthquake. The activation request was for 48 hours, so the deployment will soon phase out. As already described here, the Standby Volunteer Task Force (SBTF) teamed up with QCRI to carry out an early test of MicroMappers, which was not set to launch until next month. This post shares some initial thoughts on how the test went along with preliminary results.

During ~40 hours, 109 volunteers from the SBTF and the public tagged just over 30,000 tweets that were posted during the first 36 hours or so after the quake. We were able to automatically collect these tweets thanks to our partnership with GNIP and specifically filtered for said tweets using half-a-dozen hashtags. Given the large volume of tweets collected, we did not require that each tweet be tagged at least 3 times by individual volunteers to ensure data quality control. Out of these 30,000+ tweets, volunteers tagged a total of 177 tweets as noting needs or infrastructure damage. A review of these tweets by the SBTF concluded that none were actually informative or actionable.

Just over 350 pictures were tweeted in the aftermath of the earthquake. These were uploaded to the ImageClicker for tagging purposes. However, none of the pictures captured evidence of infrastructure damage. In fact, the vast majority were unrelated to the earthquake. This was also true of pictures published in news articles. Indeed, we used an automated algorithm to identify all tweets with links to news articles; this algorithm would then crawl these articles for evidence of images. We found that the vast majority of these automatically extracted pictures were related to politics rather than infrastructure damage.

A few preliminary thoughts and reflections from this first test of MicroMappers. First, however, a big, huge, gigantic thanks to my awesome QCRI team: Ji Lucas, Imran Muhammad and Kiran Garimella; to my outstanding colleagues on the SBTF Core Team including but certainly not limited to Jus Mackinnon, Melissa Elliott, Anahi A. Iaccuci, Per Aarvik & Brendan O’Hanrahan (bios here); to the amazing SBTF volunteers and members of the general public who rallied to tag tweets and images—in particular our top 5 taggers: Christina KR, Leah H, Lubna A, Deborah B and Joyce M! Also bravo to volunteers in the Netherlands, UK, US and Germany for being the most active MicroMappers; and last but certainly not least, big, huge and gigantic thanks to Andrew Ilyas for developing the algorithms to automatically identify pictures and videos posted to Twitter.

So what did we learn over the past 48 hours? First, the disaster-affected region is a remote area of south-western Pakistan with a very light social media footprint, so there was practically no user-generated content directly relevant to needs and damage posted on Twitter during the first 36 hours. In other words, there were no needles to be found in the haystack of information. This is in stark contrast to our experience when we carried out a very similar operation following Typhoon Pablo in the Philippines. Obviously, if there’s little to no social media footprint in a disaster-affected area, then monitoring social media is of no use at all to anyone. Note, however, that MicroMappers could also be used to tag 30,000+ text messages (SMS). (Incidentally, since the earthquake struck around 12noon local time, there was only about 18 hours of daylight during the 36-hour period for which we collected the tweets).

Second, while the point of this exercise was not to test our pre-processing filters, it was clear that the single biggest problem was ultimately with the filtering. Our goal was to upload as many tweets as possible to the Clickers and stress-test the apps. So we only filtered tweets using a number of general hashtags such as #Pakistan. Furthermore, we did not filter out any retweets, which probably accounted for 2/3 of the data, nor did we filter by geography to ensure that we were only collecting and thus tagging tweets from users based in Pakistan. This was a major mistake on our end. We were so pre-occupied with testing the actual Clickers that we simply did not pay attention to the pre-processing of tweets. This was equally true of the images uploaded to the ImageClicker.

So where do we go from here? Well we have pages and pages worth of feedback to go through and integrate in the next version of the Clickers. For me, one of the top priorities is to optimize our pre-processing algorithms and ensure that the resulting output can be automatically uploaded to the Clickers. We have to refine our algorithms and make damned sure that we only upload unique tweets and images to our Clickers. At most, volunteers should not see the same tweet or image more than 3 times for verification purposes. We should also be more careful with our hashtag filtering and also consider filtering by geography. Incidentally, when our free & open source AIDR platform becomes operational in November, we’ll also have the ability to automatically identify tweets referring to needs, reports of damage, and much, much more.

In fact, AIDR was also tested for the very first time. SBTF volunteers tagged about 1,000 tweets, and just over 130 of the tags enabled us to create an accurate classifier that can automatically identify whether a tweet is relevant for disaster response efforts specifically in Pakistan (80% accuracy). Now, we didn’t apply this classifier on incoming tweets because AIDR uses streaming Twitter data, not static, archived data which is what we had (in the form of CSV files). In any event, we also made an effort to create classifiers for needs and infrastructure damage but did not get enough tags to make these accurate enough. Typically, we need a minimum of 20 or so tags (i.e., examples of actual tweets referring to needs or damage). The more tags, the more accurate the classifier.

The reason there were so few tags, however, is because there were very few to no informative tweets referring to needs or infrastructure damage during the first 36 hours. In any event, I believe this was the very first time that a machine learning classifier was crowdsourced for disaster response purposes. In the future, we may want to first crowdsource a machine learning classifier for disaster relevant tweets and then upload the results to MicroMappers; this would reduce the number of unrelated tweets displayed on a TweetClicker.

As expected, we have also received a lot of feedback vis-a-vis user experience and the user interface of the Clickers. Speed is at the top of the list. That is, making sure that once I’ve clicked on a tweet/image, the next tweet/image automatically appears. At times, I had to wait more than 20 seconds for the next item to load. We also need to add more progress bars such as the number of tweets or images that remain to be tagged—a countdown display, basically. I could go on and on, frankly, but hopefully these early reflections are informative and useful to others developing next-generation humanitarian technologies. In sum, there is a lot of work to be done still. Onwards!

21 Comments

Posted in Crowdsourcing, Humanitarian Technologies, Social Computing, Social Media

Tagged Disaster, earthquake, MicroMappers, Pakistan, Response, Twitter

MicroMappers Launched for Pakistan Earthquake Response (Updated)

Posted on September 25, 2013 | 16 comments

Update 1: MicroMappers is now public! Anyone can join to help the efforts!
Update 2: Results of MicroMappers Response to Pakistan Earthquake [Link]

MicroMappers was not due to launch until next month but my team and I at QCRI received a time-sensitive request by colleagues at the UN to carry out an early test of the platform given yesterday’s 7.7 magnitude earthquake, which killed well over 300 and injured hundreds more in south-western Pakistan.

Shortly after this request, the UN Office for the Coordination of Humanitarian Affairs (OCHA) in Pakistan officially activated the Digital Humanitarian Network (DHN) to rapidly assess the damage and needs resulting from the earthquake. The award-winning Standby Volunteer Task Force (SBTF), a founding member of the DHN. teamed up with QCRI to use MicroMappers in response to the request by OCHA-Pakistan. This exercise, however, is purely for testing purposes only. We made this clear to our UN partners since the results may be far from optimal.

MicroMappers is simply a collection of microtasking apps (we call them Clickers) that we have customized for disaster response purposes. We just launched both the Tweet and Image Clickers to support the earthquake relief and may also launch the Tweet and Image GeoClickers as well in the next 24 hours. The TweetClicker is pictured below (click to enlarge).

Thanks to our partnership with GNIP, QCRI automatically collected over 35,000 tweets related to Pakistan and the Earthquake (we’re continuing to collect more in real-time). We’ve uploaded these tweets to the TweetClicker and are also filtering links to images for upload to the ImageClicker. Depending on how the initial testing goes, we may be able to invite help from the global digital village. Indeed, “crowdsourcing” is simply another way of saying “It takes a village…” In fact, that’s precisely why MicroMappers was developed, to enable anyone with an Internet connection to become a digital humanitarian volunteer. The Clicker for images is displayed below (click to enlarge).

Now, whether this very first test of the Clickers goes well remains to be seen. As mentioned, we weren’t planning to launch until next month. But we’ve already learned heaps from the past few hours alone. For example, while the Clickers are indeed ready and operational, our automatic pre-processing filters are not yet optimized for rapid response. The purpose of these filters is to automatically identify tweets that link to images and videos so that they can be uploaded to the Clickers directly. In addition, while our ImageClicker is operational, our VideoClicker is still under development—as is our TranslateClicker, both of which would have been useful in this response. I’m sure will encounter other issues over the next 24-36 hours. We’re keeping track of these in a shared Google Spreadsheet so we can review them next week and make sure to integrate as much of the feedback as possible before the next disaster strikes.

Incidentally, we (QCRI) also teamed up with the SBTF to test the very first version of the Artificial Intelligence for Disaster Response (AIDR) platform for about six hours. As far as we know, this test represents the first time that machine learning classifiers for disaster resposne were created on the fly using crowdsourcing. We expect to launch AIDR publicly at the 2013 CrisisMappers conference this November (ICCM 2013). We’ll be sure to share what worked and didn’t work during this first AIDR pilot test. So stay tuned for future updates via iRevolution. In the meantime, a big, big thanks to the SBTF Team for rallying so quickly and for agreeing to test the platforms! If you’re interested in becoming a digital humanitarian volunteer, simply join us here.

16 Comments

Posted in Crowdsourcing, Humanitarian Technologies, Social Computing, Social Media

Tagged AIDR, DHN, Disaster, earthquake, MicroMappers, OCHA, Pakistan, QCRI, Response, SBTF, UN

Seven Principles for Big Data and Resilience Projects

Posted on September 23, 2013 | 8 comments

Authored by Kate Crawford, Patrick Meier, Claudia Perlich, Amy Luers, Gustavo Faleiros and Jer Thorp, 2013 PopTech & Rockefeller Foundation Bellagio Fellows.

Update: See also “Big Data, Communities and Ethical Resilience: A Framework for Action” written by the above Fellows and available here (PDF).

The following is a draft “Code of Conduct” that seeks to provide guidance on best practices for resilience building projects that leverage Big Data and Advanced Computing. These seven core principles serve to guide data projects to ensure they are socially just, encourage local wealth- & skill-creation, require informed consent, and be maintainable over long timeframes. This document is a work in progress, so we very much welcome feedback. Our aim is not to enforce these principles on others but rather to hold ourselves accountable and in the process encourage others to do the same. Initial versions of this draft were written during the 2013 PopTech & Rockefeller Foundation workshop in Bellagio, August 2013.

1. Open Source Data Tools

Wherever possible, data analytics and manipulation tools should be open source, architecture independent and broadly prevalent (R, python, etc.). Open source, hackable tools are generative, and building generative capacity is an important element of resilience. Data tools that are closed prevent end-users from customizing and localizing them freely. This creates dependency on external experts which is a major point of vulnerability. Open source tools generate a large user base and typically have a wider open knowledge base. Open source solutions are also more affordable and by definition more transparent. Open Data Tools should be highly accessible and intuitive to use by non-technical users and those with limited technology access in order to maximize the number of participants who can independently use and analyze Big Data.

2. Transparent Data Infrastructure

Infrastructure for data collection and storage should operate based on transparent standards to maximize the number of users that can interact with the infrastructure. Data infrastructure should strive for built-in documentation, be extensive and provide easy access. Data is only as useful to the data scientist as her/his understanding of its collection is correct. This is critical for projects to be maintained over time, regardless of team membership, otherwise projects will collapse when key members leave. To allow for continuity, the infrastructure has to be transparent and clear to a broad set of analysts – independent of the tools they bring to bear. Solutions such as hadoop, JSON formats and the use of clouds are potentially suitable.

3. Develop and Maintain Local Skills

Make “Data Literacy” more widespread. Leverage local data labor and build on existing skills. The key and most constraint ingredient to effective data solutions remains human skill/knowledge and needs to be retained locally. In doing so, consider cultural issues and language. Catalyze the next generation of data scientists and generate new required skills in the cities where the data is being collected. Provide members of local communities with hands-on experience; people who can draw on local understanding and socio-cultural context. Longevity of Big Data for Resilience projects depends on the continuity of local data science teams that maintain an active knowledge and skills base that can be passed on to other local groups. This means hiring local researchers and data scientists and getting them to build teams of the best established talent, as well as up-and-coming developers and designers. Risks emerge when non-resident companies are asked to spearhead data programs that are connected to local communities. They bring in their own employees, do not foster local talent over the long-term, and extract value from the data and the learning algorithms that are kept by the company rather than the local community.

4. Local Data Ownership

Use Creative Commons and licenses that state that data is not to be used for commercial purposes. The community directly owns the data it generates, along with the learning algorithms (machine learning classifiers) and derivatives. Strong data protection protocols need to be in place to protect identities and personally identifying information. Only the “Principle of Do No Harm” can trump consent, as explicitly stated by the International Committee of the Red Cross’s Data Protection Protocols (ICRC 2013). While the ICRC’s data protection standards are geared towards humanitarian professionals, their core protocols are equally applicable to the use of Big Data in resilience projects. Time limits on how long the data can be used for should be transparently stated. Shorter frameworks should always be preferred, unless there are compelling reasons to do otherwise. People can give consent for how their data might be used in the short to medium term, but after that, the possibilities for data analytics, predictive modelling and de-anonymization will have advanced to a state that cannot at this stage be predicted, let alone consented to.

5. Ethical Data Sharing

Adopt existing data sharing protocols like the ICRC’s (2013). Permission for sharing is essential. How the data will be used should be clearly articulated. An opt in approach should be the preference wherever possible, and the ability for individuals to remove themselves from a data set after it has been collected must always be an option. Projects should always explicitly state which third parties will get access to data, if any, so that it is clear who will be able to access and use the data. Sharing with NGOs, academics and humanitarian agencies should be carefully negotiated, and only shared with for-profit companies when there are clear and urgent reasons to do so. In that case, clear data protection policies must be in place that will bind those third parties in the same way as the initial data gatherers. Transparency here is key: communities should be able to see where their data goes, and a complete list of who has access to it and why.

6. Right Not To Be Sensed

Local communities have a right not to be sensed. Large scale city sensing projects must have a clear framework for how people are able to be involved or choose not to participate. All too often, sensing projects are established without any ethical framework or any commitment to informed consent. It is essential that the collection of any sensitive data, from social and mobile data to video and photographic records of houses, streets and individuals, is done with full public knowledge, community discussion, and the ability to opt out. One proposal is the #NoShare tag. In essence, this principle seeks to place “Data Philanthropy” in the hands of local communities and in particular individuals. Creating clear informed consent mechanisms is a requisite for data philanthropy.

7. Learning from Mistakes

Big Data and Resilience projects need to be open to face, report, and discuss failures. Big Data technology is still very much in a learning phase. Failure and the learning and insights resulting from it should be accepted and appreciated. Without admitting what does not work we are not learning effectively as a community. Quality control and assessment for data-driven solutions is notably harder than comparable efforts in other technology fields. The uncertainty about quality of the solution is created by the uncertainty inherent in data. Even good data scientist are struggling to assess the upside potential of incremental efforts on the quality of a solution. The correct analogy is more one a craft rather a science. Similar to traditional crafts, the most effective way is to excellence is to learn from ones mistakes under the guidance of a mentor with a collective knowledge of experiences of both failure and success.

8 Comments

Posted in Big Data, Disaster Resilience

Tagged Bellagio, Fellows, Pop, Resilience, Rockefeller

MicroMappers: Microtasking for Disaster Response

Posted on September 18, 2013 | 40 comments

My team and I at QCRI are about to launch MicroMappers: the first ever set of microtasking apps specifically customized for digital humanitarian response. If you’re new to microtasking in the context of disaster response, then I recommend reading this, this and this. The purpose of our web-based microtasking apps (we call them Clickers) is to quickly make sense of all the user-generated, multi-media content posted on social media during disasters. How? By using microtasking and making it as easy as a single click of the mouse to become a digital humanitarian volunteer. This is how volunteers with Zooniverse were able to click-and-thus-tag well over 2,000,000 images in under 48-hours.

We have already developed and customized four Clickers using the free and open source microtasking platform CrowdCrafting: TweetClicker, TweetGeoClicker, ImageClicker and ImageGeoClicker. Each Clicker includes a mini-tutorial to guide volunteers. While we’re planning to launch them live next month, these Clickers (described below) can be used right now if need be. When a disaster strikes, we can automatically upload tweets to the TweetClicker, for example. These tweets are pre-filtered for keywords and hashtags relevant to the disaster in question. We can also automatically identify multimedia content posted to Twitter and upload this to the ImageClicker to tag pictures that show damage, for example.

TweetClicker: This Clicker invites volunteers to tag tweets based on categories specified by an organization like the UN, Red Cross or FEMA. These categories could include those from the UN Cluster System or other information needs.

ImageClicker: The purpose of this Clicker is to tag pictures posted on social media (and elsewhere) based on categories specified by a humanitarian organization. Images are auto-matically scraped from Twitter and uploaded to this Clicker.

TweetGeoClicker: This Clicker invites volunteers to geo-tag tweets (that are not already automatically geo-tagged). The Clicker’s mini-tutorial provides a list of tips on how to find the GPS locations of places mentioned in a given tweet.

ImageGeoClicker: The purpose of this Clicker is to geo-tag images posted on social media (that are not geo-tagged auto-matically). Like the Clicker above, a mini-tutorial provides a list of tips on how to find out where a given image was taken.

We are working on developing several more Clickers such as a VideoClicker and welcome suggestions for other Clickers. We are also developing “Connectors” for the Clickers we’ve developed. That is, once a tweet has been tagged using the TweetClicker, it can be automatically pushed to the TweetGeoClicker for geo-tagging purposes. Note that all our Clickers include built-in quality control mechanisms. For example, only if 3 volunteers tag an image as showing disaster damage does that image get tagged as such. This voting system ensures a high level of data quality.

Finally, we’re developing a Crisis Mapping feature to display geo-tagged tweets and images that are processed using our Clickers. For example, as soon as an image is geo-tagged by 3 volunteers using the ImageGeoClicker, it automatically appears on the live Crisis Map in real-time. Other data visualization features are also possible, such as an interface that provides trends and statistical analysis of the microtasked data. This too is on our to-do list. We’re also looking into making these Clickers useable on smartphones and tablets.

While MicroMappers is a joint collaboration with the United Nations Office for the Coordination of Humanitarian Affairs (OCHA), the Clickers are open for use by other humanitarian organizations such as FEMA and the Red Cross. Simply get in touch with me if you’d like to use these Clickers. In the meantime, many thanks to the Standby Volunteer Task Force (SBTF), a founding member of the Digital Humanitarian Network (DHN), for their invaluable feedback on earlier versions of the Clickers.

Want to be a MicroMapper? Simply join us here! We’ll provide you with updates and let you know when humanitarian organizations need your support to make sense of social media reports following a disaster.

iRevolutions

Seven Principles for Big Data and Resilience Projects

Patrick Meier, PhD

Table of Contents

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Patrick Meier, PhD

Table of Contents