
Social Media as Passive Polling: Prospects for Development & Disaster Response

My Harvard/MIT colleague Todd Mostak wrote his award-winning Master’s Thesis on “Social Media as Passive Polling: Using Twitter and Online Forums to Map Islamism in Egypt.” For this research, Todd evaluated the “potential of Twitter as a source of time-stamped, geocoded public opinion data in the context of the recent popular uprisings in the Middle East.” More specifically, “he explored three ways of measuring a Twitter user’s degree of political Islamism.” Why? Because he wanted to weigh in on the long-standing debate over whether Islamism is associated with poverty.


So Todd collected millions of geo-tagged tweets from Egypt over a six-month period, which he then aggregated by census district in order to regress proxies for poverty against measures of Islamism derived from the tweets and the users’ social graphs. His findings reveal that “Islamist sentiment seems to be positively correlated with male unemployment, illiteracy, and percentage of land used in agriculture and negatively correlated with percentage of men in their youth aged 15-25. Note that female variables for unemployment and age were statistically insignificant.” As with all research, there are caveats, such as the weighting scale used for the variables and questions over the reliability of census variables.
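To make the aggregate-then-regress idea concrete, here is a minimal sketch in Python. It is not Todd's code: the function names, the single predictor, and the toy numbers are all assumptions made up for illustration, and the thesis itself regressed several census variables at once.

```python
from collections import defaultdict


def mean_score_by_district(tweets):
    """Average a per-tweet score within each district.

    `tweets` is an iterable of (district_id, score) pairs. Both fields
    are hypothetical stand-ins for a geocoded tweet and a sentiment measure.
    """
    totals = defaultdict(lambda: [0.0, 0])  # district -> [sum, count]
    for district, score in tweets:
        totals[district][0] += score
        totals[district][1] += 1
    return {d: s / n for d, (s, n) in totals.items()}


def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy, made-up values: three districts with two scored tweets each.
tweets = [("A", 0.2), ("A", 0.4), ("B", 0.5), ("B", 0.7), ("C", 0.8), ("C", 1.0)]
unemployment = {"A": 0.10, "B": 0.20, "C": 0.30}  # hypothetical census proxy

district_scores = mean_score_by_district(tweets)
districts = sorted(district_scores)
intercept, slope = ols_fit(
    [unemployment[d] for d in districts],
    [district_scores[d] for d in districts],
)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A positive slope on data like this would only say that the two series move together across districts. It would say nothing about significance, which is where the caveats about weighting and census reliability come in.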


To carry out his graduate research, Todd built a web-enabled database (MapD) powered by Graphics Processing Units (GPUs) to perform real-time querying and visualization of big datasets. He is now working with Harvard’s Center for Geographic Analysis (CGA) to make this available via a public web interface called Tweetmap. This Big Data streaming and exploration tool presently displays 119 million tweets from 12/10/2012 to 12/31/2012. He is adding 6-7 million new georeferenced tweets per day (but these are not yet publicly available on Tweetmap). According to Todd, the time delay from live tweet to display on the map is about 1 second. Thanks to this GPU-powered approach, he expects that billions of tweets could be displayed in real time.


As always with impressive projects, no single person was behind the entire effort. Ben Lewis, who heads the WorldMap initiative at CGA, deserves a lot of credit for making Tweetmap a reality. Indeed, Todd collaborated directly with Ben throughout this project and benefited extensively from his expertise. Matt Bertrand (lead developer for CGA) did the WorldMap-side integration of MapD to create the Tweetmap interface.

Todd and I recently spoke about integrating his outstanding work on automated live mapping into QCRI’s Twitter Dashboard for Disaster Response. Exciting times. In the meantime, Todd has kindly shared his dataset of 700+ million geotagged tweets for my team and me to analyze. The reason I’m excited about this approach is best explained with this heatmap of the recent snowstorm in the northeastern US. Todd is already using Tweetmap for live crisis mapping. While this system filters by keyword, our Dashboard will use machine learning to provide more specific streams of relevant tweets, some of which could be automatically mapped on Tweetmap. See Todd’s Flickr page for more Tweetmap visuals.

Screen Shot 2013-02-18 at 11.30.54 AM
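For readers wondering what "filtering by keyword" and then mapping might look like in its simplest form, here is a rough Python sketch. It is an assumption-laden toy, not how Tweetmap or MapD works internally: the function names, the one-degree grid, and the sample tweets are all invented for illustration.

```python
import math
from collections import Counter


def matches(text, keywords):
    """Case-insensitive substring match (so "snow" also matches "snowboard")."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def heatmap_counts(tweets, keywords, cell_deg=1.0):
    """Count keyword-matching tweets per latitude/longitude grid cell.

    `tweets` is an iterable of (lat, lon, text) tuples. Each cell is
    identified by the floor of its coordinates divided by `cell_deg`.
    """
    counts = Counter()
    for lat, lon, text in tweets:
        if matches(text, keywords):
            cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
            counts[cell] += 1
    return counts


# Made-up sample tweets (coordinates roughly Boston, New Hampshire, NYC, LA).
sample = [
    (42.36, -71.06, "Snow is piling up in Boston"),
    (42.90, -71.50, "So much SNOW, power out"),
    (40.71, -74.00, "Blizzard warning in NYC"),
    (34.05, -118.24, "Sunny day at the beach"),
]

counts = heatmap_counts(sample, keywords=("snow", "blizzard"))
for cell, n in counts.most_common():
    print(cell, n)
```

The weakness of plain keyword matching is visible even here: it cannot tell a storm report from a ski-trip photo, which is the gap a machine learning classifier is meant to close.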

I’m also excited by Todd’s GPU-powered approach for a project I’m exploring with UN and World Bank colleagues. The purpose of that research project is to determine whether socio-economic trends such as poverty and unemployment can be captured via Twitter. Our first case study is Egypt. Depending on the results, we may be able to take it one step further by applying sentiment analysis to real-time, georeferenced tweets to visualize Twitter users’ perception vis-a-vis government services, a point of interest for my UN colleagues in Cairo.


Some Thoughts on Real-Time Awareness for Tech@State

I’ve been invited to present at Tech@State in Washington DC to share some thoughts on the future of real-time awareness. So I thought I’d use my blog to brainstorm and invite feedback from iRevolution readers. The organizers of the event have shared the following questions with me as a way to guide the conversation: Where is all of this headed? What will social media look like in five to ten years and what will we do with all of the data? Knowing that the data stream can only increase in size, what can we do now to prepare and prevent being overwhelmed by the sheer volume of data?

These are big, open-ended questions, and I will only have 5 minutes to share some preliminary thoughts. I shall thus focus on how time-critical crowdsourcing can yield real-time awareness and expand from there.

Two years ago, my good friend and colleague Riley Crane won DARPA’s $40,000 Red Balloon Competition. His team at MIT found the location of 10 weather balloons hidden across the continental US in under 9 hours. The US covers more than 3.7 million square miles and the balloons were barely 8 feet wide. This was truly a needle-in-the-haystack kind of challenge. So how did they do it? They used crowdsourcing and leveraged social media—Twitter in particular—by using a “recursive incentive mechanism” to recruit thousands of volunteers to the cause. This mechanism would basically reward individual participants financially based on how important their contributions were to the location of one or more balloons. The result? Real-time, networked awareness.
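The payout logic behind a recursive incentive mechanism fits in a few lines of Python. The sketch below assumes the halving scheme widely reported for the MIT team ($2,000 to the person who finds a balloon, $1,000 to whoever recruited them, $500 to that person's recruiter, and so on). Those figures are not stated in this post, and the function and participant names are hypothetical.

```python
def recursive_payouts(referral_chain, finder_reward=2000.0):
    """Split a reward along a referral chain, halving at each step.

    `referral_chain` lists people starting with the balloon finder and
    walking back up through whoever recruited each person.
    """
    payouts = {}
    reward = finder_reward
    for person in referral_chain:
        payouts[person] = reward
        reward /= 2  # each recruiter up the chain earns half as much
    return payouts


# Hypothetical chain: Alice recruited Bob, who recruited Carol, who recruited Dana.
chain = ["dana", "carol", "bob", "alice"]  # Dana found the balloon
payouts = recursive_payouts(chain)
total = sum(payouts.values())
print(payouts, total)
```

Because the amounts halve at every step, the total paid per balloon always stays below twice the finder's reward, however long the chain gets. With 10 balloons, that keeps the whole scheme within the $40,000 prize, while still paying people for recruiting others rather than only for finding a balloon themselves.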

Around the same time that Riley and his team celebrated their victory at MIT, another novel crowdsourcing initiative was taking place just a few miles away at The Fletcher School. Hundreds of students were busy combing through social and mainstream media channels for actionable and mappable information on Haiti following the devastating earthquake that had struck Port-au-Prince. This content was then mapped on the Ushahidi-Haiti Crisis Map, providing real-time situational awareness to first responders like the US Coast Guard and US Marine Corps. At the same time, hundreds of volunteers from the Haitian Diaspora were busy translating and geo-coding tens of thousands of text messages from disaster-affected communities in Haiti who were texting in their location & most urgent needs to a dedicated SMS short code. Fletcher School students filtered and mapped the most urgent and actionable of these text messages as well.

One year after Haiti, the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) asked the Standby Volunteer Task Force (SBTF), a global network of 700+ volunteers, for a real-time map of crowdsourced social media information on Libya in order to improve their own situational awareness. Thus was born the Libya Crisis Map.

The result? The Head of OCHA’s Information Services Section at the time sent an email to SBTF volunteers to commend them for their novel efforts. In this email, he wrote:

“Your efforts at tackling a difficult problem have definitely reduced the information overload; sorting through the multitude of signals on the crisis is no easy task. The Task Force has given us an output that is manageable and digestible, which in turn contributes to better situational awareness and decision making.”

These three examples from the US, Haiti and Libya demonstrate what is already possible with time-critical crowdsourcing and social media. So where is all this headed? You may have noted from each of these examples that their success relied on the individual actions of hundreds and sometimes thousands of volunteers. This is primarily because automated solutions to filter and curate the data stream are not yet available (or rather accessible) to the wider public. Indeed, these solutions tend to be proprietary, expensive and/or classified. I thus expect to see free and open source solutions crop up in the near future; solutions that will radically democratize the tools needed to gain shared, real-time awareness.

But automated natural language processing (NLP) and machine learning alone are not likely to succeed, in my opinion. The data stream is actually not a stream; it is a massive torrent of non-indexed information, a 24-hour global firehose of real-time, distributed multi-media data that continues to outpace our ability to produce actionable intelligence from this torrential downpour of 0’s and 1’s. To turn this data tsunami into real-time shared awareness will require that our filtering and curation platforms become more automated and collaborative. I believe the key is thus to combine automated solutions with real-time collaborative crowdsourcing tools, that is, platforms that enable crowds to collaboratively filter and curate real-time information, in real time.

Right now, when we comb through Twitter, for example, we do so on our own, sitting behind our laptops, isolated from others who may be seeking to filter the exact same type of content. We need to develop free and open source platforms that allow for the distributed-but-networked, crowdsourced filtering and curation of information in order to democratize the sense-making of the firehose. Only then will the wider public be able to win the equivalent of Red Balloon competitions without needing $40,000 or a degree from MIT.

I’d love to get feedback from readers about what other compelling cases or arguments I should bring up in my presentation tomorrow. So feel free to post some suggestions in the comments section below. Thank you!