Analyzing Foursquare Check-Ins During Hurricane Sandy

In this new study, “Extracting Diurnal Patterns of Real World Activity from Social Media” (PDF), authors Nir Grinberg, Mor Naaman, Blake Shaw and Gilad Lotan analyze Foursquare check-ins and tweets to capture real-world activities related to coffee, food, nightlife and shopping. Here’s what an average week looks like on Foursquare, for example (click to enlarge):

Foursquare Week

“When rare events at the scale of Hurricane Sandy happen, we expect them to leave an unquestionable mark on Social Media activity.” So the authors applied the same methods used to produce the above graph to visualize and understand changes in behavior during Hurricane Sandy as reflected on Foursquare and Twitter. The results are displayed below (click to enlarge).

Sandy Analysis

“Prior to the storm, activity is relatively normal with the exception of the iMac release on 10/25. The big spikes in divergent activity in the two days right before the storm correspond with emergency preparations, and the spike in nightlife activity afterwards follows the ‘celebrations’ pattern. In the category of Grocery shopping (top panel) the deviations on Foursquare and Twitter overlap closely, while on Nightlife the Twitter activity lags behind Foursquare. On October 29 and 30 shops were mostly closed in NYC and we observe fewer check-ins than usual, but interestingly more tweets about shopping. This finding suggests that opposing patterns of deviations may be indicative of severe distress or abnormality, with the two platforms corroborating an alert.”

In sum, “the deviations in the case study of Hurricane Sandy clearly separate normal and abnormal times. In some cases the deviations on both platforms closely overlap, while in others some time lag (or even opposite trend) is evident. Moreover, during the height of the storm Foursquare activity diminishes significantly, while Twitter activity is on the rise. These findings have immediate implications for event detection systems, both in combining multiple sources of information and in using them to improve overall accuracy.”
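For readers who want to experiment with this approach, here is a minimal sketch of one way to surface such deviations, assuming hourly check-in counts in a pandas Series; the authors’ actual method is more sophisticated than this simple hour-of-week z-score.

```python
# A minimal sketch of deviation-from-baseline detection, assuming hourly
# check-in counts in a pandas Series with a DatetimeIndex. This is not the
# paper's method; it simply z-scores each hour against its hour-of-week
# baseline computed over the whole series.
import pandas as pd

def weekly_deviation(counts: pd.Series) -> pd.Series:
    """Z-score each observation against the average for that hour of the week."""
    hour_of_week = counts.index.dayofweek * 24 + counts.index.hour
    baseline = counts.groupby(hour_of_week)
    return (counts - baseline.transform("mean")) / baseline.transform("std")

# Hours with |z| > 3 would flag anomalies such as the pre-storm spike in
# grocery check-ins or the post-storm surge in nightlife activity.
```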

Now if only this applied research could be transferred to operational use via a real-time dashboard, it could actually make a difference for emergency responders and humanitarian organizations. See my recent post on the cognitive mismatch between computing research and social good needs.


Using Twitter to Detect Micro-Crises in Real-Time

Social media is increasingly used to communicate during major crises. But what about small-scale incidents such as a car crash or fire? These “micro-crises” typically generate a far smaller volume of social media activity during a much shorter period and more bounded geographical area. Detecting these small-scale events thus poses an important challenge for the field of Crisis Computing.

Axel Schulz et al.

Axel Schulz just co-authored a paper on this exact challenge. In this study, he and co-authors Petar Ristoski & Heiko Paulheim “present a solution for a real-time identification of small scale incidents using microblogs,” which uses machine learning—combining text classification and semantic enrichment of microblogs—to increase situational awareness. The study draws on 7.5 million tweets posted in the city centers of Seattle and Memphis during November & December 2012 and February 2013. The authors used the “Seattle Real Time Fire 911 Calls” dataset to identify relevant keywords in the collected tweets. They also used WordNet to “extend this set by adding the direct hyponyms. For instance, the keyword ‘accident’ was extended with ‘collision’, ‘crash’, ‘wreck’, ‘injury’, ‘fatal accident’, and ‘casualty’.”
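The WordNet expansion step is straightforward to reproduce. Below is a minimal sketch using NLTK’s WordNet interface (an assumption on my part; the paper does not say which toolkit the authors used):

```python
# A hedged sketch of keyword expansion via direct WordNet hyponyms, assuming
# NLTK with the WordNet corpus installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def expand_keywords(keywords):
    """Extend each seed keyword with the lemma names of its direct hyponyms."""
    expanded = set(keywords)
    for word in keywords:
        for synset in wn.synsets(word, pos=wn.NOUN):
            for hyponym in synset.hyponyms():        # direct hyponyms only
                expanded.update(l.replace("_", " ") for l in hyponym.lemma_names())
    return expanded

print(expand_keywords({"accident"}))
# yields terms like 'collision', 'crash', 'injury', 'shipwreck', ...
```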

An evaluation of this combined “text classification” and “semantic enrichment” approach shows that small-scale incidents can be identified with an accuracy of 89%. A copy of Axel et al.‘s paper is available here (PDF). This is a remarkable level of accuracy given the rare and micro-level nature of the incidents studied.


Data Science for Social Good: Not Cognitive Surplus but Cognitive Mismatch

I’ve spent the past 12 months working with top-notch data scientists at QCRI and elsewhere. The following may thus be biased: I think QCRI got it right. They strive to balance their commitment to positive social change with their primary mission of becoming a world-class institute for advanced computing research. The two are not mutually exclusive. What it takes is a dedicated position, like the one created for me at QCRI. It is high time that other research institutes, academic programs and international computing conferences create comparable focal points to catalyze data science for social good.

Microsoft Research, to name just one company, carries out very interesting research that could have tremendous social impact, but the bridge necessary to transfer much of that research from knowledge to operation to social impact is often not there. And when it is, it is usually by happenstance. So researchers continue to formulate research questions based on what they find interesting rather than identifying equally interesting questions that could have direct social impact if answered by data science. Hundreds of papers get presented at computing conferences every month, and yet few if any of the authors have linked up with organizations like the United Nations, World Bank, Habitat for Humanity, etc., to identify and answer questions with social good potential. The same is true for the hundreds of computing dissertations that get defended every year. Doctoral students do not realize that a minor reformulation of their research question could perhaps make a world of difference to a community-based organization in India dedicated to fighting corruption, for example.

Cognitive Mismatch

The challenge here is not one of untapped cognitive surplus (to borrow from Clay Shirky), but rather complete cognitive mismatch. As my QCRI colleague Ihab Ilyas puts it: there are “problem owners” on the one hand and “problem solvers” on the other. The former have problems that prevent them from catalyzing positive social change. The latter know how to solve comparable problems and do so every day. But the two are not talking or even aware of each other. Creating and maintaining this two-way conversation requires more than one dedicated position (like mine at QCRI).

sweet spot

In short, I really want to have dedicated counterparts at Microsoft Research, IBM, SAP, LinkedIn, Bitly, GNIP, etc., as well as at leading universities and top-notch computing conferences and challenges; counterparts who have one foot in the world of data science and the other in the social sector; individuals with a demonstrated track record of bridging communities. There’s a community here waiting to be connected and needing to be formed. Again, carrying out cutting-edge computing R&D is in no way incompatible with generating positive social impact. Moreover, the latter provides an important return on investment in the form of data, reputation, publicity, connections and social capital. In sum, social good challenges need to be formulated into research questions that have scientific as well as social good value. There is definitely a sweet spot here, but it takes a dedicated community to bring problem owners and solvers together and hit it.


Using Big Data to Inform Poverty Reduction Strategies

My colleagues and I at QCRI are spearheading a new experimental Research and Development (R&D) project with the United Nations Development Program (UNDP) team in Cairo, Egypt. Colleagues at Harvard University, MIT and UC Berkeley have also joined the R&D efforts as full-fledged partners. The research question: can an analysis of Twitter traffic in Egypt tell us anything about changes in unemployment and poverty levels? This question was formulated with UNDP’s Cairo-based Team during several conversations I had with them in early 2013.

Egyptian Tweets

As is well known, a major challenge in the development space is the lack of access to timely socio-economic data. So the question here is whether alternative, non-traditional sources of information (such as social media) can provide a timely and “good enough” indication of changing trends. Thanks to our academic partners, we have access to hundreds of millions of Egyptian tweets (both historical and current) along with census and demographic data for ground-truth purposes. If the research yields robust results, then our UNDP colleagues could draw on more real-time data to complement their existing datasets, which may better inform some of their local poverty reduction and development strategies. This more rapid feedback loop could lead to faster economic empowerment for local communities in Egypt. Of course, there are many challenges to working with social data vis-a-vis representation and sample bias. But that is precisely why this kind of experimental research is important—to determine whether any of our results are robust to biases in phone ownership, Twitter use, etc.
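To make the idea concrete, here is a hedged sketch of the kind of comparison envisioned, assuming a monthly count of job-related Egyptian tweets alongside an official unemployment series; every number below is invented for illustration:

```python
# An illustrative sketch only: correlate month-over-month changes in a
# tweet-derived index with changes in official unemployment figures.
# All numbers are made up; the real project uses hundreds of millions
# of tweets plus census data for ground truth.
import pandas as pd

months = pd.period_range("2012-01", periods=6, freq="M")
job_tweets = pd.Series([4200, 4400, 5100, 5600, 6100, 6000], index=months)
unemployment = pd.Series([12.4, 12.5, 12.6, 12.8, 13.0, 13.1], index=months)

# Correlate the month-over-month percentage changes of the two series.
print(job_tweets.pct_change().corr(unemployment.pct_change()))
```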


Using Crowdring for Disaster Response?

35 million missed calls.

That’s the number of calls that 75-year-old social justice leader Anna Hazare received from people across India who supported his efforts to fight corruption. Two weeks earlier, he had invited India to join his movement by making “missed calls” to a local number. Missed calls, known as beeping or flashing, are calls that are intentionally dropped after ringing. The advantage of making a missed call is that neither the caller nor the recipient is charged. This tactic is particularly common in emerging economies as a way to avoid paying for airtime or SMS. To build on this pioneering work, Anna and his team are developing a mobile petition tool called Crowdring, which turns a free “missed call” into a signature on a petition.

crowdring_pic

Communicating with disaster-affected communities is key for effective disaster response, and Crowdring could be used to poll those communities. The service could also be used in combination with local community radio stations: the latter would broadcast a series of yes-or-no questions; ringing once would signify yes, twice would mean no. Some questions that come to mind:

  1. Do you have enough drinking water? 
  2. Are humanitarian organizations doing a good job?
  3. Is someone in your household displaying symptoms of cholera?

By receiving these calls, humanitarians would automatically be able to build a database of phone numbers with associated poll results. This means they could text respondents right back for more information or to arrange an in-person meeting.
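Here is a minimal sketch of what that tally might look like, assuming a hypothetical call log of (phone number, question, ring count) records, with one ring meaning yes and two meaning no as proposed above:

```python
# A sketch of tallying missed-call poll results; the call-log format is a
# hypothetical assumption, not Crowdring's actual data model.
from collections import Counter, defaultdict

call_log = [
    ("+20100000001", "drinking_water", 1),   # one ring  -> yes
    ("+20100000002", "drinking_water", 2),   # two rings -> no
    ("+20100000003", "drinking_water", 1),
]

tallies = defaultdict(Counter)
respondents = defaultdict(set)

for number, question, rings in call_log:
    tallies[question]["yes" if rings == 1 else "no"] += 1
    respondents[question].add(number)        # numbers kept for follow-up SMS

print(dict(tallies["drinking_water"]))       # {'yes': 2, 'no': 1}
```

You can learn more about Crowdring in this short video below.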


How ReCAPTCHA Can Be Used for Disaster Response

We’ve all seen prompts like this:

recaptcha_pic

More than 100 million of these ReCAPTCHAs get filled out every day on sites like Facebook, Twitter and CNN. Google uses them to simultaneously filter out spam and digitize Google Books and archives of the New York Times. For example:

recaptcha_pic2

So what’s the connection to disaster response? In early 2010, I blogged about using massive multiplayer games to tag crisis information and asked: What is the game equivalent of reCAPTCHA for tagging crisis information? (Big thanks to friend and colleague Albert Lin for reminding me of this recently.) Well, the game equivalent is perhaps the Internet Response League (IRL). But what if we simply used ReCAPTCHA itself for disaster response?

Humanitarian organizations like the American Red Cross regularly monitor Twitter for disaster-related information. But they are often overwhelmed with millions of tweets during major events. While my team and I at QCRI are developing automated solutions to manage this Big (Crisis) Data, we could also use the ReCAPTCHA methodology. For example, our automated classifiers can tell us with a certain level of accuracy whether a tweet is disaster-related, whether it refers to infrastructure damage, urgent needs, etc. If the classifier is not sure—say the tweet is scored as having a 50% chance of being related to infrastructure damage—then we could automatically post it to our version of ReCAPTCHA (see below). Perhaps a list of 3 tweets could be posted with the user prompted to tag which one of the 3 is damage-related. (The other two tweets could come from a separate database of random tweets.)
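In code, the routing logic could be as simple as the following sketch; the function name, thresholds and data shapes are illustrative assumptions, not QCRI’s actual pipeline:

```python
# A sketch of routing low-confidence classifier output to a ReCAPTCHA-style
# task: one uncertain tweet is bundled with two random decoys, and the user
# is asked to pick the damage-related one. All names and thresholds here
# are illustrative assumptions.
import random

def make_captcha_task(tweet, score, decoy_pool, low=0.4, high=0.6):
    """Return a 3-tweet tagging task if the classifier is unsure, else None."""
    if not (low <= score <= high):
        return None                 # classifier confident enough; no human needed
    options = [tweet] + random.sample(decoy_pool, 2)
    random.shuffle(options)
    # The answer index stays server-side; only 'options' is shown to the user.
    return {"options": options, "answer_index": options.index(tweet)}

task = make_captcha_task(
    "Bridge on 5th Ave has collapsed, road impassable", 0.5,
    ["Lovely sunset tonight", "Great game last night", "Coffee time!"],
)
print(task)
```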

ReCaptcha_pic3

There are reportedly 44,000 United Nations employees around the globe. World Vision also employs over 40,000, the International Committee of the Red Cross (ICRC) has more than 12,000 employees, while Oxfam has about 7,000. That’s 100,000 people right there who probably log onto their work emails at least once a day. Why not insert a ReCAPTCHA when they log in? We could also add ReCAPTCHAs to these organizations’ intranets and portals like Virtual OSOCC. On a related note, Google recently added images from Google Street View to ReCAPTCHAs. So we could automatically collect images shared on social media during disasters and post them to our own disaster response ReCAPTCHAs:

Image ReCAPTCHA

In sum, as humanitarians log into their emails multiple times a day, they’d be asked to tag which tweets and/or pictures relate to an ongoing disaster. Last year, we tagged tweets and images in support of the UN’s disaster response efforts in the Philippines following Typhoon Pablo. Adding a customized ReCAPTCHA for disaster response would help us tap a much wider audience of “volunteers”, which would mean an even more rapid turnaround time for damage assessments following major disasters.


Using Waze, Uber, AirBnB and SeeClickFix for Disaster Response

After the EF5 tornado in Oklahoma, map editors at Waze used the service to route drivers around the damage. While Uber increased its car service fares during Hurricane Sandy, it could have modified its app to encourage the shared use of Uber cars to fill unused seats. This would have taken some work, but AirBnB did modify their platform overnight to let over 1,400 kindhearted New Yorkers offer free housing to victims of the hurricane. SeeClickFix was also used to report over 800 issues in just 24 hours after Sandy made landfall. These included reports on the precise location of power outages, flooding, downed trees, downed electric lines, and other storm damage. Following the Boston Marathon Bombing, SeeClickFix was used to quickly find emergency housing for those affected by the tragedy.

Disaster-affected populations have always been the real first responders. Paid emergency response professionals cannot be everywhere at the same time, but the crowd is always there. Disasters are collective experiences; and today, disaster-affected crowds are increasingly “digital crowds” as well—that is, both a source and a consumer of digital information. In other words, they are also the first digital responders. Thanks to connection technologies like Waze, Uber, AirBnB and SeeClickFix, disaster-affected communities can self-organize more quickly than ever before, since these new technologies drastically reduce the cost and time necessary to self-organize. And because resilience is a function of a community’s ability to self-organize, these new technologies can also render disaster-prone populations more resilient by fostering social capital, thus enabling them to bounce back more quickly after a crisis.

When we’re affected by disasters, we tend to use the tools that we are most familiar with, i.e. those we use on a daily basis when there is no disaster. That’s why we often see so many Facebook updates, Instagram pictures, tweets, YouTube videos, etc., posted during a disaster. The same holds true for services like Waze and AirBnB. So I’m thrilled to see more examples of these platforms used as humanitarian technologies and equally heartened to know that the companies behind these tools are starting to play a more active role during disasters, thus helping people help themselves. Each of these platforms has the potential to become a hyper-local match.com for disaster response. Facilitating this kind of mutual aid not only builds social capital, which is critical to resilience; it also shifts the burden and pressure off the shoulders of paid responders who are often overwhelmed during major disasters.

In sum, these useful everyday technologies also serve to crowdsource and democratize disaster response. Do you know of other examples? Other everyday smartphone apps and web-based apps that get used for disaster response? If so, I’d love to know. Feel free to post your examples in the comments section below. Thanks!


Big Data for Disaster Response: A List of Wrong Assumptions


Derrick Harris puts it best:

“It might be provocative to call into question one of the hottest tech movements in generations, but it’s not really fair. That’s because how companies and people benefit from Big Data, Data Science or whatever else they choose to call the movement toward a data-centric world is directly related to what they expect going in. Arguing that big data isn’t all it’s cracked up to be is a strawman, pure and simple—because no one should think it’s magic to begin with.”

So here is a list of misplaced assumptions about the relevance of Big Data for disaster response and emergency management:

•  “Big Data will improve decision-making for disaster response”

This recent groundbreaking study by the UN confirms that many decisions made by humanitarian professionals during disasters are not based on any kind of empirical data—regardless of how large or small a dataset may be and even when the data is fully trustworthy. In fact, humanitarians often use anecdotal information or mainstream news to inform their decision-making. So no, Big Data will not magically fix these decision-making deficiencies in humanitarian organizations, all of which pre-date the era of Big (Crisis) Data.

•  “Big Data suffers from extreme sample bias.”

This is often true of any dataset collected using non-random sampling methods. The statement also seems to suggest that representative sampling methods can be carried out just as easily, quickly and cheaply. This is very rarely the case, hence the use of non-random sampling. In other words, sample bias is not some strange disease that only affects Big Data or social media. And even though Big Data is biased and not necessarily objective, Big Data such as social media still offers “new, large, and arguably unfiltered insights into attitudes and behaviors that were previously difficult to track in the wild.”

digital prints

Statistical correlations in Big Data do not imply causation; they simply suggest that there may be something worth exploring further. Moreover, the fact that data is collected via non-random, non-representative sampling does not in itself invalidate or devalue that data. Much of the data used for medical research, digital disease detection and police work is the product of convenience sampling. Should researchers dismiss or ignore the resulting data because it is not representative? Of course not.

While the 911 system was set up in 1968, the service and number were not widely known until the 1970s, and some municipalities did not have the crowdsourcing service until the 1980s. So it was hardly a representative way to collect emergency calls. Does this mean that the millions of 911 calls made before the more widespread adoption of the service in the 1990s were all invalid or useless? Of course not, despite the tens of millions of false 911 calls and hoaxes that are made every year. Point is, there has never been a moment in history in which everyone has had access to the same communication technology at the same time. This is unlikely to change for a while, even though mobile phones are by far the most rapidly distributed and widespread communication technology in the history of our species.

There were over 20 million tweets posted during Hurricane Sandy last year. While “only” 16% of Americans are on Twitter, and while this demographic is younger, more urban and more affluent than the norm, as Kate Crawford rightly notes, this does not render the informative and actionable tweets shared during the hurricane useless to emergency managers. After Typhoon Pablo devastated the Philippines last year, the UN used images and videos shared on social media as a preliminary way to assess the disaster damage. According to one Senior UN Official I recently spoke with, their relief efforts would have overlooked certain disaster-affected areas had it not been for this map.

PHILIPPINES-TYPHOON

Was the data representative? No. Were the underlying images and videos objective? No, they captured the perspective of those taking the pictures. Note that “only” 3% of the world’s population are active Twitter users, and fewer still post images and videos online. But the damage captured by this data was not virtual; it was real damage. And it only takes one person taking a picture of a washed-out bridge to reveal the infrastructure damage caused by a typhoon, even if all the other onlookers have never heard of social media. Moreover, this recent statistical study reveals that tweets are evenly distributed geographically according to the availability of electricity. This is striking given that Twitter has only been around for 7 years, compared to the light bulb, which was invented 134 years ago.

•  “Big Data enthusiasts suggest doing away with traditional sources of information for disaster response.”

I have yet to meet anyone who earnestly believes this. As Derrick writes, “social media shouldn’t usurp traditional customer service or market research data that’s still useful, nor should the Centers for Disease Control start relying on Google Flu Trends at the expense of traditional flu-tracking methodologies. Web and social data are just one more source of data to factor into decisions, albeit a potentially voluminous and high-velocity one.” In other words, the situation is not either/or, but rather a both/and. Big (Crisis) Data from social media can complement rather than replace traditional information sources and methods.

•  “Big Data will make us forget the human faces behind the data.”

Big (Crisis) Data typically refers to user-generated content shared on social media, such as Twitter, Instagram, YouTube, etc. Anyone who follows social media during a disaster would be hard-pressed to forget where this data is coming from, in my opinion. Social media, after all, is social, and increasingly visually social, as witnessed by the tremendous popularity of Instagram and YouTube during disasters. These visual media help us capture, connect and feel real emotions.

OkeTorn


See also: 

  • “No Data is Better than Bad Data…” Really? [Link]
  • Crowdsourcing and the Veil of Ignorance [Link]

The Geography of Twitter: Mapping the Global Heartbeat

My colleague Kalev Leetaru recently co-authored this comprehensive study on the various sources and accuracies of geographic information on Twitter, the first detailed study of its kind. The analysis, which runs some 50 pages, has important implications vis-a-vis the use of social media in emergency management and humanitarian response. Should you not have the time to read the full study, this blog post highlights the most important and relevant findings.

Kalev et al. analyzed 1.5 billion tweets (collected from the Twitter Decahose via GNIP) between October 23 and November 30, 2012. This came to 14.3 billion words posted by 35% of all active users at the time. Note that 2.9% of the world’s population are active Twitter users and that 87% of all tweets ever posted since the launch of Twitter in 2006 were posted in the past 24 months alone. Kalev and company found that the number of tweets posted per hour ranges from a low of about one million to a high of two million. In addition, almost 50% of all tweets are posted by 5% of users. (Click on images to enlarge.)

Tweets

In terms of geography, there are two easy ways to capture geographic data from Twitter. The first is the location information specified by a user when registering for a Twitter account (selected from a drop-down menu of place names). The second, which is automatically generated, comes from the coordinates of the Twitter user’s location when tweeting, typically provided via GPS or cellular triangulation. On a typical day, about 2.7% of tweets contain GPS or cellular data, while 2.02% of users list a place name when registering (1.4% have both). The figure above displays all GPS/cellular coordinates captured from tweets during the 39 days of study. In contrast, the figure below combines all Twitter locations, adding registered place names and GPS/cellular data (both in red), and overlays this with the location of electric lights (blue) based on satellite imagery obtained from NASA.
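As an aside, both signals are straightforward to pull from a tweet payload. Here is a minimal sketch, assuming the classic Twitter API v1.1 JSON layout (a GeoJSON “coordinates” point when GPS/cellular data is attached, plus a free-text user location field):

```python
# A minimal sketch of extracting both geographic signals from a tweet dict,
# assuming the Twitter API v1.1 JSON layout (an assumption on my part; the
# study worked from the Decahose feed, whose payloads follow this format).
def extract_geo(tweet: dict) -> dict:
    point = tweet.get("coordinates")                        # None for ~97% of tweets
    gps = tuple(point["coordinates"]) if point else None    # (lon, lat)
    profile = (tweet.get("user") or {}).get("location") or None
    return {"gps": gps, "profile_location": profile}

tweet = {"coordinates": {"type": "Point", "coordinates": [-73.99, 40.73]},
         "user": {"location": "New York, NY"}}
print(extract_geo(tweet))
# {'gps': (-73.99, 40.73), 'profile_location': 'New York, NY'}
```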

Tweets / Electricity

“White areas depict locations with an equal balance of tweets and electricity. Red areas reveal a higher density of tweets than night lights while blue areas have more night lights than tweets.” “Iran and China show substantially fewer tweets than their electricity levels would suggest, reflecting their bans on Twitter, while India shows strong clustering of Twitter usage along the coast and its northern border, even as electricity use is far more balanced throughout the country. Russia shows more electricity usage in its eastern half than Twitter usage, while most countries show far more Twitter usage than electricity would suggest.”

The Pearson correlation between tweets and lights is 0.79, indicating very high similarity: wherever in the world electricity exists, the chances of there also being Twitter users are very high indeed. In other words, tweets are evenly distributed geographically according to the availability of electricity. And so, even though less than three percent of all tweets have geolocation information, “this suggests they could be used as a dynamic reference baseline to evaluate the accuracy of other methods of geographic recovery.” Keep in mind that the light bulb was invented 134 years ago, in contrast to Twitter’s short 7-year history. And yet the correlation is already very strong. This is why they call it an information revolution. Still, just 1% of all Twitter users accounted for 66% of all georeferenced tweets during the period of study, which means that relying purely on these tweets may provide a skewed view of the Twitterverse, particularly over short periods of time. But whether this poses a problem ultimately depends on the research question or task at hand.
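For the statistically curious, the comparison boils down to something like the following toy computation, assuming both signals have been aggregated onto the same spatial grid cells (the numbers below are invented):

```python
# A toy illustration of the tweets-vs-lights correlation; the study computed
# Pearson's r = 0.79 on real gridded data, while these six cells are made up.
import numpy as np

tweet_density = np.array([120, 5, 300, 80, 0, 45])
light_density = np.array([100, 10, 280, 90, 2, 60])

r = np.corrcoef(tweet_density, light_density)[0, 1]   # Pearson's r
print(round(r, 2))
```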

Twitter table

The linguistic geography of Twitter is critical: “If English is rarely used outside of the United States, or if English tweets have a fundamentally different geographic profile than other languages outside of the United States, this will significantly skew geocoding results.” As the table below reveals, georeferenced tweets with English content constitute 41.57% of all geo-tagged tweets.

Geo Tweets Language

The data from the above table is displayed geographically below for the European region. See the global map here. “In cases where multiple languages are present at the same coordinate, the point is assigned to the most prevalent language at that point and colored accordingly.” Statistical analyses of geo-tagged English tweets compared to all other languages suggest that “English offers a spatial proxy for all languages and that a geocoding algorithm which processes only English will still have strong penetration into areas dominated by other languages (though English tweets may discuss different topics or perspectives).”

Twitter Languages Europe

Another important source of geographic information is a Twitter user’s bio. This public location information was available for 71% of all tweets studied by Kalev and company. Interestingly, “Approximately 78.4 percent of tweets include the user’s time zone in textual format, which offers an approximation of longitude […].” As Kalev et al. note, “Nearly one third of all locations on earth share their name with another location somewhere else on the planet, meaning that a reference to ‘Urbana’ must be disambiguated by a geocoding system to determine which of the 12 cities in the world it might refer to, including 11 cities in the United States with that name.”
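The time-zone observation is worth unpacking. Under the simplifying assumption of idealized one-hour zones, each hour of UTC offset spans 15 degrees of longitude (360° divided by 24 hours); real zone boundaries are political, which is why it is only an approximation:

```python
# Why a time zone approximates longitude: one hour of UTC offset corresponds
# to 15 degrees under idealized zones. Real boundaries deviate considerably.
def timezone_to_longitude(utc_offset_hours: float) -> float:
    """Center longitude of an idealized UTC offset band, in degrees."""
    return utc_offset_hours * 15.0

print(timezone_to_longitude(-5))   # Eastern Time -> roughly -75 deg (NYC is ~ -74)
```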

There are several ways to get around this place-name ambiguity, ranging from developing a full-text geocoder to using gazetteers such as the Wikipedia Gazetteer and MaxFind coupled with machine translation. Applying the latter has revealed that the “textual geographic density of Twitter changes by more than 53 percent over the course of each day. This has enormous ramifications for the use of Twitter as a global monitoring system, as it suggests that the representativeness of geographic tweets changes considerably depending on time of day.” That said, the success of a monitoring system is not solely dependent on spatial data: temporal factors and deviations from a baseline also enable early detection. In any event, “The small volume of georeferenced tweets can be dramatically enhanced by applying geocoding algorithms to the textual content and metadata of each tweet.”

Kalev et al. also carried out a comprehensive analysis of geo-tagged retweets. They find that “geography plays little role in the location of influential users, with the volume of retweets instead simply being a factor of the total population of tweets originating from that city.” They also calculated that the average geographical distance between two Twitter users “connected” by retweets (RTs) who geotag their tweets is about 750 miles, or 1,200 kilometers. When a Twitter user references another (@), the average geographical distance between the two is 744 miles. This means that RTs and @’s cannot be used for geo-referencing Twitter data, even when coupling this information with time zone data. The figure below depicts the location of users retweeting other users. The geodata for this comes from the geotagged tweets (rather than account information or profile data).

Map of Retweets
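The distance measurement behind these averages is presumably a great-circle computation between geotagged points; here is a standard haversine sketch (my assumption of the metric, not necessarily the one used in the study):

```python
# Great-circle (haversine) distance between two geotagged users, in km.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Distance between two (lat, lon) points on a sphere of Earth's radius."""
    r = 6371.0                                   # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# New York to Chicago is roughly 1,150 km -- close to the ~1,200 km RT average.
print(round(haversine_km(40.71, -74.01, 41.88, -87.63)))
```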

On average, about 15.85% of geo-tagged tweets contain links. The most popular links include Foursquare, Instagram, Twitter and Facebook. See my previous blog post on the analysis and value of such content for disaster response. In terms of Twitter geography versus that of mainstream news, Kalev et al. analyzed all news items available via Google News during the same period as the tweets they collected. This came to over 3.3 million articles pointing to just under 165,000 locations. The latter are color-coded red in the data viz below, while tweets are blue and white areas denote an equal balance of both.

Twitter vs News

“Mainstream media appears to have significantly less coverage of Latin America and vastly greater coverage of Africa. It also covers China and Iran much more strongly, given their bans on Twitter, as well as having enhanced coverage of India and the Western half of the United States. Overall, mainstream media appears to have more even coverage, with less clustering around major cities.” This suggests “there is a strong difference in the geographic profiles of Twitter and mainstream media and that the intensity of discourse mentioning a country does not necessarily match the intensity of discourse emanating from that country in social media. It also suggests that Twitter is not simply a mirror of mainstream media, but rather has a distinct geographic profile […].”

In terms of future growth, “the Middle East and Eastern Europe account for some of Twitter’s largest new growth areas, while Indonesia, Western Europe, Africa, and Central America have high proportions of the world’s most influential Twitter users.”


See also:

  • Social Media – Pulse of the Planet? [Link]
  • Big Data for Disaster Response – A list of Wrong Assumptions [Link]
  • A Multi-Indicator Approach for Geolocalization of Tweets [Link]

Could CrowdOptic Be Used For Disaster Response?

Crowds—rather than sole individuals—are increasingly bearing witness to disasters large and small. Instagram users, for example, snapped 800,000 #Sandy pictures during the hurricane last year. One way to make sense of this vast volume and velocity of multimedia content—Big Data—during disasters is with PhotoSynth, as blogged here. Another perhaps more sophisticated approach would be to use CrowdOptic, which automatically zeros in on the specific location that eyewitnesses are looking at when using their smartphones to take pictures or record videos.

Instagram-Hurricane-Sandy

How does it work? CrowdOptic simply triangulates line-of-sight intersections using sensor metadata from pictures and videos taken with a smartphone. The basic approach is depicted in the figure below. The area of intersection is called a focal cluster, and CrowdOptic automatically identifies the location of these clusters.

Cluster
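To make the geometry concrete, here is a back-of-the-envelope sketch of intersecting two lines of sight, using a flat-earth approximation that only holds over small areas; CrowdOptic’s actual algorithm is proprietary and certainly more robust:

```python
# A back-of-the-envelope sketch of line-of-sight triangulation, assuming a
# flat-earth approximation over small areas (CrowdOptic's actual method is
# proprietary). Each observer is (x, y, compass_bearing_degrees).
import math

def intersect(obs_a, obs_b):
    """Intersect two bearing rays; returns (x, y) of the focal point."""
    (x1, y1, b1), (x2, y2, b2) = obs_a, obs_b
    # Compass bearing -> unit direction vector (0 deg = north = +y).
    d1 = (math.sin(math.radians(b1)), math.cos(math.radians(b1)))
    d2 = (math.sin(math.radians(b2)), math.cos(math.radians(b2)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None                       # parallel lines of sight
    t = ((x2 - x1) * d2[1] - (y2 - y1) * d2[0]) / denom
    return (x1 + t * d1[0], y1 + t * d1[1])

# Two smartphones 200 m apart, both photographing the same focal point.
print(intersect((0, 0, 45), (200, 0, 315)))   # -> (100.0, 100.0)
```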

“Once a crowd’s point of focus is determined, any content generated by that point of focus is automatically authenticated, and a relative significance is assigned based on CrowdOptic’s focal data attributes […].” These include: “(1) Number of Viewers; (2) Location of Focus; (3) Distance to Epicenter; (4) Cluster Timestamp, Duration; and (5) Cluster Creation, Dissipation Speed.” CrowdOptic can also be used on live streams and archival images & videos. Once a cluster is identified, the best images/videos pointing to this cluster are automatically selected.

Clearly, all this could have important applications for disaster response and information forensics. My colleagues and I recently collected over 12,000 Instagram pictures and more than 5,000 YouTube videos posted to Twitter during the first 48 hours of the tornado in Oklahoma. These could be uploaded to CrowdOptic for cluster identification. Any focal cluster with several viewers would almost certainly be authentic, particularly if the time-stamps are similar. These clusters could then be tagged by digital humanitarian volunteers based on whether they depict evidence of disaster damage. Indeed, we could have tested CrowdOptic during the disaster response efforts we carried out for the United Nations following the devastating Philippines typhoon. Perhaps CrowdOptic could facilitate rapid damage assessments in the future. Of course, the value of CrowdOptic ultimately depends on the volume of geotagged images and videos shared on social media and the Web.

I once wrote a blog post entitled “Wag the Dog, or How Falsifying Crowdsourced Data Can Be a Pain.” While an image or video could certainly be falsified, trying to fake several focal clusters of multimedia content with dozens of viewers each would probably require the organizational capacity of a small movie production or commercial shoot. So I’m in touch with the CrowdOptic team to explore the possibility of carrying out a proof of concept based on the multimedia data we’ve collected following the Oklahoma tornado. Stay tuned!
