Tag Archives: MH370

Analyzing Tweets on Malaysia Flight #MH370

My QCRI colleague Dr. Imran is using our AIDR platform (Artificial Intelligence for Disaster Response) to collect & analyze tweets related to Malaysia Flight 370 that went missing several days ago. He has collected well over 850,000 English-language tweets since March 11th; using the following keywords/hashtags: Malaysia Airlines flight, #MH370m #PrayForMH370 and #MalaysiaAirlines.

MH370 Prayers

Imran then used AIDR to create a number of “machine learning classifiers” to automatically classify all incoming tweets into categories that he is interested in:

  • Informative: tweets that relay breaking news, useful info, etc

  • Praying: tweets that are related to prayers and faith

  • Personal: tweets that express personal opinions

The process is super simple. All he does is tag several dozen incoming tweets into their respective categories. This teaches AIDR what an “Informative” tweet should “look like”. Since our novel approach combines human intelligence with artificial intelligence, AIDR is typically far more accurate at capturing relevant tweets than Twitter’s keyword search.

And the more tweets that Imran tags, the more accurate AIDR gets. At present, AIDR can auto-classify ~500 tweets per second, or 30,000 tweets per minute. This is well above the highest velocity of crisis tweets recorded thus far—16,000 tweets/minute during Hurricane Sandy.

The graph below depicts the number of tweets generated since the day we started collecting the AIDR collection, i.e., March 11th.

Volume of Tweets per Day

This series of pie charts simply reflects the relative share of tweets per category over the past four days.

Tweets Trends

Below are some of the tweets that AIDR has automatically classified as being Informative (click to enlarge). The “Confidence” score simply reflects how confident AIDR is that it has correctly auto-classified a tweet. Note that Imran could also have crowdsourced the manual tagging—that is, he could have crowdsourced the process of teaching AIDR. To learn more about how AIDR works, please see this short overview and this research paper (PDF).

AIDR output

If you’re interested in testing AIDR (still very much under development) and/or would like the Tweet ID’s for the 850,000+ tweets we’ve collected using AIDR, then feel free to contact me. In the meantime, we’ll start a classifier that auto-collects tweets related to hijacking, criminal causes, and so on. If you’d like us to create a classifier for a different topic, let us know—but we can’t make any promises since we’re working on an important project deadline. When we’re further along with the development of AIDR, anyone will be able to easily collect & download tweets and create & share their own classifiers for events related to humanitarian issues.

Bio

Acknowledgements: Many thanks to Imran for collecting and classifying the tweets. Imran also shared the graphs and tabular output that appears above.

Results of the Crowdsourced Search for Malaysia Flight 370 (Updated)

Update: More than 3 million volunteers thus far have joined the crowdsourcing efforts to locate the missing Malaysian Airlines plane. These digital volunteers have viewed over a quarter-of-a-billion micro-maps and have tagged almost 3 million features in these satellite maps. Source of update.

Malaysian authorities have now gone on record to confirm that Flight 370 was hijacked, which reportedly explains why contact with the passenger jet abruptly ceased a week ago. The Search & Rescue operations now involve 13 countries around the world and over 100 ships, helicopters and airplanes. The costs of this massive operation must easily be running into the millions of dollars.

FlightSaR

Meanwhile, a free crowdsourcing platform once used by digital volunteers to search for Genghis Khan’s Tomb and displaced populations in Somalia (video below) has been deployed to search high-resolution satellite imagery for signs of the missing airliner. This is not the first time that crowdsourced satellite imagery analysis has been used to find a missing plane but this is certainly the highest profile operation yet, which may explain why the crowdsourcing platform used for the search (Tomnod) reportedly crashed for over a dozen of hours since the online search began. (Note that Zooniverse can easily handle this level of traffic). Click on the video below to learn more about the crowdsourced search for Genghis Khan and displaced peoples in Somalia.

NatGeoVideo

Having current, high-resolution satellite imagery is almost as good as having your own helicopter. So the digital version of these search operations includes tens of thousands of digital helicopters, whose virtual pilots are covering over 2,000 square miles of Thailand’s Gulf right from their own computers. They’re doing this entirely for free, around the clock and across multiple time zones. This is what Digital Humanitarians have been doing ever since the 2010 Haiti Earthquake, and most recently in response to Typhoon Yolanda.

Tomnod has just released the top results of the crowdsourced digital search efforts, which are displayed in the short video below. Like other microtasking platforms, Tomnod uses triangulation to calculate areas of greatest consensus by the crowd. This is explained further here. Note: The example shown in the video is NOT a picture of Flight 370 but perhaps of an airborne Search & Rescue plane.

While looking for evidence of the missing airliner is like looking for the proverbial needle in a massive stack of satellite images, perhaps the biggest value-added of this digital search lays in identifying where the aircraft is most definitely not located—that is, approaching this crowdsourced operation as a process of elimination. Professional imagery analysts can very easily and quickly review images tagged by the crowd, even if they are mistakenly tagged as depicting wreckage. In other words, the crowd can provide the first level filter so that expert analysts don’t waste their time looking at thousands of images of bare oceans. Basically, if the mandate is to leave no stone unturned, then the crowd can do that very well.

In sum, crowdsourcing can reduce the signal to noise ratio so that experts can focus more narrowly on analyzing the potential signals. This process may not be perfect just yet but it can be refined and improved. (Note that professionals also get it wrong, like Chinese analysts did with this satellite image of the supposed Malaysian airliner).

If these digital efforts continue and Flight 370 has indeed been hijacked, then this will certainly be the first time that crowdsourced satellite imagery analysis is used to find a hijacked aircraft. The latest satellite imagery uploaded by Tomnod is no longer focused on bodies of water but rather land. The blue strips below (left) is the area that the new satellite imagery covers.

Tomnod New Imagery 2

Some important questions will need to be addressed if this operation is indeed extended. What if the hijackers make contact and order the cessation of all offline and online Search & Rescue operations? Would volunteers be considered “digital combatants,” potentially embroiled in political conflict in which the lives of 227 hostages are at stake?

bio

Note: The Google Earth containing the top results of the search is available here.

See also: Analyzing Tweets on Malaysia Flight #MH370 [link]