Analyzing Tweets on Malaysia Flight #MH370

My QCRI colleague Dr. Imran is using our AIDR platform (Artificial Intelligence for Disaster Response) to collect & analyze tweets related to Malaysia Flight 370, which went missing several days ago. He has collected well over 850,000 English-language tweets since March 11th using the following keywords/hashtags: Malaysia Airlines flight, #MH370, #PrayForMH370 and #MalaysiaAirlines.

MH370 Prayers

Imran then used AIDR to create a number of “machine learning classifiers” to automatically classify all incoming tweets into categories that he is interested in:

  • Informative: tweets that relay breaking news, useful info, etc.

  • Praying: tweets that are related to prayers and faith

  • Personal: tweets that express personal opinions

The process is super simple. All he does is tag several dozen incoming tweets into their respective categories. This teaches AIDR what an “Informative” tweet should “look like”. Since our novel approach combines human intelligence with artificial intelligence, AIDR is typically far more accurate at capturing relevant tweets than Twitter’s keyword search.
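To make the tagging idea concrete, here is a minimal sketch of learning categories from a handful of hand-tagged tweets. It is not AIDR's actual code or algorithm: it assumes a tiny multinomial Naive Bayes classifier with add-one smoothing, and the names `tokenize`, `TinyNaiveBayes`, `tag` and `classify`, as well as the example tweets, are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word and hashtag tokens."""
    return re.findall(r"[#\w']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.label_counts = Counter()            # tagged tweets per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        self.vocabulary = set()

    def tag(self, text, label):
        """Record one human-tagged tweet as a training example."""
        self.label_counts[label] += 1
        for token in tokenize(text):
            self.word_counts[label][token] += 1
            self.vocabulary.add(token)

    def classify(self, text):
        """Return the category with the highest log-probability score."""
        total_tagged = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total_tagged)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for token in tokenize(text):
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up tagged tweets standing in for the manual tagging step.
TAGGED = [
    ("Search area for missing plane expanded officials say", "Informative"),
    ("Officials confirm search teams found no debris", "Informative"),
    ("Praying for the passengers and crew", "Praying"),
    ("My prayers are with the families #PrayForMH370", "Praying"),
    ("I think the airline is hiding something", "Personal"),
    ("In my opinion the search is too slow", "Personal"),
]

model = TinyNaiveBayes()
for text, label in TAGGED:
    model.tag(text, label)

print(model.classify("Praying for all the families"))        # Praying
print(model.classify("Officials expanded the search area"))  # Informative
```

With only six examples this is a toy, but it shows the loop: a person tags a few tweets, the word counts per category get updated, and new tweets are scored against those counts.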

And the more tweets that Imran tags, the more accurate AIDR gets. At present, AIDR can auto-classify ~500 tweets per second, or 30,000 tweets per minute. This is well above the highest velocity of crisis tweets recorded thus far—16,000 tweets/minute during Hurricane Sandy.
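The throughput figures above are easy to check. The short snippet below only redoes the arithmetic with the numbers quoted in this post; the variable names are my own.

```python
CLASSIFIED_PER_SECOND = 500     # approximate AIDR rate quoted above
SANDY_PEAK_PER_MINUTE = 16_000  # peak crisis-tweet velocity quoted above

classified_per_minute = CLASSIFIED_PER_SECOND * 60
headroom = classified_per_minute / SANDY_PEAK_PER_MINUTE

print(classified_per_minute)  # 30000
print(headroom)               # 1.875
```

In other words, the quoted rate leaves a bit less than twice the headroom over the Hurricane Sandy peak.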

The graph below depicts the number of tweets generated per day since we started the AIDR collection on March 11th.

Volume of Tweets per Day

This series of pie charts simply reflects the relative share of tweets per category over the past four days.
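For readers who want to reproduce this kind of breakdown from their own classified tweets, the sketch below computes each category's share per day. It assumes the classified tweets are available as `(day, category)` pairs; that input format, the function name `share_per_day` and the sample rows are hypothetical and not AIDR's export format.

```python
from collections import Counter, defaultdict


def share_per_day(classified):
    """Map each day to {category: fraction of that day's tweets}."""
    counts = defaultdict(Counter)
    for day, category in classified:
        counts[day][category] += 1
    shares = {}
    for day, counter in counts.items():
        total = sum(counter.values())
        shares[day] = {cat: n / total for cat, n in counter.items()}
    return shares


# Made-up rows, not real counts from the collection.
sample = [
    ("March 11", "Informative"),
    ("March 11", "Praying"),
    ("March 11", "Praying"),
    ("March 11", "Praying"),
    ("March 12", "Personal"),
]

shares = share_per_day(sample)
print(shares["March 11"]["Praying"])  # 0.75
```

Each day's fractions sum to 1, which is what one pie chart per day represents.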

Tweets Trends

Below are some of the tweets that AIDR has automatically classified as being Informative. The “Confidence” score simply reflects how confident AIDR is that it has correctly auto-classified a tweet. Note that Imran could also have crowdsourced the manual tagging; that is, he could have crowdsourced the process of teaching AIDR. To learn more about how AIDR works, please see this short overview and this research paper (PDF).
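This post doesn't spell out how AIDR computes its “Confidence” score, so the following is only a sketch of one common approach: take the classifier's per-category log scores, normalize them into probabilities, and report the probability of the winning category. The function name `confidence` and the example scores are assumptions for illustration.

```python
import math


def confidence(log_scores):
    """Turn per-category log scores into (best_category, confidence)."""
    highest = max(log_scores.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {cat: math.exp(s - highest) for cat, s in log_scores.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total


# Made-up scores for a single tweet.
scores = {
    "Informative": math.log(0.6),
    "Praying": math.log(0.3),
    "Personal": math.log(0.1),
}

best, conf = confidence(scores)
print(best, round(conf, 2))  # Informative 0.6
```

A score near 1.0 means one category clearly dominates; a score near 1/3 (with three categories) means the classifier is essentially guessing, and such tweets are natural candidates for more human tagging.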

AIDR output

If you’re interested in testing AIDR (still very much under development) and/or would like the Tweet IDs for the 850,000+ tweets we’ve collected using AIDR, then feel free to contact me. In the meantime, we’ll start a classifier that auto-collects tweets related to hijacking, criminal causes, and so on. If you’d like us to create a classifier for a different topic, let us know, but we can’t make any promises since we’re working on an important project deadline. When we’re further along with the development of AIDR, anyone will be able to easily collect & download tweets and create & share their own classifiers for events related to humanitarian issues.

Acknowledgements: Many thanks to Imran for collecting and classifying the tweets. Imran also shared the graphs and tabular output that appear above.

7 responses to “Analyzing Tweets on Malaysia Flight #MH370”

  1. Pingback: Results of the Crowdsourced Search for Malaysia Flight 370 | iRevolution

  2. Pingback: Crowdsourcing the Search for Malaysia Flight 370 (Updated) | iRevolution

  3. john sokiri pitia

    sorry for what has happen.And the lives of who have lost cant be forgotten

  4. How many (remote?) airports in the Maldives are good enough to land a 777? Seems like the perfect remote spot for landing a plane! Torsten

  5. Pingback: Using AIDR to Collect and Analyze Tweets from Chile Earthquake | iRevolution

  6. Pingback: Drawn Out Search for Malaysia Airlines F370 | Wikipedia and Public Knowledge 2014
