My colleague Andrea Tapia and her team at PennState University have developed an interesting iPhone application designed to support humanitarian response. This application is part of their EMERSE project: Enhanced Messaging for the Emergency Response Sector. The other components of EMERSE include a Twitter crawler, automatic classification and machine translation.
The rationale for this important, applied research? “Social media used around crises involves self-organizing behavior that can produce accurate results, often in advance of official communications. This allows affected population to send tweets or text messages, and hence, make them heard. The ability to classify tweets and text messages automatically, together with the ability to deliver the relevant information to the appropriate personnel are essential for enabling the personnel to timely and efficiently work to address the most urgent needs, and to understand the emergency situation better” (Caragea et al., 2011).
The iPhone application developed by PennState is designed to help humanitarian professionals collect information during a crisis. “In case of no service or Internet access, the application rolls over to local storage until access is available. However, the GPS still works via satellite and is able to geo-locate data being recorded.” The Twitter crawler component captures tweets referring to specific keywords “within a seven-day period as well as tweets that have been posted by specific users. Each API call returns at most 1000 tweets and auxiliary metadata […].” The machine translation component uses the Google Language API.
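The offline fallback described above is a classic store-and-forward pattern: record the report and its GPS position locally, then upload once a connection returns. The sketch below is a Python illustration of that pattern only, not the app's actual code. The `Report` and `OfflineReportQueue` classes and the `send` callback (assumed to return True when the server accepts a report) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Report:
    text: str
    lat: float  # GPS fix, available via satellite even without network service
    lon: float


@dataclass
class OfflineReportQueue:
    """Hypothetical store-and-forward buffer for field reports."""

    send: Callable[[Report], bool]  # assumed uploader: True if the server accepted the report
    pending: List[Report] = field(default_factory=list)  # stands in for on-device storage

    def submit(self, report: Report, online: bool) -> None:
        # Always queue first, so a report is never lost if the upload fails.
        self.pending.append(report)
        if online:
            self.flush()

    def flush(self) -> int:
        """Upload queued reports oldest-first; stop at the first failure."""
        sent = 0
        while self.pending:
            if not self.send(self.pending[0]):
                break  # connection dropped again: keep the rest for the next flush
            self.pending.pop(0)
            sent += 1
        return sent


# Usage: two reports recorded with no service, uploaded once access returns.
uploaded: List[Report] = []


def fake_upload(report: Report) -> bool:
    uploaded.append(report)  # pretend the server accepted it
    return True


queue = OfflineReportQueue(send=fake_upload)
queue.submit(Report("Bridge collapsed", 18.54, -72.34), online=False)
queue.submit(Report("Clinic out of water", 18.51, -72.29), online=False)
print(len(queue.pending))  # 2
print(queue.flush())       # 2
```

Queuing before attempting the upload keeps reports in the order they were recorded and means a failed send never drops data; a real app would persist the queue to the device's storage rather than keep it in memory.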
The more challenging aspect of EMERSE, however, is the automatic classification component. To develop it, the team made use of the Ushahidi Haiti data, which includes some 3,500 reports, about half of which came from text messages. Each of these reports was tagged according to specific (but not mutually exclusive) categories, e.g., Medical Emergency, Collapsed Structure, Shelter Needed, etc. The team at PennState experimented with various techniques from Natural Language Processing (NLP) and Machine Learning (ML) to automatically classify the Ushahidi Haiti data according to these pre-existing categories. The results demonstrate that “Feature Extraction” significantly outperforms other methods, while the performance of Support Vector Machine (SVM) classifiers varies significantly depending on the category being coded. I wonder whether their approach is more or less effective than this one developed by the University of Colorado at Boulder.
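Because the categories are not mutually exclusive, one common way to frame this kind of task is "binary relevance": train one yes/no classifier per category and return every category whose classifier fires. The Python sketch below illustrates that framing with bag-of-words features. It is not the PennState team's pipeline: to stay dependency-free it swaps in a simple perceptron for the SVMs the paper evaluated, and the class names, category list and messages are made-up toy examples, not the Ushahidi Haiti data.

```python
import re
from typing import Dict, Iterable, List, Set


def features(text: str) -> Set[str]:
    """Binary bag-of-words features: the set of lowercase alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


class Perceptron:
    """Linear yes/no classifier (a simple stand-in for a linear SVM)."""

    def __init__(self) -> None:
        self.w: Dict[str, float] = {}
        self.b = 0.0

    def score(self, feats: Set[str]) -> float:
        return self.b + sum(self.w.get(f, 0.0) for f in feats)

    def fit(self, X: List[Set[str]], y: List[bool], max_epochs: int = 100) -> "Perceptron":
        for _ in range(max_epochs):
            mistakes = 0
            for feats, label in zip(X, y):
                target = 1.0 if label else -1.0
                if target * self.score(feats) <= 0:  # misclassified (or on the boundary)
                    for f in feats:
                        self.w[f] = self.w.get(f, 0.0) + target
                    self.b += target
                    mistakes += 1
            if mistakes == 0:  # every training message is on the correct side
                break
        return self


class MultiLabelClassifier:
    """Binary relevance: one independent yes/no model per category."""

    def __init__(self, categories: Iterable[str]) -> None:
        self.models = {c: Perceptron() for c in categories}

    def fit(self, texts: List[str], labels: List[Set[str]]) -> "MultiLabelClassifier":
        X = [features(t) for t in texts]
        for category, model in self.models.items():
            model.fit(X, [category in tags for tags in labels])
        return self

    def predict(self, text: str) -> Set[str]:
        feats = features(text)
        # A message may receive zero, one or several categories.
        return {c for c, m in self.models.items() if m.score(feats) > 0}


# Made-up toy messages, NOT the Ushahidi Haiti data.
texts = [
    "people trapped under collapsed school need doctor",
    "house collapsed family sleeping outside need tent",
    "injured child needs medical help at clinic",
    "camp has no tent or shelter for families",
]
labels = [
    {"Medical Emergency", "Collapsed Structure"},
    {"Collapsed Structure", "Shelter Needed"},
    {"Medical Emergency"},
    {"Shelter Needed"},
]
clf = MultiLabelClassifier(["Medical Emergency", "Collapsed Structure", "Shelter Needed"])
clf.fit(texts, labels)
print(sorted(clf.predict("building collapsed need doctor now")))
# ['Collapsed Structure', 'Medical Emergency']
```

Since each category gets its own model, each one can be evaluated on its own, which is also why per-category results (like the SVM variation reported above) can differ so much from one category to the next.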
In any event, PennState’s applied research was presented at the ISCRAM 2011 conference and the findings are written up in this paper (PDF): “Classifying Text Messages for the Haiti Earthquake.” The co-authors: Cornelia Caragea, Nathan McNeese, Anuj Jaiswal, Greg Traylor, Hyun-Woo Kim, Prasenjit Mitra, Dinghao Wu, Andrea H. Tapia, Lee Giles, Bernard J. Jansen, John Yen.
In conclusion, the team at PennState argues that the EMERSE system offers four important benefits not provided by Ushahidi.
“First, EMERSE will automatically classify tweets and text messages into topic, whereas Ushahidi collects reports with broad category information provided by the reporter. Second, EMERSE will also automatically geo-locate tweets and text messages, whereas Ushahidi relies on the reporter to provide the geo-location information. Third, in EMERSE, tweets and text messages are aggregated by topic and region to better understand how the needs of Haiti differ by regions and how they change over time. The automatic aggregation also helps to verify reports. A large number of similar reports by different people are more likely to be true. Finally, EMERSE will provide tweet broadcast and GeoRSS subscription by topics or region, whereas Ushahidi only allows reports to be downloaded.”
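The aggregation-and-verification idea in the third point can be sketched as follows. This is an illustration under stated assumptions, not EMERSE's implementation: the `Message` fields, the day-level time buckets and the three-reporter threshold in the hypothetical `corroborated` function are all choices made for the example, and the messages are invented. Counting distinct senders rather than raw messages reflects the quote's point that similar reports by different people are more likely to be true.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Message(NamedTuple):
    sender: str  # phone number or Twitter handle (hypothetical field names)
    topic: str   # category assigned by the classifier
    region: str  # geo-located area
    day: int     # days since the event


def distinct_reporters(msgs: Iterable[Message]) -> Dict[Tuple[str, str], Set[str]]:
    """Aggregate messages by (topic, region), keeping each sender only once."""
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for m in msgs:
        groups[(m.topic, m.region)].add(m.sender)
    return dict(groups)


def daily_counts(msgs: Iterable[Message], topic: str, region: str) -> List[Tuple[int, int]]:
    """How many messages on one topic in one region arrived each day."""
    counts: Dict[int, int] = defaultdict(int)
    for m in msgs:
        if m.topic == topic and m.region == region:
            counts[m.day] += 1
    return sorted(counts.items())


def corroborated(msgs: Iterable[Message], min_reporters: int = 3) -> Dict[Tuple[str, str], int]:
    """(topic, region) groups confirmed by at least `min_reporters` different people."""
    return {key: len(senders)
            for key, senders in distinct_reporters(msgs).items()
            if len(senders) >= min_reporters}


# Invented example data.
msgs = [
    Message("a", "Shelter Needed", "Carrefour", 1),
    Message("b", "Shelter Needed", "Carrefour", 1),
    Message("a", "Shelter Needed", "Carrefour", 2),  # same sender again: no extra evidence
    Message("c", "Shelter Needed", "Carrefour", 2),
    Message("d", "Medical Emergency", "Leogane", 1),
]
print(corroborated(msgs))  # {('Shelter Needed', 'Carrefour'): 3}
print(daily_counts(msgs, "Shelter Needed", "Carrefour"))  # [(1, 2), (2, 2)]
```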
In terms of future research, the team may explore other types of abstraction based on semantically related words, and may also “design an emergency response ontology […].” Since their ISCRAM paper was published 14 months ago, I recently got in touch with Andrea to ask for an update. I’ll be sure to share it if the information can be made public.
The issue here is forgetting that many users own other smartphones that run Linux distributions 🙂
The focus of the post is not the iPhone but the automated classifier, which is completely independent of the iPhone component.
A most interesting post about brilliant technology, thank you for that.
Do you know of any previous efforts to establish an emergency response ontology? This seems like a potential breakthrough, if it can be achieved and widely adopted. I have recently come across a mention of a disaster reporting ontology developed by Ushahidi (http://kauppinen.net/tomi/lod-crowdsourcing-2011.pdf), but haven’t managed to explore it yet – could you possibly link me to an appropriate resource?
I am currently affiliated with an independent NGO maintaining an unstructured database of keywords that we use to tag our online content. In my spare time I am trying to explore possibilities of providing a structure to this vocabulary and have been searching for existing humanitarian response ontologies, ideally ones we could connect to via a linked data approach. Have you come across any mention of similar ideas before?
Thanks for your kind note, Lukasz. I’m only just starting to get into the emergency response ontology space myself, so I’m still learning a lot. I can put you in touch with my colleague Minu, who co-authored the paper you linked to (simply send me an email with what you are looking for and I’ll cc him in my reply).
In the meantime, I would recommend we set up a resource on the CrisisMappers website so we can start pooling our efforts together and list all the ontologies that we have come across. What do you think?
I’d be more than interested. I’m just starting as well, and this is currently very much a spare-time idea, but the enthusiasm is there. If I’m not overwhelmed by the technical complexities, I might try to use it as the basis for a PhD topic. You’d probably know best how to start the CrisisMappers resource.
I’m quite sure you know about HXL – they’ve put together a list of resources used for coming up with a highly interoperable geographic data model: http://goo.gl/BLTpK
Minu and I studied together, so I can just pass on greetings 🙂
Hey Lukasz, how about starting with a Google Doc or Google Spreadsheet? We could start working on that and then invite a few knowledgeable colleagues to contribute. That way we’ve got the beginning of a resource before we even go public and crowdsource more input. What do you think?
Say hi to Minu for me! 🙂