The field of sentiment analysis is one that I’ve long been interested in. See my previous post on the use of sentiment analysis for early warning here. So when we began receiving thousands of text messages from Haiti, I decided to ask my colleagues at the EC’s Joint Research Center (JRC) whether they could run some of their sentiment analysis software on the incoming SMS’s.
The 4636 SMS initiative in Haiti was a collaboration between many organizations and was coordinated by Josh Nesbit of FrontlineSMS. The system allowed individuals in Haiti to text in their location and urgent needs. These would then be shared with some of the humanitarian actors on the ground and also mapped on the Ushahidi-Haiti platform, which was used by first responders such as the Marine Corps.
Here’s how the JRC in partnership with the University of Alicante carried out their analysis on the incoming SMS’s:
As many individual words are ambiguous (e.g. the word ‘help’ probably predominantly indicates a negative situation, but it may also be positive, as in “help has finally arrived”), they looked at the most frequent word groups, or word n-grams (sizes 2 to 5 words). Out of these, they identified about 100 n-grams that they felt are (high) negative or (high) positive. These were added to the sentiment analysis tool.
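The n-gram step described above can be sketched roughly as follows. This is a minimal illustration, not the JRC tool: the function names, the tiny hand-labelled lexicon, and the +1/-1 weights are all assumptions made up for the example.

```python
import re
from collections import Counter


def word_ngrams(text, n_min=2, n_max=5):
    """Yield every word n-gram of length n_min..n_max found in text."""
    # Lowercase and keep only word characters and apostrophes.
    words = re.findall(r"[\w']+", text.lower())
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def most_frequent_ngrams(messages, top_k=20):
    """Count n-grams across all messages and return the top_k most common."""
    counts = Counter()
    for message in messages:
        counts.update(word_ngrams(message))
    return counts.most_common(top_k)


# Hypothetical lexicon: a human reviewer would pick these from the
# frequent n-grams and label them. The entries and weights are invented.
LEXICON = {
    "help has finally arrived": 1.0,
    "we need help": -1.0,
    "no food": -1.0,
}


def score_message(text, lexicon=LEXICON):
    """Sum the weights of every lexicon n-gram that occurs in the message."""
    return sum(lexicon.get(gram, 0.0) for gram in word_ngrams(text))
```

Because whole phrases are matched rather than single words, "help has finally arrived" scores positive while "we need help" scores negative, even though both contain the ambiguous word "help".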
The graph below depicts the changing sentiment reflected in the SMS data between January 17th and February 5th.
There is, of course, no way to tell whether the incoming text messages reflect the general feeling of the population. It is also important to emphasize that the number of individuals sending in SMS’s increased during this time period. Still, it would be interesting to go through the sentiment analysis data and identify what may have contributed to the peaks and troughs of the above graph.
Incidentally, the lowest point on this graph falls on January 21. The data reveal that a major aftershock took place that day, followed by reports of trauma, food/water shortages, casualties, need for medication, etc., all of which drive the sentiment analysis scores down.
Update 1: My colleague Ralf Steinberger and the Ushahidi-Haiti group are looking into the reasons behind the spike around January 30th. Ralf notes the following:
I checked the news a bit, using the calendar function in EMM NewsExplorer (http://emm.newsexplorer.eu/). I checked both the English and the French news for the day. One certainly positive news item accessible to Haitians on that day was that Haiti leaders pointed to progress. Another (French) positive news item is that the WFP (PAM) put in place a structured food aid system aiming at feeding up to 2 million people via women only. People were given food coupons (25kg of rice per family), starting Saturday 30.1.
Ralf also found that many of the original SMS’s received on that day had not been translated into English, so we’re looking into why that might have been. Hopefully we can get them translated retroactively for the purposes of this analysis.
Update 2: Josef Steinberger from JRC has produced a revised sentiment analysis graph through to mid March.
This kind of sentiment analysis can be done in real time. In future deployments where SMS becomes the principal means of communicating with disaster-affected populations, this kind of approach may eventually provide an overall score for how the humanitarian community is doing.
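As a rough sketch of how a running score could be kept as messages arrive, the snippet below averages per-message scores by day. The class name, its methods, and the sample scores are hypothetical; averaging per message is simply one assumed way to avoid the daily figure being driven by the growing number of senders.

```python
from collections import defaultdict
from datetime import date


class DailySentiment:
    """Running mean of message sentiment scores, grouped by day."""

    def __init__(self):
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def add(self, day, score):
        """Record one message score for the given day."""
        self._total[day] += score
        self._count[day] += 1

    def mean(self, day):
        """Return the mean score for the day, or None if no messages yet."""
        count = self._count.get(day, 0)
        if count == 0:
            return None
        return self._total[day] / count


# Illustrative values only; these are not scores from the Haiti data.
tracker = DailySentiment()
tracker.add(date(2010, 1, 21), -1.0)
tracker.add(date(2010, 1, 21), -1.0)
tracker.add(date(2010, 1, 30), 1.0)
tracker.add(date(2010, 1, 30), 0.0)
```

Each incoming message updates the totals in constant time, so the daily mean can be read at any moment without reprocessing earlier messages.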
Interesting. I assume that sentiment analysis is keyed to specific languages, is that correct?
What language was used in this analysis? Judging from the word “help”, it would seem to be English. In the case of Haiti, this in itself would insert considerable bias of course.
Thanks Sebastian, the majority of incoming SMS’s were in Creole and French; then translated into English.
Any plans to move beyond general sentiment to something topic based? Like sentiment about the aid work could be much different than sentiment about the earthquake itself. Also, is this data set being made public?
We (JRC) did actually tune our generic sentiment vocabulary to this specific Haiti SMS domain.
In the general news context (outside the domain of Haiti text messages), we aim at separating bad news from negative sentiment as it is not meaningful to state that the sentiment in news about natural disasters, for instance, (and the persons and organisations mentioned in these news items!) is negative. Furthermore, we aim at detecting sentiment towards any ‘entity’ (person, organisation, event, program) rather than generic document sentiment. Read this if you want to know more: http://langtech.jrc.ec.europa.eu/Documents/09_WOMSA-WS-Sevilla_Sentiment-Def_printed.pdf . Experiments (to be published in May) have confirmed that – by doing this – the agreement between human and automatic judgement rises significantly.
Fascinating! My head is reeling with possibilities. I wonder if it could be used in real-time in Ushahidi’s swift river project as an additional check on validity (comparing the sentiment of a report to trends to flag inconsistencies). Or could analysis be used in prioritizing emergency aid delivery by identifying the neediest/most desperate areas? Lots of food for thought here.
Thanks Jesse, really good idea re complementing Swift River!