Category Archives: Early Warning

Crimson Hexagon: Early Warning 2.0?

The future of automated textual analysis is Crimson Hexagon, a patent-pending text-analysis technology that lets users define the questions they want to ask and then crawl the blogosphere (or any text-based source) for fast, accurate answers. The technology was created under the aegis of Harvard University Professor Gary King.

I met with the new company’s CEO this week to learn more about the group’s parsing technology and underlying statistical models. Some UN colleagues and I are particularly interested in the technology’s potential application to conflict monitoring and analysis. At present, early warning units within the UN, and other international (regional) organizations such as the OSCE, use manual labor to collect relevant information from online sources. Most units employ full-time staff for this, often meaning that 80% of an analyst’s time is actually used to collect pertinent articles and reports, leaving only 20% of the time for actual analysis, interpretation and policy recommendations. We can do better. Analysts ought to be spending 80% of their time analyzing.

Crimson Hexagon is of course not the first company to carry out automated textual analysis. Virtual Research Associates (VRA) and the EC’s Joint Research Center (JRC) have both been important players in this space. VRA developed GeoMonitor, a natural language parser that reads the headlines of Reuters and AFP news wires and codes “who did what, to whom, where and when?” for each event reported by the two media companies. According to an independent review of the VRA parser by Gary King and Will Lowe (2003),

The results are sufficient to warrant a serious reconsideration of the apparent bias against using events data, and especially automatically created events data, in the study of international relations. If events data are to be used at all, there would now seem to be little contest between the machine and human coding methods. With one exception, performance is virtually identical, and that exception (the higher propensity of the machine to find “events” when none exist in news reports) is strongly counterbalanced by both the fact that these false events are not correlated with the degree of conflict of the event category, and by the overwhelming strength of the machine: the ability to code huge numbers of events extremely quickly and inexpensively.

However, as Gary King mentioned when I met with him this month, VRA’s approach faces some important limitations. First, the parser can only parse the headline of each newswire. Second, adding new media sources such as the BBC requires significant investment in adjusting the parser. Third, the parser cannot handle languages other than English.

The JRC has developed the European Media Monitor (EMM). Unlike VRA’s tool, EMM is based on a keyword-search algorithm; in other words, it works much like a search engine such as Google. EMM crawls online news media for keywords and places each article into a corresponding category, such as terrorism. The advantage of this approach over VRA’s is that EMM can parse thousands of different news sources, and in different languages. The JRC recently set up an “African Media Monitor” for the African Union’s Continental Early Warning System (CEWS). This approach nevertheless has its limits: analysts still need to read each article to understand the nature of, say, a terrorist event.

Google.org is also pursuing text-based parsing. This initiative stems from Larry Brilliant’s TED 2006 prize to expand the Global Public Health Information Network (GPHIN) for the purposes of prediction and prevention:

Rapid ecological and social changes are increasing the risk of emerging threats, from infectious diseases to drought and other environmental disasters. This initiative will use information and technology to empower communities to predict and prevent emerging threats before they become local, regional, or global crises.

Larry’s idea led to the new non-profit InSTEDD, but last time I spoke with the team, they were not pursuing this initiative. In any case, I wouldn’t be surprised if Google were to express an interest in buying out Crimson Hexagon before year’s end. Hexagon’s immediate clients are private sector companies who want to monitor in real time how their brand is perceived in the blogosphere. The challenge?

115 million blogs, with 120,000 more added each day. As pundits proclaim the death of email, social web content is exploding. Consumers are generating their own media through blogs and comments, social network profiles and interactions, and myriad microcontent publishing tools. How do we begin to know and accurately quantify the relevant opinion that’s out there? How can we get answers to specific questions about online opinion as it relates to a particular topic?

The accuracy and reliability of Crimson Hexagon is truly astounding. Equally remarkable is the fact that the technology developed by Gary King’s group parses every word in a given text. How does the system work? Say we were interested in monitoring the Iranian blogosphere, as in the Berkman Center’s recent study. If we were interested in liberal bloggers and their opinion on riots (hypothetically taking place now in Tehran), we would select 10-30 examples of pro-democratic blog entries addressing the ongoing riots. These would then be fed into the system to teach the algorithm what to look for. A useful analogy Gary likes to give is speech recognition, which similarly learns from training examples.

The Crimson Hexagon parser uses a stemming approach, meaning that every word in a given text is reduced to its root, or stem. For example, “rioting”, “riots” and “rioters” are all reduced to “riot”. The technology then creates a vector of stemmed words to characterize each blog entry, so that thousands of Iranian blogs can be compared automatically. By providing the algorithm with a sample of 10 or more blogs expressing, say, positive perceptions of (hypothetical) rioting in Tehran, the technology could quantify liberal Iranian bloggers’ changing opinion on the rioting in real time by aggregating the stem vectors.
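To make the stemming and stem-vector idea concrete, here is a minimal, stdlib-only Python sketch. The suffix-stripping rules, the example sentences, and the cosine-similarity comparison are all my own illustrative assumptions, not Crimson Hexagon’s actual (proprietary) algorithm; real systems use a full stemmer such as Porter’s and far more sophisticated statistics.

```python
from collections import Counter
import math

def stem(word):
    # Naive suffix-stripping stemmer, for illustration only; production
    # systems use something like the Porter stemmer.
    for suffix in ("ers", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_vector(text):
    # Characterize a blog entry as a sparse vector of stem counts:
    # "rioting", "riots", "rioters" all contribute to the stem "riot".
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return Counter(stem(w) for w in words if w)

def cosine(a, b):
    # Crude stand-in for comparing two entries: cosine similarity
    # between their stem-count vectors.
    dot = sum(a[k] * b[k] for k in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

v1 = stem_vector("Rioters are rioting in the streets of Tehran")
v2 = stem_vector("Another riot erupted; the riots continue")
print(cosine(v1, v2))  # > 0: both entries share the stem "riot"
```

Aggregating such vectors over many blogs, and over time, is what lets the system track shifts in opinion across an entire blogosphere rather than classify one post at a time.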

Crimson Hexagon is truly pioneering a fundamental shift in the paradigm of textual analysis. Instead of trying to find the needle in the haystack, as it were, the technology seeks to characterize the haystack with astonishing reliability, such that any change in the haystack (amount of hay, density, structure) can be picked up by the parser in real time. Furthermore, the technology can parse any language, say Farsi, as long as the sample blogs provided are in Farsi. In addition, the system has returned highly reliable results even with fewer than 10 samples, and even when the actual blog entry had fewer than 10 words. Finally, the parser is by no means limited to blog entries; any piece of text will do.

The potential for significantly improving conflict monitoring and analysis is, in my opinion, considerable. Imagine parsing Global Voices in real time, or Reliefweb and weekly situation reports across all field-based agencies worldwide. Crimson Hexagon’s CEO immediately saw the potential during our meeting. We therefore hope to carry out a joint pilot study with colleagues of mine at the UN and the Harvard Humanitarian Initiative (HHI). Of course, like any early warning initiative, the link to early response will dictate the ultimate success or failure of this project.

Patrick Phillipe Meier

People-Centered Conflict Early Warning

Conflict early warning works. Indeed, current and historical cases of nonviolent action may be the closest systematic examples or tactical parallels we have to people-centered disaster early warning systems. Planning, preparedness and tactical evasion, in particular, are central components of strategic nonviolence: people must be capable of concealment and dispersion. Getting out of harm’s way and preparing people for the worst effects of violence requires sound intelligence and timely strategic estimates, or situation awareness.

The literature on nonviolent action and civil resistance is rich with case studies on successful instances of early warning tactics for community empowerment. What are the characteristics of successful early warning case studies in the field of nonviolent action? Nonviolent early response uses local social networks as the organizational template of choice, in a mode different from our conventional and institutional approach to early warning. Networks have demonstrated a better ability to innovate tactically and learn from past mistakes. The incentives for members of local networks to respond early and get out of harm’s way are also incalculably higher than those at the institutional or international level since failure to do so in the former instance often means death.

Nonviolent action is non-institutional and operates outside the bounds of bureaucratic and institutionalized political channels. Nonviolent movements are locally led and managed. They draw on local meaning, culture, symbolism and history. They integrate local knowledge and the intimate familiarity with the geography and surrounding environment. They are qualitative and tactical, not quantitative and policy-oriented. Not surprisingly, successful cases of nonviolent action clearly reveal the pivotal importance of contingency planning and preparedness, actions that are particularly successful when embedded in local circumstances and local experience.

The iRevolution question is how social resistance groups can most effectively use ICTs to gain an asymmetric advantage over repressive regimes.

Patrick Philippe Meier

Conflict Early Warning Systems: No iRevolution

Conventional conflict early warning systems are designed by us in the West to warn ourselves. They are about control. These systems are centralized, hierarchical, bureaucratic and ineffective. And highly academic. Indeed, the vast majority of operational conflict early warning systems are little more than fancy databases used to store, retrieve and analyze data. The rhetoric is that these systems serve to prevent violence, which is rather ironic since the vast majority of local communities at risk have never heard of our impressive-sounding systems.

Lessons in this field are clearly not learned. Papers published by Rupesinghe (1988) and Walker (1992) could be published tomorrow with no changes and their recommendations would still be on target. Worst of all, the indicator of success for early warning systems is still the number of high-quality analytical reports produced.

Reports don’t protect people, nor do graphs. People protect themselves and others. And yet reports still get written, albeit rarely read, let alone acted upon. To be fair, however, those working on conventional early warning systems are constrained by political and institutional realities. The best that these systems can do is to build a paper trail of analysis and recommendations. In other words, conventional early warning systems can be used for advocacy and lobbying, but to assume that they are appropriate for operational response is to be misguided (see Campbell and Meier 2007). Indeed, the recent study by Susanna Campbell and myself showed that decision-making structures at the UN do not use analyses generated by formal early warning systems as input into the decision-making process.

In order for conventional early warning systems to engage in operational response, they would first require the paper trail, which would then be used to lobby the UN Secretariat and other member states; these actors would then have to place political and economic pressure on offending governments and/or non-state armed groups; and the latter would have to acquiesce. Now, exactly how often has this been successful? Exactly. The above process takes years and fails repeatedly.

It is high time we learn from other communities such as disaster management. The disaster community places increasing emphasis on the importance of people-centered early warning and response systems. They define the purpose of early warning as follows:

To empower individuals and communities threatened by hazards to act in sufficient time and in an appropriate manner so as to reduce the possibility of personal injury, loss of life, damage to property and the environment, and loss of livelihoods.

The day our conflict early warning community adopts this discourse will be a good day. I hope to still be around to toast the breakthrough. Clearly, the discourse in disaster management shifts away from the conventional top-down division of labor between the “warners” and “responders” to one of individual empowerment. In disaster management, this means capacity building by training in preparedness and contingency planning. In other words, the disaster management community focuses on both forecasting hazards and mitigating their impact when they turn into disasters.

Question: Why are we in the conflict early warning community obsessed with forecasting despite our dismal track record? The disaster community is better able to forecast than we are, yet they allocate significant resources towards community-based preparedness and contingency planning programs. So when disaster does strike, the communities (who are by definition the first responders) can manage their own security environment without the immediate need for external intervention. There would be an uproar (and an escalation in disaster deaths) if the disaster community were to focus solely on prediction.

And what do we do? We work in conflict prone places and set up conflict early warning systems. When the violence escalates, we evacuate all international staff and leave the local communities behind to face the violence by themselves. How often do our conflict early warning systems fail? As often as we evacuate our staff. At the very, very least we should be preparing at-risk communities for the violence and engaging them in contingency planning so that when violence does strike, they at least have the training to get out of harm’s way and survive.

In a future blog post, I will write about how some at-risk communities already do get out of harm’s way, and effectively so.

Patrick Philippe Meier