Steganography 2.0: Digital Resistance against Repressive Regimes

A team of Polish steganographers at the Institute of Telecommunications in Warsaw is doing some neat work that should be of interest to digital activists. Steganography is the art and science of writing hidden messages in such a way that no one, apart from the sender and intended recipient, suspects the existence of the message — a form of security through obscurity.

Wojciech Mazurczyk, along with Krzysztof Szczypiorski and Milosz Smolarczyk, is using the Internet’s transmission control protocol (TCP) to create fake web traffic that can mask the transmission of secret messages.

As New Scientist explains:

“Web, file transfer, email and peer-to-peer networks all use TCP, which ensures that data packets are received securely by making the sender wait until the receiver returns a “got it” message. If no such acknowledgement arrives (on average 1 in 1000 packets gets lost or corrupted), the sender’s computer sends the packet again. This scheme is known as TCP’s retransmission mechanism – and it can be bent to the steganographer’s whim, says Mazurczyk.”

The team’s project, called Retransmission Steganography (RSTEG), proposes to use software that deliberately has the receiver prompt a retransmission from the sender even when the data was successfully received in the first place. As Mazurczyk explains, “the sender then retransmits the packet but with some secret data inserted in it,” which means, “the message is hidden among the teeming network traffic.”
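The retransmission trick is easy to illustrate in miniature. The toy sketch below is not real TCP and not the authors’ implementation — the packet layout and all names are invented — but it shows the core idea: a “retransmitted” packet reuses the original sequence number while carrying a covert payload.

```python
import hashlib

# Toy illustration of the RSTEG idea -- NOT real TCP and not the
# authors' implementation; the packet layout is invented. The receiver
# withholds an ACK for a packet it actually received, and the sender's
# "retransmission" carries secret data in place of the original payload.

def make_packet(seq, payload):
    return {"seq": seq, "payload": payload,
            "checksum": hashlib.sha256(payload).hexdigest()}

def retransmit_with_secret(original, secret):
    # A genuine retransmission reuses the same sequence number, so to a
    # casual observer this looks like TCP's normal retransmission
    # mechanism -- but the payload now hides the covert message.
    return make_packet(original["seq"], secret)

cover = make_packet(42, b"GET /index.html HTTP/1.1")
covert = retransmit_with_secret(cover, b"meet at dawn")
```

To an observer counting retransmissions, the covert packet is indistinguishable from an ordinary resend of a lost packet.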

The use of RSTEG as a tactic for digital resistance could be quite effective. While eavesdroppers could observe that a retransmitted packet differs from the one originally sent, this observation would be of little use since retransmitted packets can differ from the originals anyway. In other words, “Retransmissions in IP networks are a ‘natural phenomenon’, and so intentional retransmissions introduced by RSTEG are not easy to detect if they are kept at a reasonable level.”

Mazurczyk and Szczypiorski are also working on a parallel project that draws on steganographic techniques to create covert channels for Voice over Internet Protocol (VoIP) streams. This approach, called Lost Audio Packets Steganography, or LACK, “provides hybrid storage-timing covert channel by utilizing delayed audio packets.”

For more information on the technical specifications of the RSTEG and LACK techniques, please see the authors’ papers here and here respectively.

The team plans to demonstrate their approach at a workshop on network steganography in China later this year. Yes, China.

Patrick Philippe Meier

GeoSurveillance for Crisis Mapping Analytics

Having blogged at length on the rationale for Crisis Mapping Analytics (CMA), I am now interested in assessing the applicability of existing tools for crisis mapping vis-a-vis complex humanitarian emergencies.

In this blog post, I review an open-source software package called GeoSurveillance that combines spatial statistical techniques and GIS routines to perform tests for the detection and monitoring of spatial clustering.

The post is based on the new peer-reviewed article “GeoSurveillance: a GIS-based system for the detection and monitoring of spatial clusters” published in the Journal of Geographical Systems and authored by Ikuho Yamada, Peter Rogerson and Gyoungju Lee.

Introduction

The detection of spatial clusters—testing the null hypothesis of spatial randomness—is a key focus of spatial analysis. My first research project in this area dates back to 1996, when I wrote a software algorithm in C++ to determine the randomness (or non-randomness) of stellar distributions.

stars

The program would read a graphics file of a high-quality black-and-white image of a stellar distribution (that I had scanned from a rather expensive book) and run a pattern analysis procedure to determine what constituted a star and then detect them. Note that the stars were of various sizes and resolutions, with many overlapping in part.

Once the stars were detected, I manually approximated the number of stars in the stellar distributions to evaluate the reliability of my algorithm. The program would then assign (x, y) coordinates to each star. I compared this series of numbers with a series of pseudo-random numbers that I generated independently.

Using the Kolmogorov-Smirnov test in two dimensions, I could then test whether the series of (x, y) star coordinates and the pseudo-random numbers were samples drawn from the same distribution.
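For illustration, here is a minimal two-sample Kolmogorov-Smirnov statistic applied to the marginals of two point sets. A genuine two-dimensional KS test (e.g. Peacock’s variant) compares empirical distributions over all quadrant orderings; this sketch, which is not my original C++ program, only checks the x and y marginals.

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    # Maximum vertical distance between the two empirical CDFs.
    a, b = sorted(sample_a), sorted(sample_b)
    def ecdf(sorted_xs, x):
        return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

random.seed(0)
stars = [(random.random(), random.random()) for _ in range(500)]
reference = [(random.random(), random.random()) for _ in range(500)]

d_x = ks_statistic([x for x, _ in stars], [x for x, _ in reference])
d_y = ks_statistic([y for _, y in stars], [y for _, y in reference])
# Small statistics are consistent with the two samples sharing the
# same underlying (here, uniform) distribution.
```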

Retrospective vs Prospective Analysis

This type of spatial cluster analysis on stellar distributions is retrospective, and the majority of methods developed to date belong to this class of tests.

The other class of spatial cluster detection is called prospective testing, which is designed for time-series data that is updated over time; test statistics are recomputed as new data becomes available. “While retrospective tests focus on a static aspect of spatial patterns, prospective tests take into account their dynamic nature and attempt to find new, emergent clusters as quickly as possible.”

There has been a surge of interest in this prospective approach following the anthrax attacks of 2001 and the perceived threat of bioterrorism since. But as the authors of the GeoSurveillance study note, prospective monitoring approaches have broader application, “including the detection of outbreaks of food poisoning and infectious diseases and the detection of emergent crime hotspots.” And I would add crisis mapping for complex humanitarian emergencies.

Very little work has been done using retrospective analysis for crisis mapping and even less using prospective techniques. Both are equally important. The former is critical if we want to have a basis (and indeed baseline) to know what deviations and patterns to look for. The latter is important since, as humanitarian practitioners and policy makers, we are interested in operational conflict prevention.

Spatial Analysis Software

While several GIS software packages provide functionalities for retrospective analysis of spatial patterns, “few provide for prospective analysis,” with the notable exception of SaTScan, which enables both applications. SaTScan does have two drawbacks, however.

The first is that “prospective analysis in SaTScan is not adjusted in a statistically rigorous manner for repeated time-periodic tests conducted as new data become available.” Secondly, the platform “does not offer any GIS functionality for quick visual assessment of detected clusters.”

What is needed is a platform that provides a convenient graphical user-interface (GUI) that allows users to identify spatial clusters both statistically and visually. GeoSurveillance seeks to do just this.

Introducing GeoSurveillance

This spatial analysis software consists of three components: a cluster detection and monitoring component, a GIS component and a support tool component as depicted below.

GeoSurveillance

  • “The cluster detection and monitoring component is further divided into retrospective and prospective analysis tools, each of which has a corresponding user-interface where parameters and options for the analysis are to be set. When the analysis is completed, the user-interfaces also provide a textual and/or graphical summary of results.”
  • “The GIS component generates map representation of the results, where basic GIS functionalities such as zoom in/out, pan, and identify are available. For prospective analysis, the resulting map representation is updated every time a statistical computation for a time unit is completed so that spatial patterns changing over time can be visually assessed as animation.”
  • “The support tool component provides various auxiliary tools for user.”

The table below presents a summary (albeit not exhaustive) of statistical tests for cluster detection. The methods labeled in bold are currently available within GeoSurveillance.

GeoSurveillance2

GeoSurveillance uses the local score statistic for retrospective analysis and applies the univariate cumulative sum (cusum) method for prospective analysis. Cusum methods are familiar to public health professionals since they are often applied to public health monitoring.
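As a rough illustration of the cusum idea — this is not GeoSurveillance’s actual implementation, and the allowance k and threshold h below are arbitrary — the monitor accumulates deviations from an in-control mean and raises an alarm when the running sum crosses the threshold:

```python
def cusum(observations, mean, k=0.5, h=4.0):
    """Return the index at which the one-sided cumulative sum of
    deviations first exceeds the decision threshold h, or None."""
    s = 0.0
    for t, x in enumerate(observations):
        # k is the "allowance": small wobbles around the mean are absorbed
        s = max(0.0, s + (x - mean) - k)
        if s > h:
            return t
    return None

baseline = [10, 11, 9, 10, 10, 9, 11, 10]    # in-control counts
shifted = baseline + [13, 14, 13, 15, 14]    # counts drift upward
alarm_in_control = cusum(baseline, mean=10)  # no alarm on stable data
alarm_at = cusum(shifted, mean=10)           # alarm shortly after the shift
```

The appeal for prospective monitoring is that each new observation updates the running sum in constant time, so the test can run continuously as data streams in.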

Both methods are somewhat involved mathematically speaking so I won’t elaborate on them here. Suffice it to say that the complexity of spatial analysis techniques needs to be “hidden” from the average user if this kind of platform is to be used by humanitarian practitioners in the field.

Applying GeoSurveillance

The authors, Yamada et al., used the platform to carry out a particularly interesting study of low birth weight (LBW) incidence data in Los Angeles, California.

Traditional studies “on LBW have focused on individual-level risk factors such as race/ethnicity, maternal age, maternal education, use of prenatal care, smoking and other substance abuse during pregnancy.” However, such individual factors have had little ability to explain the risk of LBW. To this end, “increasing attention has been directed to neighborhood-level risk factors including […] racial/ethnic composition, economic status, crime rate, and population growth trend.”

The authors of the GeoSurveillance study thus hypothesize that “the risk of LBW incidence and its change over time have non-random spatial patterns reflecting background distributions of neighborhood-level risk factors.” The results of the retrospective and prospective analysis using GeoSurveillance is available both in tabular and map formats. The latter format is displayed and interpreted below.

GeoSurveillance3

Using GeoSurveillance’s retrospective analysis functionality enabled the authors to automatically detect high-risk areas of LBW (marked in red) as well as the zone with the highest abnormal incidence of LBW (marked in yellow). The maps above indicate that a large concentration of neighborhoods with high risk of LBW is found “near downtown Los Angeles extending toward the northwest, and three smaller ones in the eastern part of the county.”

GeoSurveillance4

Carrying out prospective analysis on the LBW data enabled the authors to conclude that the risk of LBW “used to be concentrated in particular parts of the county but is now more broadly spread throughout the county.” This result now provides the basis for further investigation to “identify individual- and neighborhood-level factors that relate to this change in the spatial distribution of the LBW risk.”

Conclusion

The developers of GeoSurveillance plan to implement more methods in the next version, especially for prospective analysis given the limited availability of such methods in other GIS software. The GeoSurveillance software as well as associated documentation and sample datasets can be downloaded here.

I have downloaded the software myself and will start experimenting shortly with some Ushahidi and/or PRIO data if possible. Stay tuned for an update.

Patrick Philippe Meier

Ushahidi for Mobile Banking

I just participated in a high-level mobile banking (mBanking) conference in Nairobi, which I co-organized with colleagues from The Fletcher School.

Participants included the Governor of Kenya’s Central Bank, Kenya’s Finance Minister, the directors/CEOs of Safaricom, Equity Bank, Bankable Frontier Associates, Iris Wireless, etc., and senior representatives from the Central Banks of Tanzania, Rwanda and Burundi as well as CGAP, Google, DAI, etc.

mBanking1

The conference blog is available here and the Twitter feed I set up is here. The extensive work that went into organizing this international conference explains my relative absence from iRevolution; that and my three days off the grid in Lamu with Fletcher colleagues and Erik Hersman.

I have already blogged about mBanking here, so I thought I’d combine my interest in the subject with my ongoing work with Ushahidi.

One of the issues that keeps cropping up when discussing mBanking (and branchless banking) is the challenge of agent reliability and customer service. How does one ensure the trustworthiness of a growing network of agents and simultaneously handle customer complaints?

A number of speakers at Fletcher’s recent conference highlighted these challenges and warned they would become more pressing with time. So this got me thinking about an Ushahidi-for-mBanking platform.

Since mBanking customers by definition own a mobile phone, a service like M-Pesa or Zap could provide customers with a dedicated short code which they could use to text in concerns or report complaints along with location information. These messages could then be mapped in quasi real-time on an Ushahidi platform. This would provide companies like Safaricom and Zain with a crowdsourced approach to monitoring their growing agent network.

A basic spatial analysis of these customer reports over time would enable Safaricom and Zain to identify trends in customer complaints. The geo-referenced data could also provide the companies with a way to monitor agent-reliability by location. Safaricom could then offer incentives to M-Pesa agents to improve agent compliance and reward them accordingly.
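A first-pass version of such trend analysis can be as simple as tallying geo-tagged complaint messages by location. The short-code message format below is purely hypothetical, as are the place names:

```python
from collections import Counter

# Hypothetical sketch: tally crowdsourced complaint SMS by agent
# location to surface unreliable agents. The message format -- keyword
# followed by a location token -- is an assumption for illustration.
reports = [
    "COMPLAINT Kibera agent overcharged",
    "COMPLAINT Kibera agent absent",
    "COMPLAINT Westlands slow service",
    "COMPLAINT Kibera agent overcharged",
]

def location_of(sms):
    # Assumes the second token of the short-code message is a location.
    return sms.split()[1]

complaints_by_location = Counter(location_of(r) for r in reports)
# Kibera stands out with three complaints against Westlands' one,
# flagging that agent network for follow-up.
```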

In other words, the “balance of power” would shift from the agent to the customer since the latter would now be in position to report on quality of service.

But why wait for Safaricom and Zain to kick this off? Why not simply launch two public parallel platforms, one for M-Pesa and the other for Zap, to determine which of the two companies receives more complaints and how quickly they respond to them?

To make the sites sustainable, one could come up with a number of business models. One idea might be to provide advertising space on the Ushahidi-mBanking site. In addition, the platform would provide a way to collect the mobile phone numbers of individual clients; this information could then be used to broadcast ads-by-SMS on a weekly basis, for example.

If successful, this approach could be replicated with Wizzit and MTN in South Africa and gCash in the Philippines. I wish I had several more weeks in Nairobi to spearhead this but I’m heading back to the Sudan to continue my consulting work with the UN’s Threat and Risk Mapping Analysis (TRMA).

Patrick Philippe Meier

Disaster Theory for Techies

I’ve had a number of conversations over the past few weeks on the delineation between pre- and post-disaster phases. We need to move away from this linear concept of disasters, and conflicts as well for that matter. So here’s a quick introduction to “disaster theory” that goes beyond what you’ll find in the mainstream, more orthodox literature.

What is a Disaster?

There is a subtle but fundamental difference between disasters (processes) and hazards (events); a distinction that Jean-Jacques Rousseau first articulated in 1755 when Portugal was shaken by an earthquake. In a letter to Voltaire one year later, Rousseau notes that “nature had not built [process] the houses which collapsed and suggested that Lisbon’s high population density [process] contributed to the toll” (1).

(Incidentally, the earthquake in Portugal triggered extensive earthquake research in Europe and also served as the focus for various publications, ranging from Kant’s essays about the causes of earthquakes to Voltaire’s Poème sur le désastre de Lisbonne).

In other words, natural events are hazards and exogenous while disasters are the result of endogenous social processes. As Rousseau added in his note to Voltaire, “an earthquake occurring in wilderness would not be important to society” (2). That is, a hazard need not turn into a disaster since the latter is strictly a product of social processes.

And so, while disasters were traditionally perceived as “sudden and short lived events, there is now a tendency to look upon disasters in African countries in particular, as continuous processes of gradual deterioration and growing vulnerability,” which has important “implications on the way the response to disasters ought to be made” (3).

But before we turn to the issue of response, what does the important distinction between events and processes mean for early warning?

Blast From the Past

In The Poverty of Historicism (1944), the Austrian-born philosopher Karl Popper makes a distinction between two kinds of predictions: “We may predict (a) the coming of a typhoon [event], a prediction which may be of the greatest practical value because it may enable people to take shelter in time; but we may also predict (b) that if a certain shelter is to stand up to a typhoon, it must be constructed [process] in a certain way […].”

A typhoon, like an earthquake, is certainly a hazard, but it need not lead to disaster if shelters are appropriately built since this process culminates in minimizing social vulnerability.

In contemporary disaster research, “it is generally accepted among environmental geographers that there is no such thing as a natural disaster. In every phase and aspect of a disaster—causes, vulnerability, preparedness, results and response, and reconstruction—the contours of disaster and the difference between who lives and dies is to a greater or lesser extent a social calculus” (4).

In other words, the term “natural disaster” is an oxymoron and “phrases such as a ‘disaster hit the city,’ ‘tornadoes kill and destroy,’ or a ‘catastrophe is known by its works’ are, in the last resort, animistic thinking” (5).

The vulnerability or resilience of a given system is not simply dependent on the outcome of future events since vulnerability is the complex product of past political, economic and social processes. When hazards such as landslides interface with social systems the risk of disasters may increase. “The role of vulnerability as a causal factor in disaster losses tends to be less well understood, however. The idea that disasters can be managed by identifying and managing specific risk factors is only recently becoming widely recognized” (6).

A Complex System

Consider an hourglass or sand clock as an illustration of vulnerability-as-causality. Grains of sand sifting through the narrowest point of the hourglass represent individual events or natural hazards. Over time a sand pile starts to form, which represents the evolution of society or the connectedness of a social network. Occasionally, a grain of sand falls on the pile and an avalanche or disaster follows.

Why does the avalanche occur? One might ascribe the cause of the avalanche to one grain of sand, i.e., a single event. On the other hand, a systems approach to vulnerability analysis would associate the avalanche with the pile’s increasing slope and to the connectedness (or population density) of the grains constituting the pile since these factors render the structure increasingly vulnerable to falling grains.

Left on its own, the sand pile’s stability, or the social network, becomes increasingly critical or vulnerable. From this perspective, “all disasters are slow onset when realistically and locally related to conditions of susceptibility”. A hazard event might be rapid-onset, but the disaster, requiring much more than a hazard, is a long-term process, not a one-off event.
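The sand-pile metaphor comes from the Bak-Tang-Wiesenfeld model of self-organized criticality, which is simple to simulate. In the sketch below (grid size and grain counts are arbitrary choices), most dropped grains cause nothing, while the occasional grain triggers a cascading avalanche:

```python
import random

# Minimal Bak-Tang-Wiesenfeld sandpile: drop grains one at a time;
# any cell holding 4+ grains topples, shedding one grain to each of
# its four neighbours. Avalanche size = topplings per dropped grain.
SIZE = 10

def drop_grain(grid):
    x, y = random.randrange(SIZE), random.randrange(SIZE)
    grid[x][y] += 1
    topplings = 0
    unstable = [(x, y)]
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < 4:
            continue
        grid[i][j] -= 4
        topplings += 1
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < SIZE and 0 <= nj < SIZE:  # edge grains fall off
                grid[ni][nj] += 1
                if grid[ni][nj] >= 4:
                    unstable.append((ni, nj))
        if grid[i][j] >= 4:  # a cell can topple more than once
            unstable.append((i, j))
    return topplings

random.seed(1)
grid = [[0] * SIZE for _ in range(SIZE)]
avalanches = [drop_grain(grid) for _ in range(5000)]
# Early drops cause no topplings at all; once the pile approaches
# criticality, single grains occasionally set off large cascades.
```

The point of the model is precisely the one made above: the size of an avalanche is determined less by the triggering grain than by the accumulated state of the pile.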

We must therefore “reduce as much as we can the force of the underlying tectonic stresses in order to lower the risk of synchronous failure—that is, of catastrophic collapse that cascades across boundaries between technological, social and ecological systems” (7).

Recall Rousseau’s comment on population density as a contributing cause of the earthquake disaster and Popper’s remark that adequate shelter or resilience could offset the impact of typhoons. The sand pile at the bottom of the hourglass is constrained by the glass’s circumference. While abstract, this image mimics the growth of densely populated cities that become increasingly vulnerable to hazards, either natural or technological.

Unlike the clock’s lifeless grains of sand, however, human beings can minimize their vulnerability to exogenous shocks through disaster preparedness, mitigation and adaptation. In doing so, individuals can “flatten” the structure of the sand pile into a less hierarchical system and thereby shift or diffuse the risk of an avalanche. In conflict prevention terms, this means structural prevention, which typically focuses on local livelihoods and local capacity building.

Implications

Clearly, early warning should seek to monitor both the falling grains and the vulnerability of the sand pile to determine the risk and magnitude of an avalanche. In more formalistic language, a dual approach is important because it is not always clear a priori whether a disaster is due to a strong exogenous shock, to the internal dynamics of the system or a combination of both (8).

As the disaster management community has learned, in “support[ing] good decision-making, the issue is not one of being able to predict the unpredictable. Rather, the fundamental question is that, given that we cannot have reliable predictions of future outcomes, how can we prevent excessive hazard levels today and in the future in a cost-effective manner?”

More on resilience:

  • Disaster Response, Self-Organization and Resilience [Link]
  • On Technology and Building Resilient Societies to Mitigate the Impact of Disasters [Link]
  • Social Media = Social Capital = Disaster Resilience? [Link]
  • Failing Gracefully in Complex Systems: A Note on Resilience [Link]
  • Towards a Match.com for Economic Resilience [Link]

Moving Forward with Swift River

This is an update on the latest Swift River open group meeting that took place this morning at the InSTEDD office in Palo Alto. Ushahidi colleague Kaushal Jhalla first proposed the idea behind Swift River after the terrorist attacks on Mumbai last November. Ushahidi has since taken on the initiative as a core project since the goal of Swift River is central to the group’s mission: the crowdsourcing of crisis information.

Kaushal and Chris Blow gave the first formal presentation of Swift River during our first Ushahidi strategy meeting in Orlando last March where we formally established the Swift River group, which includes Andrew Turner, Sean Gourley, Erik Hersman and myself in addition to Kaushal and Chris. Andrew has played a pivotal role in getting Swift River and Vote Report India off the ground and I highly recommend reading his blog post on the initiative.

The group now includes several new friends of Ushahidi, a number of whom kindly shared their time and insights this morning after Chris kicked off the meeting to bring everyone up to speed. The purpose of this blog post is to outline how I hope Swift River moves forward based on this morning’s fruitful session. Please see my previous blog post for an overview of the basic methodology.

The purpose of the Swift River platform, as I proposed this morning, is to provide two core services. The first, to borrow Gaurav Mishra’s description, is to crowdsource the tagging of crisis information. The second is to triangulate the tagged information to assign reality scores to individual events. Confused? Not to worry, it’s actually really straightforward.

Crowdsourcing Tagging

Information on a developing crisis can be captured from several text-based sources, such as articles from online news media, tweets and SMS. Of course, video footage, pictures and satellite imagery can also provide important information, but we’re more interested in text-based data for now.

The first point to note is that information can range from being very structured to highly unstructured. The word structure is simply another way of describing how organized information is. A few examples are in order vis-a-vis text-based information.

A book is generally highly structured information. Why? Well, because the author hopefully used page numbers, chapter headings, paragraphs, punctuation, an index and table of contents. The fact that the book is structured makes it easier for the reader to find the information she is looking for. The other end of the “structure spectrum” would be a run-on sentence with nospacesandpunctuation. Not terribly helpful.

Below is a slide from a seminar I taught on disaster and conflict early warning back in 2006; ignore the (c).

ewstructure

The slide above depicts the tradeoff between control and structure. We can impose structure on data collected if we control the data entry process. Surveys are an example of a high-control process that yields high-structure. We want high structure because this allows us to find and analyze the data more easily (c.f. entropy). This has generally been the preferred approach, particularly amongst academics.

If we give up control, as one does when crowdsourcing crisis information, we open ourselves up to the possibility of having to deal with a range of structured and unstructured information. Making sense of this information typically requires data mining and natural language processing (NLP) techniques that can identify structure in said information. For example, we would want to identify nouns, verbs, places and dates in order to extract event-data.

One way to do this would be to automatically tag an article with the parameters “who, what, where and when.” A number of platforms such as Open Calais and Virtual Research Associate’s FORECITE already do this. However, these platforms are not customized for crowdsourcing of crisis information and most are entirely closed. (Note: I did consulting work for VRA many years ago).

So we need to draw on (and modify) relevant algorithms that are publicly available and provide a user-friendly interface for human oversight of the automated tagging (what we also referred to as crowdsourcing the filter). Here’s a proposed interface that Chris recently designed for Swift River.

swiftriver

The idea would be to develop an algorithm that parses the text (on the left) and auto-suggests answers for the tags (on the right). The user would then confirm or correct the suggested tags and the algorithm would learn from its mistakes. In other words, the algorithm would become more accurate over time and the need for human oversight would decrease. In short, we’d be developing a data-driven ontology backed up by Freebase to provide semantic linkages.
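A stripped-down version of such an auto-suggester can be built with nothing more than regular-expression heuristics and a small gazetteer of known places. The real system would use proper NLP and learn from user corrections; the place list, tag names and message below are all illustrative:

```python
import re

# Hypothetical auto-tagger in the spirit of the proposed interface:
# heuristics suggest "where" and "when" tags, which a human reviewer
# would then confirm or correct (the learning loop is omitted).
KNOWN_PLACES = {"Mumbai", "Nairobi", "Khartoum"}

def suggest_tags(text):
    tags = {"where": None, "when": None}
    # Capitalized words that match the gazetteer become "where" candidates.
    for word in re.findall(r"[A-Z][a-z]+", text):
        if word in KNOWN_PLACES:
            tags["where"] = word
    # A simple "day month year" pattern becomes the "when" candidate.
    match = re.search(r"\b(\d{1,2} \w+ \d{4})\b", text)
    if match:
        tags["when"] = match.group(1)
    return tags

tags = suggest_tags("Explosions reported in Mumbai on 26 November 2008.")
```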

VRA already does this, but (1) the data validation is carried out by one (poor) individual, (2) the articles were restricted to the headlines from the Reuters and Agence France-Presse (AFP) newswires, and (3) the project did not draw on semantic analysis. The validation component entailed making sure that events described in the headlines were correctly coded by the parser and ensuring there were no duplicates. See VRA’s patent for the full methodology (PDF).

Triangulation and Scoring

The above tagging process would yield a highly structured event dataset like the example depicted below.

dataset

We could then use simple machine analysis to cluster the same events together and thereby do away with any duplicate event-data. The four records above would then be collapsed into one record:

datafilter2

But that’s not all. We would use a simple weighting or scoring schema to assign a reality score to determine the probability that the event reported really happened. I already described this schema in my previous post so will just give one example: An event that is reported by more than one source is more likely to have happened. This increases the reality score of the event above and pushes it higher up the list. One could also score an event by the geographical proximity of the source to the reported event, and so on. These scores could be combined to give an overall score.
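The clustering and scoring steps can be sketched in a few lines. The records, and the 0.4/0.2 weighting, are an arbitrary toy schema for illustration, not a proposed production scheme:

```python
# Sketch of duplicate clustering and "reality scoring", assuming
# records are already tagged with (what, where, when) plus a source.
records = [
    {"what": "riot", "where": "Eldoret", "when": "2008-01-02", "source": "Reuters"},
    {"what": "riot", "where": "Eldoret", "when": "2008-01-02", "source": "SMS"},
    {"what": "riot", "where": "Eldoret", "when": "2008-01-02", "source": "Twitter"},
    {"what": "fire", "where": "Kisumu", "when": "2008-01-03", "source": "SMS"},
]

def cluster(records):
    # Records sharing the same (what, where, when) tags collapse into
    # one event; we keep the set of independent sources per event.
    events = {}
    for r in records:
        key = (r["what"], r["where"], r["when"])
        events.setdefault(key, set()).add(r["source"])
    return events

def reality_score(sources):
    # Toy schema: each additional independent source adds confidence,
    # capped at 1. Geographic proximity weighting could be added here.
    return min(1.0, 0.4 + 0.2 * (len(sources) - 1))

events = cluster(records)
scores = {key: reality_score(srcs) for key, srcs in events.items()}
# The thrice-reported riot outscores the singly-reported fire.
```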

Compelling Visualization

The database output above is not exactly compelling to most people. This is where we need some creative visualization techniques to render the information more intuitive and interesting. Here are a few thoughts. We could draw on Gapminder to visualize the triangulated event-data over time. We could also use the idea of a volume equalizer display.

equalize

This is not the best equalizer interface around for sure, but hopefully gets the point across. Instead of decibels on the Y-axis, we’d have probability scores that an event really happened. Instead of frequencies on the X-axis, we’d have the individual events. Since the data coming in is not static, the bars would bounce up and down as more articles/tweets get tagged and dumped into the event database.

I think this would be an elegant way to visualize the data, not least because the animation would resemble the flow or waves of a swift river; the volume equalizer also works as an analogy for quieting unwanted noise. For the actual Swift River interface, I’d prefer using more colors to denote different characteristics of each event and would provide the user with the option of double-clicking on a bar to drill down to the event sources and underlying text.

Patrick Philippe Meier

Mobile Crisis Mapping (MCM)

I first blogged about Mobile Crisis Mapping (MCM) back in October 2008 and several times since. The purpose of this post is to put together the big picture. What do I mean by MCM? Why is it important? And how would I like to see MCM evolve?

Classical MCM

When I coined the term Mobile Crisis Mapping last October, I wrote that MCM was the next logical step in the field of crisis mapping. One month later, at the first Crisis Mappers Meeting, I emphasized the need to think of maps as communication tools and once again referred to MCM. In my posts on the Crisis Mapping Conference Proposal and A Brief History of Crisis Mapping, I referred to MCM but only in passing.

More recently, I noted the MCM component of the UN’s Threat and Risk Mapping Analysis (TRMA) project in the Sudan and referred to two projects presented at the ICTD2009 conference in Doha—one on the quality of data collected using mobile phones and the second on a community-based mapping initiative called Folksomaps.

So what is Mobile Crisis Mapping? The most obvious answer is that MCM is the collection of georeferenced crisis information using peer-to-peer (P2P) mobile technology. Related to MCM are the challenges of data validation, communication security and so on.

Extending MCM

But there’s more. P2P communication is bi-directional, e.g., two-way SMS broadcasting. This means that MCM is also about the ability of the end-user in the field to query a crisis map using an SMS and/or voice-based interface. Therein lies the combined value of MCM: collection and query.

The Folksomaps case study comes closest to what I have in mind. The project uses binary operators to categorize relationships between objects mapped to render queries possible. For instance, ‘is towards left of’ could be characterized as <Libya, Egypt>.

The methodology draws on the Web Ontology Language (OWL) to model the categorical characteristics of an object (e.g., direction, proximity, etc), and thence infer new relationships not explicitly specified by users of the system. In other words, Folksomaps provides an ontology of locations.
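The inference step can be illustrated with a transitive closure over user-asserted “is towards left of” relations. This is not Folksomaps’ actual OWL-based machinery, and the country pairs below are illustrative only:

```python
# A tiny version of the Folksomaps idea: store user-asserted binary
# relations and infer new ones by transitivity (one of several
# inference rules a real ontology would support).

def infer(relations):
    # Transitive closure: if A is left of B and B is left of C,
    # then A is left of C.
    closure = set(relations)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Users assert only two facts; the third follows by inference.
left_of = infer({("Tunisia", "Libya"), ("Libya", "Egypt")})
```

The pair ("Tunisia", "Egypt") is never stated by any user, yet the system can answer queries about it — which is exactly what makes the map queryable at a distance.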

Once this ontology is created, the map can actually be queried at a distance. That’s what I consider to be the truly innovative and unique aspect of MCM. The potential added value is huge, and James BonTempo describes exactly how huge MCM could be in his superb presentation on extending FrontlineSMS.

An initiative related to Folksomaps and very much in line with my thinking about MCM is Cartagen. This project uses string-based geocoding (e.g. “map Bhagalpur, India”) to allow users in the field to produce and search their own maps by using the most basic of mobile phones. “This widens participation to 4 billion cell phone users worldwide, as well as to rural regions outside the reach of the internet. Geographic mapping with text messages has applications in disaster response and health care.”

MCM Scenario

The query functionality is thus key to Mobile Crisis Mapping. One should be able to “mobile-query” a crisis map by SMS or voice.

If I’m interfacing with an Ushahidi deployment in the Sudan, I should be able to send an SMS to find out where, relative to my location, an IDP camp is located; or where the closest airfield is, etc. Query results can be texted back to the mobile phone and the user can forward that result to others. I should also be able to call up a designated number and walk through a simple Interactive Voice Response (IVR) interface to get the same answer.
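To make the idea concrete, here is a minimal sketch of how such a "nearest match" SMS query could be answered. Everything here is hypothetical: the camp names, coordinates, and reply format are invented for illustration, and a real deployment would query the Ushahidi database and reply through an SMS gateway rather than hold reports in memory.

```python
import math

# Hypothetical georeferenced reports (name, lat, lon) -- illustrative data only
REPORTS = [
    ("Abu Shouk IDP camp", 13.65, 25.33),
    ("Kalma IDP camp", 12.02, 24.93),
    ("El Fasher airfield", 13.61, 25.32),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def answer_query(keyword, user_lat, user_lon):
    """Return an SMS-sized reply naming the nearest report matching the keyword."""
    matches = [r for r in REPORTS if keyword.lower() in r[0].lower()]
    if not matches:
        return "No matching reports found."
    name, lat, lon = min(
        matches, key=lambda r: haversine_km(user_lat, user_lon, r[1], r[2])
    )
    km = haversine_km(user_lat, user_lon, lat, lon)
    return f"{name} is about {km:.0f} km away."
```

The same lookup could sit behind an IVR menu just as easily as behind an SMS endpoint, since the query logic is independent of the interface.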

Once these basic search queries are made available, more complex, nested queries can be developed—again, see James BonTempo’s presentation to get a sense of the tremendous potential of MCM.

The reason I see MCM as the next logical step in the field of crisis mapping is because more individuals have access to mobile phones in humanitarian crises than a computer connected to the Web. In short, the point of Mobile Crisis Mapping is to bring Crisis Mapping Analytics (CMA) to the mobile phone.

Patrick Philippe Meier

JRC: Geo-Spatial Analysis for Global Security

The European Commission’s Joint Research Center (JRC) is doing some phenomenal work on Geo-Spatial Information Analysis for Global Security and Stability. I’ve had several meetings with JRC colleagues over the years and have always been very impressed with their projects.

The group is not very well known outside Europe so the purpose of this blog post is to highlight some of the Center’s projects.

  • Enumeration of Refugee Camps: The project developed an operational methodology to estimate refugee populations using very high resolution (VHR) satellite imagery. “The methodology relies on a combination of machine-assisted procedures, photo-interpretation and statistical sampling.”

jrc1

  • Benchmarking Hand Held Equipment for Field Data Collection: This project tested new devices for the collection of geo-referenced information. “The assessment of the instruments considered their technical characteristics, like the availability of necessary instruments or functionalities, technical features, hardware specifics, software compatibility and interfaces.”

jrc3

  • GEOCREW – Study on Geodata and Crisis Early Warning: This project analyzed the use of geo-spatial technology in the decision-making process of institutions dealing with international crises. The project also aimed to show best practice in the use of geo-spatial technologies in the decision-making process.
  • Support to Peacekeeping Operations in the Sudan: Maps are generally unavailable, or out of date, for most of the conflict areas in which peacekeeping personnel are deployed. This UNDPKO Darfur mapping initiative aimed to create an alliance of partners to address this gap and share the results.

jrc4

  • Temporary Settlement Analysis by Remote Sensing: The project analyzes different types of refugee and IDP settlements to identify single structures inside refugee settlements. “The objective of the project is to establish the first comprehensive catalog of image interpretation keys, based on last-generation satellite data and related to the analysis of transitional settlements.”

JRC colleagues often publish papers on their work and I highly recommend having a look at this book when it comes out in June 2009:

jrc5

Patrick Philippe Meier

Video Introduction to Crisis Mapping

I’ve given many presentations on crisis mapping over the past two years but these were never filmed. So I decided to create this video presentation with narration in order to share my findings more widely and hopefully get a lot of feedback in the process. The presentation is not meant to be exhaustive although the video does run to about 30 minutes.

The topics covered in this presentation include:

  • Crisis Map Sourcing – information collection;
  • Mobile Crisis Mapping – mobile technology;
  • Crisis Mapping Visualization – data visualization;
  • Crisis Mapping Analysis – spatial analysis.

The presentation references several blog posts of mine in addition to several operational projects to illustrate the main concepts behind crisis mapping. The individual blog posts featured in the presentation are listed below:

This research is the product of a 2-year grant provided by Humanity United  (HU) to the Harvard Humanitarian Initiative’s (HHI) Program on Crisis Mapping and Early Warning, where I am a doctoral fellow.

I look forward to any questions/suggestions you may have on the video primer!

Patrick Philippe Meier

Folksomaps: Gold Standard for Community Mapping

There were a number of mapping-related papers, posters and demos at ICTD2009. One paper in particular caught my attention given the topic’s direct relevance to my ongoing consulting work with the UN’s Threat and Risk Mapping Analysis (TRMA) project in the Sudan and the upcoming ecosystem project in Liberia with Ushahidi and Humanity United.

Introduction

Entitled “Folksomaps – Towards Community Intelligent Maps for Developing Regions,” the paper outlines a community-driven approach for creating maps by drawing on “Web 2.0 principles” and “Semantic Web technologies” but without having to rely entirely on a web-based interface. Indeed, Folksomaps “makes use of web and voice applications to provide access to its services.”

I particularly value the authors’ aim to “provide map-based services that represent user’s intuitive way of finding locations and directions in developing regions.” This is an approach that definitely resonates with me. Indeed, it is our responsibility to adapt and customize our community-based mapping tools to meet the needs, habits and symbology of the end user; not the other way around.

I highly recommend this paper (or summary below) to anyone doing work in the crisis mapping field. In fact, I consider it required reading. The paper is co-authored by Arun Kumar, Dipanjan Chakraborty, Himanshu Chauhan, Sheetal Agarwal and Nitendra Rajput of IBM India Research Lab in New Delhi.

Background

Vast rural areas of developing countries do not have detailed maps or mapping tools. Rural populations are generally semi-literate, low-income and non-tech savvy. They are hardly likely to have access to neogeography platforms like Google Earth. Moreover, the lack of access to electricity and Internet connectivity further complicates the situation.

We also know that cities, towns and villages in developing countries “typically do not have well structured naming of streets, roads and houses,” which means “key landmarks become very important in specifying locations and directions.”

Drawing on these insights, the authors seek to tap the collective efforts of local communities to populate, maintain and access content for their own benefit—an approach I have described as crowdfeeding.

Surveys of Tech and Non-Tech Users

The study is centered on end-user needs, which is rather refreshing. The authors carried out a series of surveys to better understand the profiles of end-users, e.g., tech and non-tech users.

The first survey sought to identify answers to the following questions:

  • How do people find points of interest?
  • How much do people rely on maps versus people on the streets?
  • How do people provide local information to other people?
  • Are people interested in consuming and feeding information for a community-driven map system?

The results are listed in the table below:

folksotb1

Non-tech savvy users did not use maps to find information about locations and only 36% of these users required precise information. In addition, 75% of non-tech respondents preferred the choice of a phone-based interface, which really drives home the need for what I have coined “Mobile Crisis Mapping” or MCM.

Tech-users also rely primarily on others (as opposed to maps) for location related information. The authors associate this result with the lack of signboards in countries like India. “Many a times, the maps do not contain fine-grained information in the first place.”

Most tech-users said they would welcome a phone-based location and direction finding system in addition to a web-based interface. Almost 80% expressed interest in “contributing to the service by uploading content either over the phone or through a web-based portal.”

The second survey sought to identify how tech and non-tech users express directions and local information. For example:

  • How do you give directions to people on the road or to friends?
  • How do you describe proximity of a landmark to another one?
  • How do you describe distance? Kilometers or using time-to-travel?

The results are listed in the table below:

folksotb2

The majority of non-tech savvy participants said they make use of landmarks when giving directions. “They use names of big roads […] and use ‘near to’, ‘adjacent to’, ‘opposite to’ relations with respect to visible and popular landmarks […].” Almost 40% of respondents said they use time only to describe the distance between any two locations.

Tech-savvy participants almost always use both time and kilometers as a measure to represent distance. Only 10% or so of participants used kilometers only to represent distance.

The Technology

The following characteristics highlight the design choices that differentiate Folksomaps from established notions of map systems:

  • Relies on user generated content rather than data populated by professionals;
  • Strives for spatial integrity in the logical sense and does not consider spatial integrity in the physical sense as essential (which is a defining feature of social maps);
  • Does not consider visual representation as essential, which is important considering the fact that a large segment of users in developing countries do not have access to Internet (hence my own emphasis on mobile crisis mapping);
  • Is non-static and intelligent in the sense that it infers new information from what is entered by the users.
  • User input is not verified by the system and it is possible that pieces of incorrect information in the knowledgebase may be present at different points of time. Folksomaps adopts the Wiki model and allows all users to add, edit and remove content freely while keeping maps up-to-date.

Conceptual Design

Folksomaps uses “landmark” as the basic unit in the mapping knowledgebase model while “location” represents more coarse-grained geographical areas such as a village, city or country. The model then seeks to capture a few key logical characteristics of locations: direction, distance, proximity, reachability and layer.

The latter constitutes the granularity of the geographic area that a location represents. “The notion of direction and distance from a location is interpreted with respect to the layer that the location represents. In other words, direction and distance could be viewed as binary operator over locations of the same level. For instance, ‘is towards left of’ would be appropriate if the location pair being considered is <Libya, Egypt>,” but not if the pair is <Nairobi, India>.

The knowledgebase makes use of two modules, the Web Ontology Language (OWL) and a graph database, to represent and store the above concepts. The Semantic Web language OWL is used to model the categorical characteristics of a landmark (e.g., direction, proximity, etc), and thence infer new relationships not explicitly specified by users of the system. In other words, OWL provides an ontology of locations.
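As a rough illustration of the kind of inference such an ontology enables, directional relations can be closed under inverses and transitivity. This is a toy sketch, not the actual OWL reasoning or Folksomaps code: the relation names and facts below are invented, and a real OWL reasoner would derive these entailments from declared property characteristics rather than an explicit loop.

```python
# Directional facts entered by users: (subject, relation, object) -- invented examples
FACTS = {("Libya", "west_of", "Egypt"), ("Egypt", "west_of", "Saudi Arabia")}

# Each relation's inverse, mirroring OWL inverse-property declarations
INVERSE = {"west_of": "east_of", "east_of": "west_of"}

def infer(facts):
    """Close the fact set under inverse and transitive rules, mimicking
    what a reasoner would derive from the location ontology."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for a, rel, b in facts:
            new.add((b, INVERSE[rel], a))          # inverse rule
            for c, rel2, d in facts:
                if rel2 == rel and c == b:
                    new.add((a, rel, d))           # transitivity rule
        if not new <= facts:
            facts |= new
            changed = True
    return facts
```

From the two user-entered facts, the closure also yields, for example, that Libya is west of Saudi Arabia and that Egypt is east of Libya, without anyone having entered those relationships.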

The graph database is used to represent distance (numerical relationships) between landmarks. “The locations are represented by nodes and the edges between two nodes of the graph are labeled with the distance between the corresponding locations.” Given the insights gained from user surveys, precise distances and directions are not integral components of community-based maps.

The two modules are used to generate answers to queries submitted by users.

User Interaction

The authors rightly recognize that user interface design is critical to the success of community-based mapping projects. To be sure, users may be illiterate or semi-literate, and not very tech-savvy. Furthermore, users will tend to query the map system when they need it most, e.g., “when they are stuck on the road looking for directions […] and would be pressed for time.” This very much holds true for crisis mapping as well.

Users can perform three main tasks with the system: “find place”, “trace path” and “add info.” In addition, some or all users may be granted the right to edit or remove entries from the knowledgebase. The Folksomaps system can also be bootstrapped from existing databases to populate instances of location types. “Two such sources of data in the absence of a full-fledged Geographical Information System (GIS) come from the Telecom Industry and the Postal Department.”

folksofig3

How the users interface with the system to carry out these tasks will depend on how tech-savvy or literate they are and what type of access they have to information and communication technologies.

Folksomaps thus provides three types of interface: web-based, voice-based and SMS-based. Each interface allows the user to query and update the database. The web-based interface was developed using Java Server Pages (JSP) while the voice-based interface uses JSPs and VoiceXML.

folksofig41

I am particularly interested in the voice-based interface. The authors point to previous studies that suggest a voice-based interaction works well with users who are illiterate or semi-illiterate and who cannot afford to have high-end devices but can use ordinary low-end phones.

folksofig1

I will share this with the Ushahidi development team with the hope that they will consider adding a voice-based interface for the platform later this year. To be sure, it could be very interesting to integrate Freedom Fone’s work in this area.

Insights from User Studies

The authors conducted user studies to verify the benefit and acceptability of Folksomaps. Tech-savvy participants used the web-based interface while non-tech savvy participants used the voice-based interface. The results are shown in the two tables below.

folksotb3

Several important insights surfaced from the results of the user studies. For example, an important insight gained from the non-tech user feedback was “the sense of security that they would get with such a system. […] Even though asking for travel directions from strangers on the street is an option, it exposes the enquirer to criminal elements […].”

Another insight gained was that many non-tech savvy participants were willing to pay for the call, “even a small premium over normal charges, as they saw value to having this information available to them at all times.” That said, the majority of participants “preferred the advertisement model where an advertisement played in the beginning of the call pays for the entire call.”

Interestingly, almost all participants preferred the voice-based interface over SMS even though the former led to a number of speech recognition errors. The reason being that “many people are either not comfortable using SMS or not comfortable using a mobile phone itself.”

There were also interesting insights on the issue of accuracy from the perspective of non-tech savvy participants. Most participants asked for full accuracy and only a handful were tolerant of minor mistakes. “In fact, one of the main reasons for preferring a voice call over asking people for directions was to avoid wrong directions.”

This need for high accuracy is driven by the fact that most people use public transportation, walk or use a bicycle to reach their destination, which means the cost of incorrect information is large compared to someone who owns a car.

This is an important insight since the authors had first assumed that tolerance for incorrect information was higher. They also learned that meta information is as important to non-tech savvy users as the landmarks themselves. For instance, low-income participants were more interested in knowing the modes of available transportation, timetables and bus route numbers than the road route from a source to a destination.

folkstb4

Tech-savvy participants, for their part, did not ask for fine-grained directions all the time: “They were fine with getting high level directions involving major landmarks.” In addition, their need for accuracy was not as strong as that of the non-tech savvy respondents, and they preferred to have query results sent to them via SMS so they could store them for future access, “pointing out that it is easy to forget the directions if you just hear it.”

Some tech-savvy participants also suggested that the directions provided by Folksomaps should “take into consideration the amount of knowledge the subject already has about the area, i.e., it should be personalized based upon user profile.” Other participants mentioned that “frequent changes in road plans due to constructions should be captured by such a system—thus making it more usable than just getting directions.”

Conclusion

In sum, the user interface of Folksomaps needs to be “rich and adaptive to the information needs of the user […].” Given the user preference for the voice-based interface over SMS, designing an efficient, user-friendly voice-based interface is a priority. In addition, “dynamic and real-time information augmented with traditional services like finding directions and locations would certainly add value to Folksomaps.” Furthermore, the authors recognize that Folksomaps can “certainly benefit from user interface designs” and “multi-modal front ends.”

Finally, the user surveys suggest “the community is very receptive towards the concept of a community-driven map,” so it is important that the TRMA project in the Sudan and the ecosystem Liberia project build on the insights and lessons learned provided in this study.

Patrick Philippe Meier

Improving Quality of Data Collected by Mobile Phones

The ICTD2009 conference in Doha, Qatar, had some excellent tech demos. I had the opportunity to interview Kuang Chen, a PhD student in UC Berkeley’s computer science department, about his work on improving data quality using dynamic forms and machine learning.

I’m particularly interested in this area of research since ensuring data quality continues to be a real challenge in the fields of conflict early warning and crisis mapping. So I always look for alternative and creative approaches that address this challenge. I include below the abstract for Kuang’s project (which includes 5 other team members) and a short 2-minute interview.

Abstract

“Organizations in developing regions want to efficiently collect digital data, but standard data gathering practices from the developed world are often inappropriate. Traditional techniques for form design and data quality are expensive and labour-intensive. We propose a new data-driven approach to form design, execution (filling) and quality assurance. We demonstrate USHER, an end-to-end system that automatically generates data entry forms that enforce and maintain data quality constraints during execution. The system features a probabilistic engine that drives form-user interactions to encourage correct answers.”

In my previous post on data quality evaluation, I pointed to a study suggesting that mobile-based data entry has significantly higher error rates. The study shows that a voice call to a human operator results in superior data quality, no doubt because the operator verbally double-checks the respondent’s input. USHER’s ability to dynamically adjust the user interface (form layout and data entry widgets) is one way to provide the context-specific, data-driven user feedback currently lacking in mobile forms, acting as an automated proxy for the human data entry person on the other end of the line.
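The flavor of that feedback loop can be sketched with a toy stand-in: flag an entered value as suspicious when it is rare given previously collected data, and ask the enumerator to confirm before submitting. To be clear, this is not the actual USHER system, which uses a learned probabilistic model over whole forms; the class below, its threshold, and the example values are all invented for illustration.

```python
from collections import Counter

class FieldChecker:
    """Toy stand-in for USHER-style feedback: flag entries that are rare
    given previously collected values for a field, so the form can ask
    the data entry person to confirm before accepting the answer."""

    def __init__(self, history, threshold=0.05):
        # history: list of previously collected values for this field
        self.counts = Counter(history)
        self.total = len(history)
        self.threshold = threshold

    def needs_confirmation(self, value):
        """True if the value's empirical frequency falls below the threshold."""
        prob = self.counts.get(value, 0) / self.total
        return prob < self.threshold

# e.g. checker = FieldChecker(["yes"] * 95 + ["no"] * 5)
# checker.needs_confirmation("maybe")  -> prompt the enumerator to re-check
```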

Interview

This is my first video so many thanks to Erik Hersman for his tips on video editing! And many thanks to Kuang for the interview.

Patrick Philippe Meier