Having blogged at length on the rationale for Crisis Mapping Analytics (CMA), I am now interested in assessing the applicability of existing tools for crisis mapping vis-a-vis complex humanitarian emergencies.
In this blog post, I review an open-source software package called GeoSurveillance that combines spatial statistical techniques and GIS routines to perform tests for the detection and monitoring of spatial clustering.
The post is based on the new peer-reviewed article “GeoSurveillance: a GIS-based system for the detection and monitoring of spatial clusters” published in the Journal of Geographical Systems and authored by Ikuho Yamada, Peter Rogerson and Gyoungju Lee.
The detection of spatial clusters—testing the null hypothesis of spatial randomness—is a key focus of spatial analysis. My first research project in this area dates back to 1996, when I wrote a software algorithm in C++ to determine the randomness (or non-randomness) of stellar distributions.
The program would read a graphics file of a high-quality black-and-white image of a stellar distribution (that I had scanned from a rather expensive book) and run a pattern analysis procedure to determine what constituted a star and then detect them. Note that the stars were of various sizes and resolutions, with many overlapping in part.
Once the stars were detected, I manually approximated the number of stars in the stellar distributions to evaluate the reliability of my algorithm. The program would then assign (x, y) coordinates to each star. I compared this series of numbers with a series of pseudo-random numbers that I generated independently.
Using the Kolmogorov-Smirnov test in two dimensions, I could then test the probability that the series of (x, y) star coordinates and the series of pseudo-random numbers were samples drawn from the same distribution.
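To give a feel for this kind of comparison, here is a minimal Python sketch. It is not my original C++ program: it assumes a quadrant-counting statistic in the style of Fasano and Franceschini, and it swaps in a permutation test for the p-value instead of an asymptotic formula. The function names, sample sizes and number of permutations are all illustrative assumptions.

```python
import numpy as np


def quadrant_fractions(points, x0, y0):
    """Fraction of points falling in each of the four quadrants around (x0, y0)."""
    x, y = points[:, 0], points[:, 1]
    counts = np.array([
        np.sum((x > x0) & (y > y0)),    # upper right
        np.sum((x <= x0) & (y > y0)),   # upper left
        np.sum((x <= x0) & (y <= y0)),  # lower left
        np.sum((x > x0) & (y <= y0)),   # lower right
    ])
    return counts / len(points)


def ks2d_statistic(a, b):
    """Two-sample 2D KS-style statistic: largest quadrant discrepancy,
    averaged over origins taken from sample a and from sample b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    def max_diff(origins):
        return max(
            np.max(np.abs(quadrant_fractions(a, x0, y0)
                          - quadrant_fractions(b, x0, y0)))
            for x0, y0 in origins
        )

    return 0.5 * (max_diff(a) + max_diff(b))


def permutation_pvalue(a, b, n_perm=199, seed=0):
    """Share of random relabelings whose statistic is at least the observed one."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    observed = ks2d_statistic(a, b)
    pooled = np.vstack([a, b])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # shuffles the rows (points)
        if ks2d_statistic(shuffled[:len(a)], shuffled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical "stars": a tight cluster in the unit square.
    stars = rng.normal(loc=0.5, scale=0.08, size=(40, 2))
    # Reference sample: pseudo-random points, uniform on the unit square.
    reference = rng.uniform(0.0, 1.0, size=(40, 2))
    print("statistic:", ks2d_statistic(stars, reference))
    print("p-value:", permutation_pvalue(stars, reference, n_perm=99))
```

A small p-value suggests the two point sets do not come from the same distribution, which is to say that the "stars" are not scattered at random.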
Retrospective vs Prospective Analysis
This type of spatial cluster analysis on stellar distributions is retrospective, and the majority of methods developed to date belong to this class of tests.
The other class of spatial cluster detection is called prospective testing. It is designed for data that are updated over time, with test statistics recomputed whenever new data become available. “While retrospective tests focus on a static aspect of spatial patterns, prospective tests take into account their dynamic nature and attempt to find new, emergent clusters as quickly as possible.”
There has been a surge of interest in this prospective approach following the anthrax attacks of 2001 and the perceived threat of bioterrorism since. But as the authors of the GeoSurveillance study note, prospective monitoring approaches have broader application, “including the detection of outbreaks of food poisoning and infectious diseases and the detection of emergent crime hotspots.” And I would add crisis mapping for complex humanitarian emergencies.
Very little work has been done using retrospective analysis for crisis mapping and even less using prospective techniques. Both are equally important. The former is critical if we want to have a basis (and indeed baseline) to know what deviations and patterns to look for. The latter is important since, as humanitarian practitioners and policy makers, we are interested in operational conflict prevention.
Spatial Analysis Software
While several GIS software packages provide functionalities for retrospective analysis of spatial patterns, “few provide for prospective analysis,” with the notable exception of SaTScan, which enables both applications. SaTScan does have two drawbacks, however.
The first is that “prospective analysis in SaTScan is not adjusted in a statistically rigorous manner for repeated time-periodic tests conducted as new data become available.” Secondly, the platform “does not offer any GIS functionality for quick visual assessment of detected clusters.”
What is needed is a platform that provides a convenient graphical user-interface (GUI) that allows users to identify spatial clusters both statistically and visually. GeoSurveillance seeks to do just this.
This spatial analysis software consists of three components: a cluster detection and monitoring component, a GIS component and a support tool component as depicted below.
- “The cluster detection and monitoring component is further divided into retrospective and prospective analysis tools, each of which has a corresponding user-interface where parameters and options for the analysis are to be set. When the analysis is completed, the user-interfaces also provide a textual and/or graphical summary of results.”
- “The GIS component generates map representation of the results, where basic GIS functionalities such as zoom in/out, pan, and identify are available. For prospective analysis, the resulting map representation is updated every time a statistical computation for a time unit is completed so that spatial patterns changing over time can be visually assessed as animation.”
- “The support tool component provides various auxiliary tools for user.”
The table below presents a summary (albeit not exhaustive) of statistical tests for cluster detection. The methods labeled in bold are currently available within GeoSurveillance.
GeoSurveillance uses the local score statistic for retrospective analysis and applies the univariate cumulative sum (cusum) method for prospective analysis. Cusum methods are familiar to public health professionals since they are often applied to public health monitoring.
Both methods are somewhat involved mathematically speaking so I won’t elaborate on them here. Suffice it to say that the complexity of spatial analysis techniques needs to be “hidden” from the average user if this kind of platform is to be used by humanitarian practitioners in the field.
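That said, the basic cusum idea fits in a few lines. The Python sketch below is a generic one-sided cusum and not GeoSurveillance's implementation: it assumes the input is a series of standardized scores (observed versus expected counts for one region), and the reference value k and threshold h are illustrative choices that do not come from the paper.

```python
def cusum_alarms(z_scores, k=0.5, h=4.0):
    """One-sided cusum: accumulate excesses above k, signal when the sum exceeds h.

    z_scores: standardized observations, one per time period (assumed input).
    k: reference value (slack) subtracted at each step (illustrative default).
    h: alarm threshold (illustrative default).
    Returns the cusum path and the time indices at which an alarm was raised.
    """
    s = 0.0
    path, alarms = [], []
    for t, z in enumerate(z_scores):
        # The sum never drops below zero, so past quiet periods cannot
        # mask a fresh increase.
        s = max(0.0, s + z - k)
        path.append(s)
        if s > h:
            alarms.append(t)
            s = 0.0  # restart monitoring after an alarm
    return path, alarms


if __name__ == "__main__":
    # Hypothetical series: three quiet periods followed by a sustained increase.
    path, alarms = cusum_alarms([0.0, 0.0, 0.0, 3.0, 3.0, 3.0])
    print(path)
    print(alarms)
```

Because the sum is updated each time a new observation arrives, the same loop can run period after period, which is what makes the approach prospective.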
The authors, Yamada et al., used the platform to carry out a particularly interesting study of low birth weight (LBW) incidence data in Los Angeles, California.
Traditional studies “on LBW have focused on individual-level risk factors such as race/ethnicity, maternal age, maternal education, use of prenatal care, smoking and other substance abuse during pregnancy.” However, such individual factors have had little ability to explain the risk of LBW. To this end, “increasing attention has been directed to neighborhood-level risk factors including […] racial/ethnic composition, economic status, crime rate, and population growth trend.”
The authors of the GeoSurveillance study thus hypothesize that “the risk of LBW incidence and its change over time have non-random spatial patterns reflecting background distributions of neighborhood-level risk factors.” The results of the retrospective and prospective analysis using GeoSurveillance are available in both tabular and map formats. The latter format is displayed and interpreted below.
GeoSurveillance’s retrospective analysis functionality enabled the authors to automatically detect areas at high risk of LBW (marked in red) as well as the zone with the highest abnormal incidence of LBW (marked in yellow). The maps above indicate that a large concentration of neighborhoods with high risk of LBW is found “near downtown Los Angeles extending toward the northwest, and three smaller ones in the eastern part of the county.”
Carrying out prospective analysis on the LBW data enabled the authors to conclude that the high risk of LBW “used to be concentrated in particular parts of the county but is now more broadly spread throughout the county.” This result now provides the basis for further investigation to “identify individual- and neighborhood-level factors that relate to this change in the spatial distribution of the LBW risk.”
The developers of GeoSurveillance plan to implement more methods in the next version, especially for prospective analysis given the limited availability of such methods in other GIS software. The GeoSurveillance software as well as associated documentation and sample datasets can be downloaded here.