Tag Archives: AI

Using Sound and Artificial Intelligence to Detect Human Rights Violations

Video continues to be a powerful way to capture human rights abuses around the world. Videos posted to social media can be used to hold perpetrators of gross violations accountable. But video footage poses a “Big Data” challenge to human rights organizations. Two billion smartphone users means almost as many video cameras. This leads to massive amounts of visual content of both suffering and wrong-doing in conflict zones. Reviewing these videos manually is a labor-intensive, time-consuming, expensive and often traumatic task. So my colleague Jay Aronson at CMU has been exploring how artificial intelligence and in particular machine learning might solve this challenge.


As Jay and team rightly note in a recent publication (PDF), “the dissemination of conflict and human rights related video has vastly outpaced the ability of researchers to keep up with it – particularly when immediate political action or rapid humanitarian response is required.” The consequences of this are similar to what I’ve observed in humanitarian aid: “At some point (which will vary from organization to organization), time and resource limitations will necessitate an end to the collection, archiving, and analysis of user generated content unless the process can be automated.” In sum, information overload can “prevent human rights researchers from uncovering widely dispersed events taking place over long periods of time or large geographic areas that amount to systematic human rights violations.”


To take on this Big Data challenge, Jay and team have developed a new machine learning-based audio processing system that “enables both synchronization of multiple audio-rich videos of the same event, and discovery of specific sounds (such as wind, screaming, gunshots, airplane noise, music, and explosions) at the frame level within a video.” The system basically “creates a unique “soundprint” for each video in a collection, synchronizes videos that are recorded at the same time and location based on the pattern of these signatures, and also enables these signatures to be used to locate specific sounds precisely within a video. The use of this tool for synchronization ultimately provides a multi-perspectival view of a specific event, enabling more efficient event reconstruction and analysis by investigators.”

Synchronizing image features is far more complex than synchronizing sound. “When an object is occluded, poorly illuminated, or not visually distinct from the background, it cannot always be detected by computer vision systems. Further, while computer vision can provide investigators with confirmation that a particular video was shot from a particular location based on the similarity of the background physical environment, it is less adept at synchronizing multiple videos over time because it cannot recognize that a video might be capturing the same event from different angles or distances. In both cases, audio sensors function better so long as the relevant videos include reasonably good audio.”
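To make the synchronization idea more concrete, here is a minimal sketch of how two clips could be aligned by sound alone. It is not the system Jay and team built. It assumes each clip has already been reduced to a list of per-frame loudness values, and it simply tries every time shift and keeps the one where the loud moments line up best (a basic cross-correlation). The function name `best_offset` and its parameters are made up for illustration.

```python
def best_offset(envelope_a, envelope_b, max_lag):
    """Return the lag (in frames) at which envelope_b best lines up with envelope_a.

    A positive lag means clip B started `lag` frames after clip A.
    Both envelopes are plain lists of per-frame loudness values (an assumption).
    """
    def score(lag):
        # Sum of products over the region where the two clips overlap at this lag.
        total = 0.0
        for i, a in enumerate(envelope_a):
            j = i - lag
            if 0 <= j < len(envelope_b):
                total += a * envelope_b[j]
        return total

    # Try every shift in the allowed window and keep the best-scoring one.
    return max(range(-max_lag, max_lag + 1), key=score)


# Clip B is the same scene as clip A, but the recording started 3 frames later.
clip_a = [0, 0, 0, 1, 5, 1, 0, 0, 2, 0]
clip_b = clip_a[3:]
offset = best_offset(clip_a, clip_b, max_lag=5)
```

A real system would work on much richer audio features and would normalize the scores, and, as the authors stress, the output is probabilistic and still needs a human to check it.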

Ukrainian human rights practitioners working with families of protestors killed during the 2013-2014 Euromaidan Protests recently approached Jay and company to analyze videos from those events. They wanted to “locate every video available in their collection of the moments before, during, and just after a specific set of killings. They wanted to extract information from these videos, including visual depictions of these killings, whether the protesters in question were an immediate and direct threat to the security forces, plus any other information that could be used to corroborate or refute other forms of evidence or testimony available for their cases.”


Their plan had originally been to manually synchronize more than 65 hours of video footage from 520 videos taken during the morning of February 20, 2014. But after working full-time over several months, they were only able to stitch together about 4 hours of the total video using visual and audio cues in the recording. So Jay and team used their system to make sense of the footage. They were able to automatically synchronize over 4 hours of the footage. The figure above shows an example of video clips synchronized by the system.

Users can also “select a segment within the video containing the event they are interested in (for example, a series of explosions in a plaza), and search in other videos for a similar segment that shows similar looking buildings or persons, or that contains a similar sounding noise. A user may for example select a shooting scene with a significant series of gunshots, and may search for segments with a similar sounding series of gunshots. This method increases the chances for finding video scenes of an event displaying different angles of the scene or parallel events.”

Jay and team are quick to emphasize that their system “does not  eliminate human involvement in the process because machine learning systems provide probabilistic, not certain, results.” To be sure, “the synchronization of several videos is noisy and will likely include mistakes—this is precisely why human involvement in the process is crucial.”

I’ve been following Jay’s applied research for many years now and continue to be a fan of his approach given the overlap with my own work in the use of machine learning to make sense of the Big Data generated during major natural disasters. I wholeheartedly agree with Jay when he reflected during a recent call that the use of advanced techniques alone is not the answer. Effective cross-disciplinary collaboration between computer scientists and human rights (or humanitarian) practitioners is really hard but absolutely essential. This explains why I wrote this practical handbook on how to create effective collaboration and successful projects between computer scientists and humanitarian organizations.

Crowdsourcing Point Clouds for Disaster Response

Point Clouds, or 3D models derived from high resolution aerial imagery, are in fact nothing new. Several software platforms already exist to reconstruct a series of 2D aerial images into fully fledged 3D-fly-through models. Check out these very neat examples from my colleagues at Pix4D and SenseFly:

What do a castle, Jesus and a mountain have to do with humanitarian action? As noted in my previous blog post, there’s only so much disaster damage one can glean from nadir (that is, vertical) imagery and oblique imagery. Let’s suppose that the nadir image below was taken by an orbiting satellite or flying UAV right after an earthquake, for example. How can you possibly assess disaster damage from this one picture alone? Even if you had nadir imagery for these houses before the earthquake, your ability to assess structural damage would be limited.

This explains why we also captured oblique imagery for the World Bank’s UAV response to Cyclone Pam in Vanuatu (more here on that humanitarian mission). But even with oblique photographs, you’re stuck with one fixed perspective. Who knows what these houses below look like from the other side; your UAV may have simply captured this side only. And even if you had pictures for all possible angles, you’d literally have hundreds of pictures to leaf through and make sense of.

What’s that famous quote by Henry Ford again? “If I had asked people what they wanted, they would have said faster horses.” We don’t need faster UAVs, we simply need to turn what we already have into Point Clouds, which I’m indeed hoping to do with the aerial imagery from Vanuatu, by the way. The Point Cloud below was made only from single 2D aerial images.

It isn’t perfect, but we don’t need perfection in disaster response, we need good enough. So when we as humanitarian UAV teams go into the next post-disaster deployment and ask humanitarians what they need, they may say “faster horses” because they’re not (yet) familiar with what’s really possible with the imagery processing solutions available today. That obviously doesn’t mean that we should ignore their information needs. It simply means we should seek to expand their imaginations vis-a-vis the art of the possible with UAVs and aerial imagery. Here is a 3D model of a village in Vanuatu constructed using 2D aerial imagery:

Now, the title of my blog post does lead with the word crowdsourcing. Why? For several reasons. First, it takes some decent computing power (and time) to create these Point Clouds. But if the underlying 2D imagery is made available to hundreds of Digital Humanitarians, we could use this distributed computing power to rapidly crowdsource the creation of 3D models. Second, each model can then be pushed to MicroMappers for crowdsourced analysis. Why? Because having a dozen eyes scrutinizing one Point Cloud is better than two. Note that for quality control purposes, each Point Cloud would be shown to 5 different Digital Humanitarian volunteers; we already do this with MicroMappers for tweets, pictures, videos, satellite images and of course aerial images as well. Each digital volunteer would then trace areas in the Point Cloud where they spot damage. If the traces from the different volunteers match, then bingo, there’s likely damage at those x, y and z coordinates. Here’s the idea:
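In code, the "do the traces match?" check might look roughly like the sketch below. Everything in it is an assumption made for illustration: traces are treated as lists of (x, y, z) points, the Point Cloud is chopped into cubes (voxels) of a chosen size, and a cube counts as likely damage once enough different volunteers have traced inside it. The threshold of 3 out of 5 is a placeholder, not a MicroMappers setting.

```python
from collections import defaultdict


def consensus_voxels(traces, voxel_size=1.0, min_votes=3):
    """Return the set of voxels traced as damaged by at least `min_votes` volunteers.

    `traces` maps a volunteer id to the list of (x, y, z) points that volunteer traced.
    """
    votes = defaultdict(set)
    for volunteer, points in traces.items():
        for x, y, z in points:
            # Snap each traced point to the cube of the grid that contains it.
            voxel = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
            votes[voxel].add(volunteer)  # a set, so each volunteer counts once per voxel
    return {voxel for voxel, who in votes.items() if len(who) >= min_votes}


traces = {
    "volunteer_1": [(0.2, 0.3, 0.1), (5.0, 5.0, 5.0)],
    "volunteer_2": [(0.7, 0.9, 0.4)],
    "volunteer_3": [(0.1, 0.1, 0.9), (0.5, 0.5, 0.5)],
    "volunteer_4": [(9.1, 9.2, 9.3)],
    "volunteer_5": [],
}
likely_damage = consensus_voxels(traces)
```

The voxel size controls how closely traces have to overlap before they count as a match; a real deployment would need to tune it to the resolution of the Point Cloud.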

We could easily use iPads to turn the process into a Virtual Reality experience for digital volunteers. In other words, you’d be able to move around and above the actual Point Cloud by simply changing the position of your iPad accordingly. This technology already exists and has for several years now. Tracing features in the 3D models that appear to be damaged would be as simple as using your finger to outline the damage on your iPad.

What about the inevitable challenge of Big Data? What if thousands of Point Clouds are generated during a disaster? Sure, we could try to scale our crowdsourcing efforts by recruiting more Digital Humanitarian volunteers, but wouldn’t that just be asking for a “faster horse”? Just like we’ve already done with MicroMappers for tweets and text messages, we would seek to combine crowdsourcing and Artificial Intelligence to automatically detect features of interest in 3D models. This sounds to me like an excellent research project for a research institute engaged in advanced computing R&D.

I would love to see the results of this applied research integrated directly within MicroMappers. This would allow us to integrate the results of social media analysis via MicroMappers (e.g., tweets, Instagram pictures, YouTube videos) directly with the results of satellite imagery analysis as well as 2D and 3D aerial imagery analysis generated via MicroMappers.

Anyone interested in working on this?

Artificial Intelligence Powered by Crowdsourcing: The Future of Big Data and Humanitarian Action

There’s no point spewing stunning statistics like this recent one from The Economist, which states that 80% of adults will have access to smartphones before 2020. The volume, velocity and variety of digital data will continue to skyrocket. To paraphrase Douglas Adams, “Big Data is big. You just won’t believe how vastly, hugely, mind-bogglingly big it is.”


And so, traditional humanitarian organizations have a choice when it comes to battling Big Data. They can either continue business as usual (and lose) or get with the program and adopt Big Data solutions like everyone else. The same goes for Digital Humanitarians. As noted in my new book of the same title, those Digital Humanitarians who cling to crowdsourcing alone as their pièce de résistance will inevitably become the ivy-laden battlefield monuments of 2020.


Big Data comprises a variety of data types such as text, imagery and video. Examples of text-based data include mainstream news articles, tweets and WhatsApp messages. Imagery includes Instagram, professional photographs that accompany news articles, satellite imagery and increasingly aerial imagery as well (captured by UAVs). Television channels, Meerkat and YouTube broadcast videos. Finding relevant, credible and actionable pieces of text, imagery and video in the Big Data generated during major disasters is like looking for a needle in a meadow (haystacks are ridiculously small datasets by comparison).

Humanitarian organizations, like many others in different sectors, often find comfort in the notion that their problems are unique. Thankfully, this is rarely true. Not only is the Big Data challenge not unique to the humanitarian space, real solutions to the data deluge have already been developed by groups that humanitarian professionals at worst don’t know exist and at best rarely speak with. These groups are already using Artificial Intelligence (AI) and some form of human input to make sense of Big Data.

Data digital flow

How does it work? And why do you still need some human input if AI is already in play? The human input, which can be via crowdsourcing or a few individuals, is needed to train the AI engine, which uses a technique from AI called machine learning to learn from the human(s). Take AIDR, for example. This experimental solution, which stands for Artificial Intelligence for Disaster Response, uses AI powered by crowdsourcing to automatically identify relevant tweets and text messages in an exploding meadow of digital data. The crowd tags tweets and messages they find relevant and the AI engine learns to recognize the relevance patterns in real-time, allowing AIDR to automatically identify future tweets and messages.
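For readers who want to see what "learning from the crowd in real-time" can look like, here is a deliberately tiny sketch. It is not AIDR's actual algorithm; it is a textbook perceptron over the words in a message, and the class and method names are invented for this example. The point is only that every human tag nudges the model immediately, so the model can start classifying new messages while the tagging is still going on.

```python
from collections import defaultdict


class OnlineRelevanceClassifier:
    """A tiny perceptron over bag-of-words features, updated one tagged message at a time."""

    def __init__(self):
        self.weights = defaultdict(float)
        self.bias = 0.0

    def _score(self, text):
        return self.bias + sum(self.weights[w] for w in set(text.lower().split()))

    def predict(self, text):
        """Return True if the message currently looks relevant."""
        return self._score(text) > 0

    def learn(self, text, is_relevant):
        """Update the weights only when the current prediction disagrees with the human tag."""
        if self.predict(text) != is_relevant:
            step = 1.0 if is_relevant else -1.0
            for w in set(text.lower().split()):
                self.weights[w] += step
            self.bias += step


# Hypothetical crowd tags: (message, relevant?)
tagged = [
    ("bridge collapsed need rescue", True),
    ("praying for everyone", False),
    ("need water and food urgently", True),
    ("love this song", False),
]
clf = OnlineRelevanceClassifier()
for _ in range(3):  # a few passes over the handful of tags
    for text, label in tagged:
        clf.learn(text, label)
```

With more tags from more volunteers, the same loop simply keeps running, which is why speed of tagging matters so much.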

As far as we know, AIDR is the only Big Data solution out there that combines crowdsourcing with real-time machine learning for disaster response. Why do we use crowdsourcing to train the AI engine? Because speed is of the essence in disasters. You need a crowd of Digital Humanitarians to quickly tag as many tweets/messages as possible so that AIDR can learn as fast as possible. Incidentally, once you’ve created an algorithm that accurately detects tweets relaying urgent needs after a Typhoon in the Philippines, you can use that same algorithm again when the next Typhoon hits (no crowd needed).

What about pictures? After all, pictures are worth a thousand words. Is it possible to combine artificial intelligence with human input to automatically identify pictures that show infrastructure damage? Thanks to recent breakthroughs in computer vision, this is indeed possible. Take Metamind, for example, a new startup I just met with in Silicon Valley. Metamind is barely 6 months old but the team has already demonstrated that one can indeed automatically identify a whole host of features in pictures by using artificial intelligence and some initial human input. The key is human input since this is what trains the algorithms. The more human-generated training data you have, the better your algorithms.

My team and I at QCRI are collaborating with Metamind to create algorithms that can automatically detect infrastructure damage in pictures. The Silicon Valley start-up is convinced that we’ll be able to create highly accurate algorithms if we have enough training data. This is where MicroMappers comes in. We’re already using MicroMappers to create training data for tweets and text messages (which is what AIDR uses to create algorithms). In addition, we’re already using MicroMappers to tag and map pictures of disaster damage. The missing link, in order to turn this tagged data into algorithms, is Metamind. I’m excited about the prospects, so stay tuned for updates as we plan to start teaching Metamind’s AI engine this month.

How about videos as a source of Big Data during disasters? I was just in Austin for SXSW 2015 and met up with the CEO of WireWax, a British company that uses (you guessed it) artificial intelligence and human input to automatically detect countless features in videos. Their platform has already been used to automatically find guns and Justin Bieber across millions of videos. Several other groups are also working on feature detection in videos. Colleagues at Carnegie Mellon University (CMU), for example, are working on developing algorithms that can detect evidence of gross human rights violations in YouTube videos coming from Syria. They’re currently applying their algorithms to videos of disaster footage, which we recently shared with them, to determine whether infrastructure damage can be automatically detected.

What about satellite & aerial imagery? Well, the team driving DigitalGlobe’s Tomnod platform have already been using AI powered by crowdsourcing to automatically identify features of interest in satellite (and now aerial) imagery. My team and I are working on similar solutions with MicroMappers, with the hope of creating real-time machine learning solutions for both satellite and aerial imagery. Unlike Tomnod, the MicroMappers platform is free and open source (and also filters social media, photographs, videos & mainstream news).

So there you have it. The future of humanitarian information systems will not be an App Store but an “Alg Store”, i.e., an Algorithm Store providing a growing menu of algorithms that have already been trained to automatically detect certain features in texts, imagery and videos that get generated during disasters. These algorithms will also “talk to each other” and integrate other feeds (from real-time sensors, Internet of Things) thanks to data-fusion solutions that already exist and others that are in the works.

Now, the astute reader may have noted that I omitted audio/speech in my post. I’ll be writing about this in a future post since this one is already long enough.

Analyzing Tweets on Malaysia Flight #MH370

My QCRI colleague Dr. Imran is using our AIDR platform (Artificial Intelligence for Disaster Response) to collect & analyze tweets related to Malaysia Flight 370 that went missing several days ago. He has collected well over 850,000 English-language tweets since March 11th, using the following keywords/hashtags: Malaysia Airlines flight, #MH370, #PrayForMH370 and #MalaysiaAirlines.

MH370 Prayers

Imran then used AIDR to create a number of “machine learning classifiers” to automatically classify all incoming tweets into categories that he is interested in:

  • Informative: tweets that relay breaking news, useful info, etc

  • Praying: tweets that are related to prayers and faith

  • Personal: tweets that express personal opinions

The process is super simple. All he does is tag several dozen incoming tweets into their respective categories. This teaches AIDR what an “Informative” tweet should “look like”. Since our novel approach combines human intelligence with artificial intelligence, AIDR is typically far more accurate at capturing relevant tweets than Twitter’s keyword search.

And the more tweets that Imran tags, the more accurate AIDR gets. At present, AIDR can auto-classify ~500 tweets per second, or 30,000 tweets per minute. This is well above the highest velocity of crisis tweets recorded thus far—16,000 tweets/minute during Hurricane Sandy.

The graph below depicts the number of tweets generated since the day we started the AIDR collection, i.e., March 11th.

Volume of Tweets per Day

This series of pie charts simply reflects the relative share of tweets per category over the past four days.

Tweets Trends

Below are some of the tweets that AIDR has automatically classified as being Informative (click to enlarge). The “Confidence” score simply reflects how confident AIDR is that it has correctly auto-classified a tweet. Note that Imran could also have crowdsourced the manual tagging—that is, he could have crowdsourced the process of teaching AIDR. To learn more about how AIDR works, please see this short overview and this research paper (PDF).

AIDR output
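For the curious, here is one simple way a classifier can produce a category together with a confidence score. This is a sketch only, using a small Naive Bayes model that I am substituting for illustration; it does not describe how AIDR computes its scores. The class name, the example tweets and the tags are all made up. The "confidence" here is just the model's estimated probability for the winning category, given the words in the tweet.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over words, trained from human-tagged tweets."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of tagged tweets
        self.vocab = set()

    def tag(self, text, category):
        """Record one human-tagged tweet."""
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return (best_category, confidence). Needs at least one tagged tweet."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for cat in self.doc_counts:
            total_words = sum(self.word_counts[cat].values())
            score = math.log(self.doc_counts[cat] / total_docs)
            for w in words:
                # Add-one smoothing so unseen words do not zero out a category.
                score += math.log((self.word_counts[cat][w] + 1) / (total_words + len(self.vocab)))
            log_scores[cat] = score
        # Turn the log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        best = max(exp_scores, key=exp_scores.get)
        return best, exp_scores[best] / norm


nb = TinyNaiveBayes()
nb.tag("breaking news plane search area expanded", "Informative")
nb.tag("officials confirm new radar data", "Informative")
nb.tag("praying for the passengers and families", "Praying")
nb.tag("prayers and faith for everyone", "Praying")
nb.tag("i think the airline handled this badly", "Personal")
nb.tag("in my opinion the media is wrong", "Personal")
label, confidence = nb.classify("praying for the families")
```

With only six tagged tweets the scores are crude, which is the practical reason that more tagging makes the classifier more accurate.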

If you’re interested in testing AIDR (still very much under development) and/or would like the Tweet IDs for the 850,000+ tweets we’ve collected using AIDR, then feel free to contact me. In the meantime, we’ll start a classifier that auto-collects tweets related to hijacking, criminal causes, and so on. If you’d like us to create a classifier for a different topic, let us know, but we can’t make any promises since we’re working on an important project deadline. When we’re further along with the development of AIDR, anyone will be able to easily collect & download tweets and create & share their own classifiers for events related to humanitarian issues.


Acknowledgements: Many thanks to Imran for collecting and classifying the tweets. Imran also shared the graphs and tabular output that appear above.

Syria: Crowdsourcing Satellite Imagery Analysis to Identify Mass Human Rights Violations

Update: See this blog post for the latest. Also, our project was just featured on the UK Guardian Blog!

What if we crowdsourced satellite imagery analysis of key cities in Syria to identify evidence of mass human rights violations? This is precisely the question that my colleagues at Amnesty International USA’s Science for Human Rights Program asked me following this pilot project I coordinated for Somalia. AI-USA has done similar work in the past with their Eyes on Darfur project, which I blogged about here in 2008. But using micro-tasking with backend triangulation to crowdsource the analysis of high resolution satellite imagery for human rights purposes is definitely breaking new ground.

A staggering amount of new satellite imagery is produced every day; millions of square kilometers’ worth according to one knowledgeable colleague. This is a big data problem that needs mass human intervention until the software can catch up. I recently spoke with Professor Ryan Engstrom, the Director of the Spatial Analysis Lab at George Washington University, and he confirmed that automated algorithms for satellite imagery analysis still have a long, long way to go. So the answer for now has to be human-driven analysis.

But professional satellite imagery experts who have plenty of time to volunteer their skills are few and far between. The Satellite Sentinel Project (SSP), which I blogged about here, is composed of a very small team and a few interns. Their focus is limited to the Sudan and they are understandably very busy. My colleagues at AI-USA analyze satellite imagery for several conflicts, but this takes them far longer than they’d like and their small team is still constrained given the number of conflicts and vast amounts of imagery that could be analyzed. This explains why they’re interested in crowdsourcing.

Indeed, crowdsourcing imagery analysis has proven to be a workable solution in several other projects & sectors. The “crowd” can indeed scan and tag vast volumes of satellite imagery data when that imagery is “sliced and diced” for micro-tasking. This is what we did for the Somalia pilot project thanks to the Tomnod platform and the imagery provided by Digital Globe. The yellow triangles below denote the “sliced images” that individual volunteers from the Standby Task Force (SBTF) analyzed and tagged one at a time.
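As a rough illustration of the "slice and dice" step, the sketch below cuts one large image into a grid of small boxes, each of which could be handed to a volunteer as a single micro-task. It is not how the Tomnod platform works internally; the function name, the square grid and the tile size of 256 pixels are assumptions for this example.

```python
def slice_into_tiles(width, height, tile_size):
    """Yield (left, top, right, bottom) pixel boxes that together cover a width x height image."""
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            # Tiles on the right and bottom edges are clipped to the image boundary.
            yield (left, top, min(left + tile_size, width), min(top + tile_size, height))


# A hypothetical 1000 x 600 pixel satellite image cut into tiles of up to 256 x 256 pixels.
tiles = list(slice_into_tiles(1000, 600, 256))
```

Each box can then be cropped out with any image library and queued for tagging, one tile per volunteer at a time.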

We plan to do the same with high resolution satellite imagery of three key cities in Syria selected by the AI-USA team. The specific features we will look for and tag include: “Burnt and/or darkened building features,” “Roofs absent,” “Blocks on access roads,” “Military equipment in residential areas,” “Equipment/persons on top of buildings indicating potential sniper positions,” “Shelters composed of different materials than surrounding structures,” etc. SBTF volunteers will be provided with examples of what these features look like from a bird’s eye view and from ground level.

Like the Somalia project, only when a feature, say a missing roof, is tagged identically by at least 3 volunteers will that location be sent to the AI-USA team for review. In addition, if volunteers are unsure about a particular feature they’re looking at, they’ll take a screenshot of said feature and share it on a dedicated Google Doc for the AI-USA team and other satellite imagery experts from the SBTF team to review. This feedback mechanism is key to ensure accurate tagging and inter-coder reliability. In addition, the screenshots shared will be used to build a larger library of features, i.e., what a missing roof looks like as well as military equipment in residential areas, road blocks, etc. Volunteers will also be in touch with the AI-USA team via a dedicated Skype chat.
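The "at least 3 volunteers" rule is simple enough to sketch in a few lines. The version below is an illustration under assumptions, not the platform's backend: tags are represented as (volunteer, tile, feature) records, and the function name and data layout are invented for the example. Only the threshold of 3 comes from the project design described above.

```python
from collections import defaultdict


def locations_for_review(tags, min_agreement=3):
    """Return (tile_id, feature) pairs tagged identically by at least `min_agreement` volunteers.

    `tags` is an iterable of (volunteer_id, tile_id, feature) records.
    """
    supporters = defaultdict(set)
    for volunteer_id, tile_id, feature in tags:
        # Using a set means a volunteer who tags the same thing twice is counted once.
        supporters[(tile_id, feature)].add(volunteer_id)
    return sorted(key for key, who in supporters.items() if len(who) >= min_agreement)


tags = [
    ("volunteer_a", "tile_1", "roof absent"),
    ("volunteer_b", "tile_1", "roof absent"),
    ("volunteer_c", "tile_1", "roof absent"),
    ("volunteer_a", "tile_1", "roof absent"),   # duplicate from the same volunteer
    ("volunteer_a", "tile_2", "road block"),
    ("volunteer_b", "tile_2", "road block"),
    ("volunteer_c", "tile_2", "burnt building"),
]
to_review = locations_for_review(tags)
```

Here only the missing roof on the first tile reaches the review queue; the second tile has two matching tags and one that disagrees, so it stays with the volunteers.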

There will no doubt be a learning curve, but the sooner we climb that learning curve the better. Democratizing satellite imagery analysis is no easy task and one or two individuals have opined that what we’re trying to do can’t be done. That may be, but we won’t know unless we try. This is how innovation happens. We can hypothesize and talk all we want, but concrete results are what ultimately matter. And results are what can help us climb that learning curve. My hope, of course, is that democratizing satellite imagery analysis enables AI-USA to strengthen their advocacy campaigns and makes it harder for perpetrators to commit mass human rights violations.

SBTF volunteers will be carrying out the pilot project this month in collaboration with AI-USA, Tomnod and Digital Globe. How and when the results are shared publicly will be up to the AI-USA team as this will depend on what exactly is found. In the meantime, a big thanks to Digital Globe, Tomnod and SBTF volunteers for supporting the AI-USA team on this initiative.

If you’re interested in reading more about satellite imagery analysis, the following blog posts may also be of interest:

• Geo-Spatial Technologies for Human Rights
• Tracking Genocide by Remote Sensing
• Human Rights 2.0: Eyes on Darfur
• GIS Technology for Genocide Prevention
• Geo-Spatial Analysis for Global Security
• US Calls for UN Aerial Surveillance to Detect Preparations for Attacks
• Will Using ‘Live’ Satellite Imagery to Prevent War in the Sudan Actually Work?
• Satellite Imagery Analysis of Kenya’s Election Violence: Crisis Mapping by Fire
• Crisis Mapping Uganda: Combining Narratives and GIS to Study Genocide
• Crowdsourcing Satellite Imagery Analysis for Somalia: Results of Trial Run
• Genghis Khan, Borneo & Galaxies: Crowdsourcing Satellite Imagery Analysis
• OpenStreetMap’s New Micro-Tasking Platform for Satellite Imagery Tracing

Eyes on Darfur: 2 Villages Missing from Site

An update on Amnesty International’s (AI) “Eyes on Darfur” project based on my previous blog.

At least two of the protected villages monitored by AI using very-high resolution imagery provided by AAAS have been removed from the site after reported attacks in the area, with updated imagery still being processed. The attacks in question were summarized by this UNHCR Report.

This raises some important questions, as a colleague noted in a recent discussion. The bigger issue here is vital: all this geo-mapping is virtual, and while it may impact the real world, that’s not a foregone conclusion. Would other NGOs, or perhaps a consortium, do better at the protective concept? And how? Namely, who can protect these villages and others like them?

I will write another blog this week on precisely these questions, i.e., civilian protection.

Patrick Philippe Meier

Human Rights 2.0: Eyes on Darfur

Amnesty International (AI) is taking human rights monitoring to a whole new level, metaphorically and literally speaking. The organization’s “Eyes on Darfur” project leverages the power of high-resolution satellite imagery to provide unimpeachable evidence of the atrocities being committed in Darfur – enabling action by private citizens, policy makers and international courts. Eyes On Darfur also breaks new ground in protecting human rights by allowing people around the world to literally “watch over” and protect twelve intact, but highly vulnerable, villages using commercially available satellite imagery.

I met with AI today to learn more. The human rights organization sends government officials these images on a regular basis to remind them that the world is watching. The impact? The villages monitored by AI have not been attacked while neighboring ones have. According to AI, there have also been notable changes in decisions made by the Bashir government since “Eyes on Darfur” went live a year ago. Equally interesting is that AI has been able to track the movement of the Janjaweed thanks to commercially available satellite imagery. In addition, the government of Chad cited the AI project as one of the reasons they accepted UN peacekeepers.

The American Association for the Advancement of Science (AAAS) is also leading a Human Rights and Geospatial Technologies project. So I also sat with them to learn more (September 2007). NGOs in Burma provided AAAS with information concerning attacks on civilians carried out by government forces in late 2006 and early 2007. AAAS staff reviewed these reports and compared them with high-resolution satellite images to identify destruction of housing and infrastructure and construction of new military occupation camps. The result is available in these Google Earth Layers. AAAS has provided comparable layers for Sudan, Chad, Lebanon and Zimbabwe. And this is just the tip of the iceberg.

AI is embarking on a 3-year project to provide satellite imagery to monitor forced displacement for early detection and advocacy. AAAS is developing a user-friendly web-based interface to let the NGO community know in real time where commercial satellites are positioned and what geographical areas they are taking pictures of. The interface includes direct links to the private companies operating these satellites along with contact and pricing information. AAAS believes this tool will enable the NGO community to make far more effective use of satellite imagery and to serve as a deterrent against repressive regimes choosing to commit mass atrocities.

The European Commission’s Joint Research Center (JRC) out of Ispra, Italy is also engaged in phenomenal work using satellite imagery. I first met with the JRC in 2004 and more recently in October 2007. The Center has developed automated models for change detection that are far more reliable than previously thought possible. Using pattern detection algorithms, the JRC can detect whether infrastructure has been destroyed, damaged, built or remained unchanged. They are now applying these models to monitor changes in refugee camps worldwide. The advantage of the JRC’s models is that they don’t necessarily require high resolution satellite imagery.

The same team at the JRC has also developed models to approximate population density in urban areas such as the Kibera slums of Nairobi. Using satellite pictures taken at different angles, the team is able to construct 3D models of infrastructure such as individual buildings and houses. Thanks to these models they are able to approximate the size of these structures and thus estimate the number of inhabitants.
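The step from "size of structures" to "number of inhabitants" can be illustrated with a back-of-the-envelope calculation. The sketch below is not the JRC's model. It assumes each building is described by a footprint area and a height, converts height to a number of floors, and divides the resulting floor space by an assumed living area per person. The floor height of 3 m and the 5 square meters per person are placeholder values chosen for the example.

```python
def estimate_population(buildings, floor_height_m=3.0, area_per_person_m2=5.0):
    """Estimate inhabitants from (footprint_m2, height_m) pairs taken from a 3D model.

    Both default parameters are illustrative assumptions, not measured values.
    """
    total = 0.0
    for footprint_m2, height_m in buildings:
        # Every structure has at least one floor, however low it is.
        floors = max(1, round(height_m / floor_height_m))
        total += footprint_m2 * floors / area_per_person_m2
    return round(total)


# Three hypothetical structures: (footprint in square meters, height in meters).
buildings = [(20.0, 3.0), (50.0, 6.0), (10.0, 2.5)]
people = estimate_population(buildings)
```

In practice the occupancy figure would have to come from ground surveys, since it varies enormously from one settlement to the next.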

While AI and AAAS have been collaborating on some of these projects, the JRC has not been connected to this work. I therefore organized a working lunch during the OCHA +5 Symposium in Geneva last Fall to connect AAAS, the JRC, the Feinstein Center and the USHMM. My intention is to catalyze greater collaboration between these organizations and projects so we can upgrade to Human Rights 2.0.

Patrick Philippe Meier