Tag Archives: Learning

Using Sound and Artificial Intelligence to Detect Human Rights Violations

Video continues to be a powerful way to capture human rights abuses around the world. Videos posted to social media can be used to hold perpetrators of gross violations accountable. But video footage poses a “Big Data” challenge to human rights organizations. Two billion smartphone users means almost as many video cameras, which leads to massive amounts of visual content documenting both suffering and wrong-doing in conflict zones. Reviewing these videos manually is a labor-intensive, time-consuming, expensive and often traumatic task. So my colleague Jay Aronson at CMU has been exploring how artificial intelligence, and in particular machine learning, might solve this challenge.

As Jay and team rightly note in a recent publication (PDF), “the dissemination of conflict and human rights related video has vastly outpaced the ability of researchers to keep up with it – particularly when immediate political action or rapid humanitarian response is required.” The consequences of this are similar to what I’ve observed in humanitarian aid: “At some point (which will vary from organization to organization), time and resource limitations will necessitate an end to the collection, archiving, and analysis of user generated content unless the process can be automated.” In sum, information overload can “prevent human rights researchers from uncovering widely dispersed events taking place over long periods of time or large geographic areas that amount to systematic human rights violations.”

To take on this Big Data challenge, Jay and team have developed a new machine learning-based audio processing system that “enables both synchronization of multiple audio-rich videos of the same event, and discovery of specific sounds (such as wind, screaming, gunshots, airplane noise, music, and explosions) at the frame level within a video.” The system basically “creates a unique ‘soundprint’ for each video in a collection, synchronizes videos that are recorded at the same time and location based on the pattern of these signatures, and also enables these signatures to be used to locate specific sounds precisely within a video. The use of this tool for synchronization ultimately provides a multi-perspectival view of a specific event, enabling more efficient event reconstruction and analysis by investigators.”
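To give a rough sense of how audio-based synchronization can work (this is a simplified illustration, not Jay's actual system), here is a minimal Python sketch that estimates the time offset between two clips of the same event by cross-correlating their onset-strength envelopes. It assumes the librosa and scipy libraries, and the file names are made up.

```python
# Minimal sketch of audio-based video synchronization (not the CMU system):
# estimate the time offset between two recordings of the same event by
# cross-correlating their onset-strength envelopes.
import librosa
import numpy as np
from scipy.signal import correlate

def onset_envelope(path, sr=22050, hop_length=512):
    """Load a clip's audio track and compute a normalized onset-strength envelope."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    return (env - env.mean()) / (env.std() + 1e-9), sr, hop_length

def estimate_offset(path_a, path_b):
    """Return the offset (in seconds) by which clip B starts after clip A."""
    env_a, sr, hop = onset_envelope(path_a)
    env_b, _, _ = onset_envelope(path_b)
    xcorr = correlate(env_a, env_b, mode="full")
    lag_frames = np.argmax(xcorr) - (len(env_b) - 1)
    return lag_frames * hop / sr

# offset = estimate_offset("clip_a.wav", "clip_b.wav")   # hypothetical files
# print(f"Clip B starts {offset:.2f} s after clip A")
```

In practice, a system like the one described above would also have to cope with heavy background noise, clips that do not overlap at all, and recordings made on very different devices, which is part of what makes the research hard.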

Synchronizing image features is far more complex than synchronizing sound. “When an object is occluded, poorly illuminated, or not visually distinct from the background, it cannot always be detected by computer vision systems. Further, while computer vision can provide investigators with confirmation that a particular video was shot from a particular location based on the similarity of the background physical environment, it is less adept at synchronizing multiple videos over time because it cannot recognize that a video might be capturing the same event from different angles or distances. In both cases, audio sensors function better so long as the relevant videos include reasonably good audio.”

Ukrainian human rights practitioners working with families of protestors killed during the 2013-2014 Euromaidan Protests recently approached Jay and company to analyze videos from those events. They wanted to “locate every video available in their collection of the moments before, during, and just after a specific set of killings. They wanted to extract information from these videos, including visual depictions of these killings, whether the protesters in question were an immediate and direct threat to the security forces, plus any other information that could be used to corroborate or refute other forms of evidence or testimony available for their cases.”

[Figure: video clips of the same Euromaidan event synchronized by the system]

Their plan had originally been to manually synchronize more than 65 hours of video footage from 520 videos taken during the morning of February 20, 2014. But “after working full-time over several months, they were only able to stitch together about 4 hours of the total video using visual and audio cues in the recording.” So Jay and team used their system on the same footage and were able to automatically synchronize over 4 hours of it. The figure above shows an example of video clips synchronized by the system.

Users can also “select a segment within the video containing the event they are interested in (for example, a series of explosions in a plaza), and search in other videos for a similar segment that shows similar looking buildings or persons, or that contains a similar sounding noise. A user may for example select a shooting scene with a significant series of gunshots, and may search for segments with a similar sounding series of gunshots. This method increases the chances for finding video scenes of an event displaying different angles of the scene or parallel events.”
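To make the similarity-search idea a bit more concrete (again, a sketch of the general approach rather than the actual system), one simple way to compare how two segments "sound" is to pool MFCC features into a fixed-length descriptor and rank candidates by cosine similarity. The function names below are placeholders.

```python
# Sketch only (not the CMU system): find video segments whose audio "sounds
# like" a query segment by comparing pooled MFCC descriptors.
import librosa
import numpy as np

def segment_descriptor(y, sr):
    """Summarize a segment's audio as a pooled MFCC vector."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def rank_by_similarity(query_audio, candidate_audios, sr=22050):
    """Rank candidate segments by cosine similarity to the query segment."""
    q = segment_descriptor(query_audio, sr)
    scores = []
    for y in candidate_audios:
        c = segment_descriptor(y, sr)
        scores.append(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c) + 1e-9))
    return np.argsort(scores)[::-1]  # indices of the most similar segments first

# y_query, sr = librosa.load("query_segment.wav", sr=22050)     # hypothetical file
# ranking = rank_by_similarity(y_query, candidate_segments)     # list of audio arrays
```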

Jay and team are quick to emphasize that their system “does not eliminate human involvement in the process because machine learning systems provide probabilistic, not certain, results.” To be sure, “the synchronization of several videos is noisy and will likely include mistakes—this is precisely why human involvement in the process is crucial.”

I’ve been following Jay’s applied research for many years now and continue to be a fan of his approach given the overlap with my own work in the use of machine learning to make sense of the Big Data generated during major natural disasters. I wholeheartedly agree with Jay when he reflected during a recent call that the use of advanced techniques alone is not the answer. Effective cross-disciplinary collaboration between computer scientists and human rights (or humanitarian) practitioners is really hard but absolutely essential. This explains why I wrote this practical handbook on how to create effective collaboration and successful projects between computer scientists and humanitarian organizations.

Artificial Intelligence Powered by Crowdsourcing: The Future of Big Data and Humanitarian Action

There’s no point spewing stunning statistics like this recent one from The Economist, which states that 80% of adults will have access to smartphones before 2020. The volume, velocity and variety of digital data will continue to skyrocket. To paraphrase Douglas Adams, “Big Data is big. You just won’t believe how vastly, hugely, mind-bogglingly big it is.”

And so, traditional humanitarian organizations have a choice when it comes to battling Big Data. They can either continue business as usual (and lose) or get with the program and adopt Big Data solutions like everyone else. The same goes for Digital Humanitarians. As noted in my new book of the same title, those Digital Humanitarians who cling to crowdsourcing alone as their pièce de résistance will inevitably become the ivy-laden battlefield monuments of 2020.

Big Data comprises a variety of data types such as text, imagery and video. Examples of text-based data include mainstream news articles, tweets and WhatsApp messages. Imagery includes Instagram photos, professional photographs that accompany news articles, satellite imagery and, increasingly, aerial imagery as well (captured by UAVs). Video comes from television channels, Meerkat and YouTube. Finding relevant, credible and actionable pieces of text, imagery and video in the Big Data generated during major disasters is like looking for a needle in a meadow (haystacks are ridiculously small datasets by comparison).

Humanitarian organizations, like many others in different sectors, often find comfort in the notion that their problems are unique. Thankfully, this is rarely true. Not only is the Big Data challenge not unique to the humanitarian space, but real solutions to the data deluge have already been developed by groups that humanitarian professionals at worst don’t know exist and at best rarely speak with. These groups are already using Artificial Intelligence (AI) and some form of human input to make sense of Big Data.

How does it work? And why do you still need some human input if AI is already in play? The human input, which can come via crowdsourcing or from a few individuals, is needed to train the AI engine, which uses a technique from AI called machine learning to learn from the human(s). Take AIDR, for example. This experimental solution, which stands for Artificial Intelligence for Disaster Response, uses AI powered by crowdsourcing to automatically identify relevant tweets and text messages in an exploding meadow of digital data. The crowd tags the tweets and messages they find relevant, and the AI engine learns to recognize the relevance patterns in real time, allowing AIDR to automatically identify future tweets and messages.

As far as we know, AIDR is the only Big Data solution out there that combines crowdsourcing with real-time machine learning for disaster response. Why do we use crowdsourcing to train the AI engine? Because speed is of the essence in disasters. You need a crowd of Digital Humanitarians to quickly tag as many tweets/messages as possible so that AIDR can learn as fast as possible. Incidentally, once you’ve created an algorithm that accurately detects tweets relaying urgent needs after a typhoon in the Philippines, you can use that same algorithm again when the next typhoon hits (no crowd needed).
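For the technically curious, here is a minimal sketch of that workflow in Python with scikit-learn. AIDR itself is a more sophisticated, purpose-built platform, so treat this purely as an illustration of online learning from crowd labels; the class names, example tweets and file name are placeholders.

```python
# Sketch of the AIDR idea (not AIDR's actual code): a classifier is trained
# incrementally from crowd-tagged tweets and reused for future events.
import joblib
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
classifier = SGDClassifier()          # linear model that supports incremental updates
CLASSES = ["relevant", "not_relevant"]

def learn_from_crowd(tweets, crowd_tags):
    """Update the model with a fresh batch of crowd-tagged tweets."""
    classifier.partial_fit(vectorizer.transform(tweets), crowd_tags, classes=CLASSES)

def auto_tag(tweets):
    """Let the current model tag new, unseen tweets automatically."""
    return classifier.predict(vectorizer.transform(tweets))

# Hypothetical usage during a response:
learn_from_crowd(["water urgently needed in Tacloban", "beautiful sunset tonight"],
                 ["relevant", "not_relevant"])
print(auto_tag(["families in Ormoc still need drinking water"]))

# After the response, the trained model can be saved and reused for the next
# typhoon without a new crowd (file name is a placeholder):
# joblib.dump(classifier, "urgent_needs_typhoon.joblib")
```

The point of the incremental learner is exactly the speed argument above: the classifier improves with every batch of crowd tags instead of waiting for a complete training set.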

What about pictures? After all, pictures are worth a thousand words. Is it possible to combine artificial intelligence with human input to automatically identify pictures that show infrastructure damage? Thanks to recent breakthroughs in computer vision, this is indeed possible. Take Metamind, for example, a new startup I just met with in Silicon Valley. Metamind is barely 6 months old, but the team has already demonstrated that one can indeed automatically identify a whole host of features in pictures by using artificial intelligence and some initial human input. The key is the human input, since this is what trains the algorithms. The more human-generated training data you have, the better your algorithms.

My team and I at QCRI are collaborating with Metamind to create algorithms that can automatically detect infrastructure damage in pictures. The Silicon Valley start-up is convinced that we’ll be able to create highly accurate algorithms if we have enough training data. This is where MicroMappers comes in. We’re already using MicroMappers to create training data for tweets and text messages (which is what AIDR uses to create algorithms). In addition, we’re already using MicroMappers to tag and map pictures of disaster damage. The missing link—in order to turn this tagged data into algorithms—is Metamind. I’m excited about the prospects, so stay tuned for updates as we plan to start teaching Metamind’s AI engine this month.
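Metamind's platform is proprietary, so I can't show its internals, but the general recipe (transfer learning: take a network pre-trained on everyday images and fine-tune it on crowd-tagged disaster photos) looks roughly like the following Keras sketch. The folder name is a placeholder for wherever the MicroMappers tags end up.

```python
# Sketch of the general approach only (not Metamind's platform): fine-tune a
# pretrained image network on crowd-tagged "damage" vs. "no damage" photos.
import tensorflow as tf

# Crowd-tagged images sorted into "damage/" and "no_damage/" folders (placeholder path).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "micromappers_tags/", image_size=(224, 224), batch_size=32)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the pretrained features, train only the new head

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),   # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),       # damage vs. no damage
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```

The more tagged examples the crowd produces, the better this kind of model tends to get, which is exactly why the MicroMappers training data matters.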

How about videos as a source of Big Data during disasters? I was just in Austin for SXSW 2015 and met up with the CEO of WireWax, a British company that uses—you guessed it—artificial intelligence and human input to automatically detect countless features in videos. Their platform has already been used to automatically find guns and Justin Bieber across millions of videos. Several other groups are also working on feature detection in videos. Colleagues at Carnegie Mellon University (CMU), for example, are working on developing algorithms that can detect evidence of gross human rights violations in YouTube videos coming from Syria. They’re currently applying their algorithms to the disaster footage we recently shared with them, to determine whether infrastructure damage can be automatically detected.

What about satellite & aerial imagery? Well, the team driving DigitalGlobe’s Tomnod platform has already been using AI powered by crowdsourcing to automatically identify features of interest in satellite (and now aerial) imagery. My team and I are working on similar solutions with MicroMappers, with the hope of creating real-time machine learning solutions for both satellite and aerial imagery. Unlike Tomnod, the MicroMappers platform is free and open source (and also filters social media, photographs, videos & mainstream news).

So there you have it. The future of humanitarian information systems will not be an App Store but an “Alg Store”, i.e., an Algorithm Store providing a growing menu of algorithms that have already been trained to automatically detect certain features in the texts, imagery and videos generated during disasters. These algorithms will also “talk to each other” and integrate other feeds (real-time sensors, the Internet of Things) thanks to data-fusion solutions that already exist and others that are in the works.

Now, the astute reader may have noted that I omitted audio/speech in my post. I’ll be writing about this in a future post since this one is already long enough.

What Humanitarians Can Learn from Conservation UAVs

I recently joined my fellow National Geographic Emerging Explorer Shah Selbe on his first expedition of SoarOcean, which seeks to leverage low-cost UAVs for ocean protection. Why did I participate in an expedition that seemingly had nothing to do with humanitarian response? Because the conservation space is well ahead of the humanitarian sector when it comes to using UAVs. To this end, we have a lot to learn from colleagues like Shah and others outside our field. The video below explains this further & provides a great overview of SoarOcean.

And here’s my short amateur aerial video from the expedition:

My goal, by the end of the year, is to join two more expeditions led by members of the Humanitarian UAV Network Advisory Board. Hopefully one of these will be with Drone Adventures (especially now that I’ve been invited to volunteer as “Drone Adventures Ambassador”, possibly the coolest title I will ever have). I’m also hoping to join my colleague Steve from the ShadowView Foundation in one of his team’s future expeditions. His Foundation has extensive experience in the use of UAVs for anti-poaching and wildlife conservation.

In sum, I learned heaps during Shah’s SoarOcean expedition; there’s just no substitute for hands-on learning and onsite tinkering. So I really hope I can join Drone Adventures and ShadowView later this year. In the meantime, big thanks to Shah and his awesome team for a great weekend of flying and learning.

See Also:

  • Welcome to the Humanitarian UAV Network [link]
  • How UAVs are Making a Difference in Disaster Response [link]
  • Humanitarians Using UAVs for Post Disaster Recovery [link]
  • Grassroots UAVs for Disaster Response [link]
  • Using UAVs for Search & Rescue [link]
  • Debrief: UAV/Drone Search & Rescue Challenge [link]
  • Crowdsourcing Analysis of UAV Imagery for Search/Rescue [link]
  • Check-List for Flying UAVs in Humanitarian Settings [link]

Using MicroMappers to Make Sense of UAV Imagery During Disasters

Aerial imagery will soon become a Big Data problem for humanitarian response—particularly oblique imagery. This was confirmed to me by a number of imagery experts in both the US (FEMA) and Europe (JRC). Aerial imagery taken at an angle is referred to as oblique imagery, as opposed to vertical imagery, which is taken by cameras pointing straight down (like satellite imagery). The team from Humanitarian OpenStreetMap (HOT) is already well equipped to make sense of vertical aerial imagery. They do this by microtasking the tracing of said imagery, as depicted below. So how do we rapidly analyze oblique images, which often provide more detail vis-a-vis infrastructure damage than vertical pictures?

[Screenshot: Humanitarian OpenStreetMap volunteers tracing vertical aerial imagery of the Philippines]

One approach is to microtask the tagging of oblique images. This was carried out very successfully after Hurricane Sandy (screenshot below).

This solution did not include any tracing, and it was not designed to inform the development of machine learning classifiers that automatically identify features of interest, such as damaged buildings. Making sense of Big (Aerial) Data will ultimately require the combined use of human computing (microtasking) and machine learning. If volunteers use microtasking to trace features of interest, such as damaged buildings pictured in oblique aerial imagery, machine learning algorithms may be able to learn to detect these features automatically once enough examples of damaged buildings are provided. There is obviously value in doing automated feature detection with vertical imagery as well. So my team and I at QCRI have been collaborating with a local Filipino UAV start-up (SkyEye) to develop a new “Clicker” for our MicroMappers collection. We’ll be testing the “Aerial Clicker” below with our Filipino partners this summer. Incidentally, SkyEye is on the Advisory Board of the Humanitarian UAV Network (UAViators).
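Since the Aerial Clicker is still being built, here is one way (purely a sketch under assumed data formats, not our actual pipeline) that volunteer traces could be turned into training data: rasterize each traced polygon into a binary mask that a segmentation or patch classifier can learn from. The file name and variable names are placeholders.

```python
# Sketch (assumed data format): rasterize volunteer-traced polygons into a
# binary mask so a learning algorithm can be trained on the traced features.
import numpy as np
from PIL import Image, ImageDraw

def traces_to_mask(image_path, polygons):
    """polygons: list of [(x1, y1), (x2, y2), ...] traced by volunteers."""
    img = Image.open(image_path)
    mask = Image.new("L", img.size, 0)            # single-channel mask, all zeros
    draw = ImageDraw.Draw(mask)
    for poly in polygons:
        draw.polygon(poly, outline=1, fill=1)     # mark traced (e.g. damaged) areas
    return np.array(mask, dtype=np.uint8)         # 1 = traced feature, 0 = background

# mask = traces_to_mask("oblique_photo.jpg", volunteer_traces)   # hypothetical inputs
# (image, mask) pairs can then feed a segmentation or patch classifier.
```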

[Screenshots: the MicroMappers Aerial Clicker prototype]

SkyEye is interested in developing a machine learning classifier to automatically identify coconut trees, for example. Why? Because coconut trees are an important source of livelihood for many Filipinos. Being able to rapidly identify trees that are still standing versus uprooted would enable SkyEye to quickly assess the impact of a Typhoon on local agriculture, which is important for food security & long-term recovery. So we hope to use the Aerial Clicker to microtask the tracing of coconut trees in order to significantly improve the accuracy of the machine learning classifier that SkyEye has already developed.

Will this be successful? One way to find out is by experimenting. I realize that developing a “visual version” of AIDR is anything but trivial. While AIDR was developed to automatically identify tweets (i.e., text) of interest during disasters by using microtasking and machine learning, what if we also had a free and open source platform to microtask and then automatically identify visual features of interest in both vertical and oblique imagery captured by UAVs? To be honest, I’m not sure how feasible this is vis-a-vis oblique imagery. As an imagery analyst at FEMA recently told me, this is still a research question for now. So I’m hoping to take this research on at QCRI but I do not want to duplicate any existing efforts in this space. So I would be grateful for feedback on this idea and any related research that iRevolution readers may recommend.

In the meantime, here’s another idea I’m toying with for the Aerial Clicker:

[Image: Aerial Clicker concept showing post-Yolanda aerial imagery of a message written on the ground]

I often see this in the aftermath of major disasters: affected communities turning to “analog social media” to communicate when cell phone towers are down. The aerial imagery above was taken following Typhoon Yolanda in the Philippines. And this is just one of several dozen images with analog media messages that I came across. So what if our Aerial Clicker were to ask digital volunteers to transcribe or categorize these messages? This would enable us to quickly create a crisis map of needs based on said content, since every image is already geo-referenced. Thoughts?
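If the transcriptions and categories were captured alongside each image's geotag, turning them into a map layer would be straightforward. Here is a small sketch (field names are placeholders, not an existing MicroMappers format) that writes the results out as GeoJSON, which most crisis mapping tools can ingest.

```python
# Sketch (field names are placeholders): turn crowdsourced transcriptions of
# "analog" messages in geo-referenced aerial photos into a GeoJSON map layer.
import json

def to_geojson(records):
    """records: [{'lat': ..., 'lon': ..., 'text': ..., 'category': ...}, ...]"""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {"message": r["text"], "need": r["category"]},
        }
        for r in records
    ]
    return {"type": "FeatureCollection", "features": features}

# with open("yolanda_needs.geojson", "w") as f:          # hypothetical output file
#     json.dump(to_geojson(transcribed_records), f)      # transcribed_records: placeholder
```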

See Also:

  • Welcome to the Humanitarian UAV Network [link]
  • How UAVs are Making a Difference in Disaster Response [link]
  • Humanitarians Using UAVs for Post Disaster Recovery [link]
  • Grassroots UAVs for Disaster Response [link]
  • Using UAVs for Search & Rescue [link]
  • Debrief: UAV/Drone Search & Rescue Challenge [link]
  • Crowdsourcing Analysis of UAV Imagery for Search/Rescue [link]
  • Check-List for Flying UAVs in Humanitarian Settings [link]

Enhanced Messaging for the Emergency Response Sector (EMERSE)

My colleague Andrea Tapia and her team at Penn State University have developed an interesting iPhone application designed to support humanitarian response. This application is part of their EMERSE project: Enhanced Messaging for the Emergency Response Sector. The other components of EMERSE include a Twitter crawler, automatic classification and machine learning.

The rationale for this important, applied research? “Social media used around crises involves self-organizing behavior that can produce accurate results, often in advance of official communications. This allows affected population to send tweets or text messages, and hence, make them heard. The ability to classify tweets and text messages automatically, together with the ability to deliver the relevant information to the appropriate personnel are essential for enabling the personnel to timely and efficiently work to address the most urgent needs, and to understand the emergency situation better” (Caragea et al., 2011).

The iPhone application developed by PennState is designed to help humanitarian professionals collect information during a crisis. “In case of no service or Internet access, the application rolls over to local storage until access is available. However, the GPS still works via satellite and is able to geo-locate data being recorded.” The Twitter crawler component captures tweets referring to specific keywords “within a seven-day period as well as tweets that have been posted by specific users. Each API call returns at most 1000 tweets and auxiliary metadata […].” The machine translation component uses Google Language API.

The more challenging aspect of EMERSE, however, is the automatic classification component. So the team made use of the Ushahidi Haiti data, which includes some 3,500 reports, about half of which came from text messages. Each of these reports was tagged according to specific (but not mutually exclusive) categories, e.g., Medical Emergency, Collapsed Structure, Shelter Needed, etc. The team at PennState experimented with various techniques from Natural Language Processing (NLP) and Machine Learning (ML) to automatically classify the Ushahidi Haiti data according to these pre-existing categories. The results demonstrate that “Feature Extraction” significantly outperforms other methods, while the performance of Support Vector Machine (SVM) classifiers varies significantly depending on the category being coded. I wonder whether their approach is more or less effective than this one developed by the University of Colorado at Boulder.
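For readers who want to see what such a pipeline can look like in practice, here is a minimal scikit-learn sketch (not the EMERSE code) that combines TF-IDF features with one-vs-rest linear SVMs, since a single report can carry several of these non-exclusive categories. The two example reports and their labels are placeholders standing in for the ~3,500 Ushahidi Haiti reports.

```python
# Sketch of the kind of multi-label text classification described above
# (not the EMERSE implementation): TF-IDF features + one-vs-rest linear SVMs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Placeholder examples; the real data would be the tagged Ushahidi Haiti reports.
reports = ["Building collapsed near the market, several people injured",
           "Family of six lost their home, urgently need shelter"]
labels = [["Collapsed Structure", "Medical Emergency"],
          ["Shelter Needed"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)          # binary indicator matrix, one column per category

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      OneVsRestClassifier(LinearSVC()))
model.fit(reports, Y)

new_report = ["Families need shelter near the collapsed school"]
print(mlb.inverse_transform(model.predict(new_report)))
```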

In any event, PennState’s applied research was presented at the ISCRAM 2011 conference and the findings are written up in this paper (PDF): “Classifying Text Messages for the Haiti Earthquake.” The co-authors: Cornelia Caragea, Nathan McNeese, Anuj Jaiswal, Greg Traylor, Hyun-Woo Kim, Prasenjit Mitra, Dinghao Wu, Andrea H. Tapia, Lee Giles, Bernard J. Jansen, John Yen.

In conclusion, the team at PennState argue that the EMERSE system offers four important benefits not provided by Ushahidi.

“First, EMERSE will automatically classify tweets and text messages into topic, whereas Ushahidi collects reports with broad category information provided by the reporter. Second, EMERSE will also automatically geo-locate tweets and text messages, whereas Ushahidi relies on the reporter to provide the geo-location information. Third, in EMERSE, tweets and text messages are aggregated by topic and region to better understand how the needs of Haiti differ by regions and how they change over time. The automatic aggregation also helps to verify reports. A large number of similar reports by different people are more likely to be true. Finally, EMERSE will provide tweet broadcast and GeoRSS subscription by topics or region, whereas Ushahidi only allows reports to be downloaded.”

In terms of future research, the team may explore other types of abstraction based on semantically related words, and may also “design an emergency response ontology […].” So I recently got in touch with Andrea to get an update on this since their ISCRAM paper was published 14 months ago. I’ll be sure to share any update if this information can be made public.

Crisis Tweets: Natural Language Processing to the Rescue?

My colleagues at the University of Colorado, Boulder, have been doing some very interesting applied research on automatically extracting “situational awareness” from tweets generated during crises. As is increasingly recognized by many in the humanitarian space, Twitter can at times be an important source of relevant information. The challenge is to make sense of a potentially massive number of crisis tweets in near real-time to turn this information into situational awareness.

Using Natural Language Processing (NLP) and Machine Learning (ML), Colorado colleagues have developed a “suite of classifiers to differentiate tweets across several dimensions: subjectivity, personal or impersonal style, and linguistic register (formal or informal style).” They suggest that tweets contributing to situational awareness are likely to be “written in a style that is objective, impersonal, and formal; therefore, the identification of subjectivity, personal style and formal register could provide useful features for extracting tweets that contain tactical information.” To explore this hypothesis, they studied the following four crisis events: the North American Red River floods of 2009 and 2010, the 2009 Oklahoma grassfires, and the 2010 Haiti earthquake.

The findings of this study were presented at a conference of the Association for the Advancement of Artificial Intelligence (AAAI). The team from Colorado demonstrated that their system, which automatically classifies tweets that contribute to situational awareness, works particularly well when analyzing “low-level linguistic features,” i.e., word frequencies and keyword searches. Their analysis also showed that “linguistically-motivated features including subjectivity, personal/impersonal style, and register substantially improve system performance.” In sum, “these results suggest that identifying key features of user behavior can aid in predicting whether an individual tweet will contain tactical information. In demonstrating a link between situational awareness and other markable characteristics of Twitter communication, we not only enrich our classification model, we also enhance our perspective of the space of information disseminated during mass emergency.”
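As a rough illustration of what combining “low-level linguistic features” with “linguistically-motivated features” can mean in code (this is a generic sketch, not the Colorado team's system), the snippet below mixes unigram counts with a crude personal-style cue, the number of first- and second-person pronouns, and trains a simple classifier. All of the data shown is made up.

```python
# Generic sketch (not the Colorado system): combine low-level word features
# with a crude, hand-crafted "personal style" cue for tweet classification.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

PRONOUNS = {"i", "me", "my", "we", "our", "you", "your"}

def pronoun_counts(tweets):
    """One crude feature per tweet: count of first/second-person pronouns."""
    counts = [sum(w.lower().strip(".,!?") in PRONOUNS for w in t.split())
              for t in tweets]
    return csr_matrix(np.array(counts, dtype=float).reshape(-1, 1))

def build_features(tweets, vectorizer, fit=False):
    """Concatenate word counts with the hand-crafted linguistic cue."""
    words = vectorizer.fit_transform(tweets) if fit else vectorizer.transform(tweets)
    return hstack([words, pronoun_counts(tweets)])

# Made-up training examples: 1 = contributes to situational awareness, 0 = does not.
train_tweets = ["Red River at 30 ft, evacuation ordered for the north side",
                "omg I can't believe this is happening to me"]
train_labels = [1, 0]

vectorizer = CountVectorizer()
clf = MultinomialNB().fit(build_features(train_tweets, vectorizer, fit=True), train_labels)
print(clf.predict(build_features(["Main St bridge closed due to flooding"], vectorizer)))
```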

The paper, entitled: “Natural Language Processing to the Rescue? Extracting ‘Situational Awareness’ Tweets During Mass Emergency,” details the findings above and is available here. The study was authored by Sudha Verma, Sarah Vieweg, William J. Corvey, Leysia Palen, James H. Martin, Martha Palmer, Aaron Schram and Kenneth M. Anderson.