Category Archives: Information Forensics

Using Sound and Artificial Intelligence to Detect Human Rights Violations

Video continues to be a powerful way to capture human rights abuses around the world. Videos posted to social media can be used to hold perpetrators of gross violations accountable. But video footage poses a “Big Data” challenge to human rights organizations. Two billion smartphone users means almost as many video cameras, which translates into massive amounts of visual evidence of suffering and wrongdoing in conflict zones. Reviewing these videos manually is a labor-intensive, time-consuming, expensive and often traumatic task. So my colleague Jay Aronson at CMU has been exploring how artificial intelligence, and in particular machine learning, might solve this challenge.


As Jay and team rightly note in a recent publication (PDF), “the dissemination of conflict and human rights related video has vastly outpaced the ability of researchers to keep up with it – particularly when immediate political action or rapid humanitarian response is required.” The consequences of this are similar to what I’ve observed in humanitarian aid: “At some point (which will vary from organization to organization), time and resource limitations will necessitate an end to the collection, archiving, and analysis of user generated content unless the process can be automated.” In sum, information overload can “prevent human rights researchers from uncovering widely dispersed events taking place over long periods of time or large geographic areas that amount to systematic human rights violations.”


To take on this Big Data challenge, Jay and team have developed a new machine learning-based audio processing system that “enables both synchronization of multiple audio-rich videos of the same event, and discovery of specific sounds (such as wind, screaming, gunshots, airplane noise, music, and explosions) at the frame level within a video.” The system essentially “creates a unique ‘soundprint’ for each video in a collection, synchronizes videos that are recorded at the same time and location based on the pattern of these signatures, and also enables these signatures to be used to locate specific sounds precisely within a video. The use of this tool for synchronization ultimately provides a multi-perspectival view of a specific event, enabling more efficient event reconstruction and analysis by investigators.”
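The publication describes the system at a conceptual level; as a purely illustrative sketch of the underlying idea, the snippet below reduces each soundtrack to a coarse per-frame energy “soundprint” and then finds the time offset that best aligns two recordings via cross-correlation. All function names and parameters here are my own assumptions, not Jay’s implementation:

```python
# Minimal, illustrative sketch of audio-based video synchronization.
# Each soundtrack is reduced to a coarse per-frame "soundprint" (log energy),
# and the lag that maximizes cross-correlation aligns two recordings in time.
import numpy as np

def soundprint(samples, rate, frame_sec=0.1):
    """Coarse audio signature: log-energy per fixed-length frame."""
    frame_len = int(rate * frame_sec)
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.log1p((frames.astype(float) ** 2).sum(axis=1))

def best_offset_sec(print_a, print_b, frame_sec=0.1):
    """Offset (seconds) of recording B's start relative to recording A's."""
    a = print_a - print_a.mean()
    b = print_b - print_b.mean()
    corr = np.correlate(a, b, mode="full")
    lag_frames = corr.argmax() - (len(b) - 1)
    return lag_frames * frame_sec

if __name__ == "__main__":
    rate = 16000
    scene = np.random.randn(rate * 60)        # one minute of shared "audio"
    camera_a = scene[: rate * 40]             # camera A records the first 40s
    camera_b = scene[rate * 12 : rate * 52]   # camera B starts 12s later
    offset = best_offset_sec(soundprint(camera_a, rate),
                             soundprint(camera_b, rate))
    print(f"Camera B started ~{offset:.1f}s after camera A")  # ~12.0s
```

A real system would of course use far richer audio features than raw frame energy, but the alignment logic is the same: shared sounds produce matching signature patterns, and the best-correlated lag reveals the relative start times.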

Synchronizing videos based on visual features is far more complex than synchronizing them based on sound. “When an object is occluded, poorly illuminated, or not visually distinct from the background, it cannot always be detected by computer vision systems. Further, while computer vision can provide investigators with confirmation that a particular video was shot from a particular location based on the similarity of the background physical environment, it is less adept at synchronizing multiple videos over time because it cannot recognize that a video might be capturing the same event from different angles or distances. In both cases, audio sensors function better so long as the relevant videos include reasonably good audio.”

Ukrainian human rights practitioners working with families of protestors killed during the 2013-2014 Euromaidan Protests recently approached Jay and company to analyze videos from those events. They wanted to “locate every video available in their collection of the moments before, during, and just after a specific set of killings. They wanted to extract information from these videos, including visual depictions of these killings, whether the protesters in question were an immediate and direct threat to the security forces, plus any other information that could be used to corroborate or refute other forms of evidence or testimony available for their cases.”

[Figure: multiple video clips of the same event synchronized by the system]

Their plan had originally been to manually synchronize more than 65 hours of video footage from 520 videos taken during the morning of February 20, 2014. But after working full-time over several months, they were only able to stitch together about 4 hours of the total video using visual and audio cues in the recordings. So Jay and team used their system to make sense of the footage, automatically synchronizing over 4 hours of it. The figure above shows an example of video clips synchronized by the system.

Users can also “select a segment within the video containing the event they are interested in (for example, a series of explosions in a plaza), and search in other videos for a similar segment that shows similar looking buildings or persons, or that contains a similar sounding noise. A user may for example select a shooting scene with a significant series of gunshots, and may search for segments with a similar sounding series of gunshots. This method increases the chances for finding video scenes of an event displaying different angles of the scene or parallel events.”
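Here too, a minimal sketch may help make the workflow concrete. Assuming the soundprint() signature from the earlier snippet, segment search can be approximated by sliding the query segment’s signature across another video’s signature and ranking windows by normalized correlation (the published system presumably uses far richer audio features):

```python
# Illustrative sketch of audio segment search: slide a query soundprint over
# a longer video's soundprint and rank windows by normalized correlation.
# Reuses the soundprint() sketch above; all names here are invented.
import numpy as np

def search_similar_segments(query_print, target_print, top_k=3):
    """Return (start_frame, score) for windows most similar to the query."""
    q = (query_print - query_print.mean()) / (query_print.std() + 1e-9)
    hits = []
    for start in range(len(target_print) - len(query_print) + 1):
        w = target_print[start : start + len(query_print)]
        w = (w - w.mean()) / (w.std() + 1e-9)
        hits.append((start, float(q @ w) / len(q)))  # correlation in [-1, 1]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]
```

Under this sketch, a user who selects a burst of gunshots would pass that segment’s soundprint as the query and review the top-ranked windows from every other video in the collection.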

Jay and team are quick to emphasize that their system “does not eliminate human involvement in the process because machine learning systems provide probabilistic, not certain, results.” To be sure, “the synchronization of several videos is noisy and will likely include mistakes—this is precisely why human involvement in the process is crucial.”

I’ve been following Jay’s applied research for many years now and continue to be a fan of his approach given the overlap with my own work in the use of machine learning to make sense of the Big Data generated during major natural disasters. I wholeheartedly agree with Jay when he reflected during a recent call that the use of advanced techniques alone is not the answer. Effective cross-disciplinary collaboration between computer scientists and human rights (or humanitarian) practitioners is really hard but absolutely essential. This explains why I wrote this practical handbook on how to create effective collaboration and successful projects between computer scientists and humanitarian organizations.

QED – Goodbye Doha, Hello Adventure!

Quod Erat Demonstrandum (QED) is Latin for “that which had to be proven.” The abbreviation was traditionally placed at the end of mathematical proofs to signal their completion. I joined the Qatar Computing Research Institute (QCRI) well over 3 years ago with a very specific mission and mandate: to develop and deploy next generation humanitarian technologies. So I built the Institute’s Social Innovation Program from the ground up and recruited the majority of the full-time experts (scientists, engineers, research assistants, interns & project manager) who have become integral to the Program’s success. During these 3+ years, my team and I partnered directly with humanitarian and development organizations to empirically prove that methods from advanced computing can be used to make sense of Big (Crisis) Data. The time has thus come to add “QED” to the end of that proof and move on to new adventures. But first a reflection.

Over the past 3.5 years, my team and I at QCRI developed free and open source solutions powered by crowdsourcing and artificial intelligence to make sense of Tweets, text messages, pictures, videos, satellite and aerial imagery for a wide range of humanitarian and development projects. We co-developed and co-deployed these platforms (AIDR and MicroMappers) with the United Nations and the World Bank in response to major disasters such as Typhoons Haiyan and Ruby, Cyclone Pam, and both the Nepal & Chile Earthquakes. In addition, we carried out peer-reviewed, scientific research on these deployments to better understand how to meet the information needs of our humanitarian partners. We also tackled the information reliability question, experimenting with crowdsourcing (Verily) and machine learning (TweetCred) to assess the credibility of information generated during disasters. All of these initiatives were firsts in the humanitarian technology space.

We later developed AIDR-SMS to auto-classify text messages, a platform that UNICEF successfully tested in Zambia and which the World Food Programme (WFP) and the International Federation of the Red Cross (IFRC) now plan to pilot. AIDR was also used to monitor a recent election, and our partners are now looking to use AIDR again for upcoming election monitoring efforts. In terms of MicroMappers, we extended the platform (considerably) in order to crowdsource the analysis of oblique aerial imagery captured via small UAVs, which was another first in the humanitarian space. We also teamed up with excellent research partners to crowdsource the analysis of aerial video footage and to develop automated feature-detection algorithms for oblique imagery analysis based on crowdsourced results derived from MicroMappers. We developed these Big Data solutions to support damage assessment efforts, food security projects and even this wildlife protection initiative.

In addition to the above accomplishments, we launched the Internet Response League (IRL) to explore the possibility of leveraging massive multiplayer online games to process Big Crisis Data. Along similar lines, we developed the first ever spam filter to make sense of Big Crisis Data. Furthermore, we got directly engaged in the field of robotics by launching the Humanitarian UAV Network (UAViators), yet another first in the humanitarian space. In the process, we created the largest repository of aerial imagery and videos of disaster damage, which is ripe for cutting-edge computer vision research. We also spearheaded the World Bank’s UAV response to Category 5 Cyclone Pam in Vanuatu and directed a unique disaster recovery UAV mission in Nepal after the devastating earthquakes. (I took time off from QCRI to carry out both of these missions and also took holiday time to support UN relief efforts in the Philippines following Typhoon Haiyan in 2013.) Lastly, on the robotics front, we championed the development of international guidelines to inform the safe, ethical & responsible use of this new technology in both humanitarian and development settings. To be sure, innovation is not just about the technology but also about crafting appropriate processes to leverage this technology. Hence also the rationale behind the Humanitarian UAV Experts Meetings that we’ve held at the United Nations Secretariat, the Rockefeller Foundation and MIT.

All of the above pioneering and experimental projects have resulted in extensive media coverage, which has placed QCRI squarely on the radar of international humanitarian and development groups. This coverage has included the New York Times, Washington Post, Wall Street Journal, CNN, BBC News, UK Guardian, The Economist, Forbes, Time Magazine, New Yorker, NPR, Wired, Mashable, TechCrunch, Fast Company, Nature, New Scientist, Scientific American and more. In addition, our good work and applied research has been featured in numerous international conference presentations and keynotes. In sum, I know of no other institute for advanced computing research that has contributed this much to the international humanitarian space in terms of thought-leadership, strategic partnerships, applied research and operational expertise through real-world co-deployments during and after major disasters.

There is, of course, a lot more to be done in the humanitarian technology space. But what we have accomplished over the past 3 years clearly demonstrates that techniques from advanced computing can indeed provide part of the solution to the pressing Big Data challenge that humanitarian & development organizations face. At the same time, as I wrote in the concluding chapter of my new book, Digital Humanitarians, solving the Big Data challenge does not, alas, imply that international aid organizations will actually make use of the resulting filtered data, or any other data for that matter—even if they ask for this data in the first place. So until humanitarian organizations truly shift towards both strategic and tactical evidence-based analysis & data-driven decision-making, this disconnect will surely continue unabated for many more years to come.

Reflecting on the past 3.5 years at QCRI, it is crystal clear to me that the most important lesson I (re)learned is that you can do anything if you have an outstanding, super-smart and highly dedicated team that continually goes way above and beyond the call of duty. It is one thing for me to have had the vision for AIDR, MicroMappers, IRL, UAViators, etc., but vision alone does not amount to much. Implementing said vision is what delivers results and learning. And I simply couldn’t have asked for a more talented & stellar team to translate these visions into reality over the past 3+ years. You each know who you are, partners included; it has truly been a privilege and an honor working with you. I can’t wait to see what you do next at/with QCRI. Thank you for trusting me; thank you for sharing my vision; thank you for your sense of humor; and thank you for your dedication and loyalty to science and social innovation.

So what’s next for me? I’ll be lining up independent consulting work with several organizations (likely including QCRI). In short, I’ll be open for business. I’m also planning to work on a new project that I’m very excited about, so stay tuned for updates; I’ll be sure to blog about this new adventure when the time is right. For now, I’m busy wrapping up my work as Director of Social Innovation at QCRI and working with the best team there is. QED.

How to Become a Digital Sherlock Holmes and Support Relief Efforts

Humanitarian organizations need both timely and accurate information when responding to disasters. Where is the most damage located? Who needs the most help? What other threats exist? Respectable news organizations also need timely and accurate information during crisis events to responsibly inform the public. Alas, both humanitarian & mainstream news organizations are often confronted with countless rumors and unconfirmed reports. Investigative journalists and others have thus developed a number of clever strategies to rapidly verify such reports—as detailed in the excellent Verification Handbook. There’s just one glitch: Journalists and humanitarians alike are increasingly overwhelmed by the “Big Data” generated during crises, particularly information posted on social media. They rarely have enough time or enough staff to verify the majority of unconfirmed reports. This is where Verily comes in, a new type of Detective Agency for a new type of detective: The Virtual Digital Detective.


The purpose of Verily is to rapidly crowdsource the verification of unconfirmed reports during major disasters. The way it works is simple. If a humanitarian or news organization has a verification request, they simply submit this request online at Verily. This request must be phrased in the form of a Yes-or-No question, such as: “Has the Brooklyn Bridge been destroyed by the Hurricane?”; “Is this Instagram picture really showing current flooding in Indonesia?”; “Is this new YouTube video of the Chile earthquake fake?”; “Is it true that the bush fires in South Australia are getting worse?” and so on.

Verily helps humanitarian & news organizations find answers to these questions by rapidly crowdsourcing the collection of clues that can help answer said questions. Verification questions are communicated widely across the world via Verily’s own email list of Digital Detectives and also via social media. This new breed of Digital Detectives then scours the web for clues that can help answer the verification questions. Anyone can become a Digital Detective at Verily. Indeed, Verily provides a menu of mini verification guides for new detectives. These guides were written by some of the best Digital Detectives on the planet: the authors of the Verification Handbook. Verily Detectives post the clues they find directly to Verily and briefly explain why these clues help answer the verification question. That’s all there is to it.


If you’re familiar with Reddit, you may be thinking: “Hold on, doesn’t Reddit do this already?” In part, yes, but Reddit is not necessarily designed to crowdsource critical thinking or to create skilled Digital Detectives. Recall this fiasco during the Boston Marathon Bombings, which fueled disastrous “witch hunts”. Said disaster would not have happened on Verily because Verily is deliberately designed to focus on the process of careful detective work, and to give new detectives the skills they need to avoid precisely the kind of disaster that happened on Reddit. This is in no way a criticism of Reddit! No single platform can be designed to solve every problem under the sun. Deliberate, intentional design is absolutely key.

In sum, our goal at Verily is to crowdsource Sherlock Holmes. Why do we think this will work? For several reasons. First, the authors of the Verification Handbook have already demonstrated that individuals working alone can, and do, verify unconfirmed reports during crises. We believe that creating a community that can work together to verify rumors will be even more powerful given the Big Data challenge. Second, each one of us with a mobile phone is a human sensor, a potential digital witness. We believe that Verily can help crowdsource the search for eyewitnesses, or rather the search for the digital content that these eyewitnesses post on the Web. Third, the Red Balloon Challenge was completed in a matter of hours. That Challenge focused on crowdsourcing the search for clues across an entire continent (3 million square miles). Disasters, in contrast, are far narrower in geographic coverage. In other words, the proverbial haystack is smaller and the needles thus easier to find. More on Verily here & here.

So there’s reason to be optimistic that Verily can succeed, given the above and recent real-world deployments. Of course, Verily is still very much in an early, experimental phase. But both humanitarian organizations and high-profile news organizations have expressed a strong interest in field-testing this new Digital Detective Agency. To find out more about Verily and to engage with experts in verification, please join us on Tuesday, March 3rd at 10:00am (New York time) for this Google Hangout with the Verily Team and our colleague Craig Silverman, the Co-Editor of the Verification Handbook. Click here for the Event Page and here to follow on YouTube. You can also join the conversations on Twitter and pose questions or comments using the hashtag #VerilyLive.

Video: Digital Humanitarians & Next Generation Humanitarian Technology

How do international humanitarian organizations make sense of the “Big Data” generated during major disasters? They turn to Digital Humanitarians who craft and leverage ingenious crowdsourcing solutions with trail-blazing insights from artificial intelligence to make sense of vast volumes of social media, satellite imagery and even UAV/aerial imagery. They also use these “Big Data” solutions to verify user-generated content and counter rumors during disasters. The talk below explains how Digital Humanitarians do this and how their next generation humanitarian technologies work.

Many thanks to TTI/Vanguard for having invited me to speak. Lots more on Digital Humanitarians in my new book of the same title.


Videos of my TEDx talks and the talks I’ve given at the White House, PopTech, Where 2.0, National Geographic, etc., are all available here.

Reflections on Digital Humanitarians – The Book

In January 2014, I wrote this blog post announcing my intention to write a book on Digital Humanitarians. Well, it’s done! And it launches this week. The book has already been endorsed by scholars at Harvard, MIT, Stanford, Oxford, etc.; by practitioners at the United Nations, World Bank, Red Cross, USAID, DfID, etc.; and by others including Twitter and National Geographic. These and many more endorsements are available here. Brief summaries of each book chapter are available here; and the short video below provides an excellent overview of the topics covered in the book. Together, these overviews make it clear that this book is directly relevant to many other fields including journalism, human rights, development, activism, business management, computing, ethics, social science and data science. In short, the lessons that digital humanitarians have learned (often the hard way) over the years and the important insights they have gained are directly applicable to fields well beyond the humanitarian space. To this end, Digital Humanitarians is written in a “narrative and conversational style” rather than with dense, technical language.

The story of digital humanitarians is a multifaceted one. Theirs is not just a story about using new technologies to make sense of “Big Data”. For the most part, digital humanitarians are volunteers; volunteers from all walks of life who occupy every time zone. Many are very tech-savvy and pull all-nighters, but most simply want to make a difference using the few minutes they have with the digital technologies already at their fingertips. Digital humanitarians also include pro-democracy activists who live in countries ruled by tyrants. This story is thus also about hope and humanity; about how technology can extend our humanity during crises. To be sure, if no one cared, if no one felt compelled to help others in need or to change the status quo, then no one would even bother to use these new, next generation humanitarian technologies in the first place.

I believe this explains why Professor Leysia Palen included the following in her very kind review of my book: “I dare you to read this book and not have both your heart and mind opened.” As I reflected to my editor while in the midst of writing, an alternative tag line for the title could very well be “How Big Data and Big Hearts are Changing the Face of Humanitarian Response.” It is personally and deeply important to me that the media, would-be volunteers and others also understand that the digital humanitarians story is not a romanticized story about a few “lone heroes” who accomplish the impossible thanks to their superhuman technical powers. There are thousands upon thousands of largely anonymous digital volunteers from all around the world who make this story possible. And while we may not know all their names, we certainly do know about their tireless collective action efforts—they mobilize online from all corners of our Blue Planet to support humanitarian efforts. My book explains how these digital volunteers do this, and yes, how you can too.

Digital humanitarians also include a small (but growing) number of forward-thinking professionals from large and well-known humanitarian organizations. After the tragic, nightmarish earthquake that struck Haiti in January 2010, these seasoned and open-minded humanitarians quickly realized that making sense of “Big Data” during future disasters would require new thinking, new risk-taking, new partnerships, and next generation humanitarian technologies. This story thus includes the invaluable contributions of these change-agents and explains how a few individuals are enabling innovation within the large bureaucracies they work in. The story would be incomplete without them; without their appetite for risk-taking and their strategic understanding of how to change (and at times circumvent) established systems from the inside to keep their organizations relevant in a hyper-connected world. This may explain why Tarun Sarwal of the International Committee of the Red Cross (ICRC) in Geneva included these words (of warning) in his kind review: “For anyone in the Humanitarian sector — ignore this book at your peril.”


Today, this growing, cross-disciplinary community of digital humanitarians is crafting and leveraging ingenious crowdsourcing solutions with trail-blazing insights from advanced computing and artificial intelligence in order to make sense of the “Big Data” generated during disasters. In near real-time, these new solutions (many still in early prototype stages) enable digital volunteers to make sense of vast volumes of social media, SMS and imagery captured from satellites & UAVs to support relief efforts worldwide.

All of this obviously comes with a great many challenges. I certainly don’t shy away from these in the book (despite my being an eternal optimist : ). As Ethan Zuckerman from MIT very kindly wrote in his review of the book,

“[Patrick] is also a careful scholar who thinks deeply about the limits and potential dangers of data-centric approaches. His book offers both inspiration for those around the world who want to improve our disaster response and a set of fertile challenges to ensure we use data wisely and ethically.”

Digital humanitarians are not perfect; they’re human, they make mistakes, they fail. Innovation, after all, takes experimenting, risk-taking and failing. But most importantly, these digital pioneers learn, innovate and over time make fewer mistakes. In sum, this book charts the sudden and spectacular rise of these digital humanitarians and their next generation technologies by sharing their remarkable, real-life stories, the many lessons they have learned, and the hurdles both cleared & still standing. In essence, this book highlights how their humanity, coupled with innovative solutions to “Big Data”, is changing humanitarian response forever. Digital Humanitarians will make you think differently about what it means to be humanitarian and will invite you to join the journey online. And that is what it’s ultimately all about—action, responsible & effective action.

Why did I write this book? The main reason may perhaps come as a surprise—one word: hope. In a world seemingly overrun by heart-wrenching headlines and daily reminders from the news and social media about all the ugly and cruel ways that technologies are being used to spy on entire populations, to harass, oppress, target and kill each other, I felt the pressing need to share a different narrative; a narrative about how selfless volunteers from all walks of life, of all ages, nationalities and creeds, use digital technologies to help complete strangers on the other side of the planet. I’ve had the privilege of witnessing this digital goodwill first hand, repeatedly, over the years. This goodwill is what continues to restore my faith in humanity and what gives me hope, even when things are tough and not going well. And so, I wrote Digital Humanitarians first and foremost to share this hope more widely. We each have agency, and we can change the world for the better. I’ve seen this and witnessed the impact first hand. So if readers come away with a renewed sense of hope and agency after reading the book, I will have achieved my main objective.

For updates on events, talks, trainings, webinars, etc., please click here. I’ll be organizing a Google Hangout on March 5th for readers who wish to discuss the book in more depth and/or follow up with any questions or ideas. If you’d like additional information on this and future Hangouts, please click on the previous link. If you wish to join ongoing conversations online, feel free to do so with the FB & Twitter hashtag #DigitalJedis. If you’d like to set up a book talk and/or co-organize a training at your organization, university, school, etc., then do get in touch. If you wish to give a talk on the book yourself, then let me know and I’d be happy to share my slides. And if you come across interesting examples of digital humanitarians in action, then please consider sharing these with other readers and myself by using the #DigitalJedis hashtag and/or by sending me an email so I can include your observation in my monthly newsletter and future blog posts. I also welcome guest blog posts on iRevolutions.

Naturally, this book would never have existed were it not for the digital humanitarians volunteering their time—day and night—during major disasters across the world. This book would also not have seen the light of day without the thoughtful guidance and support I received from these mentors, colleagues, friends and my family. I am thus deeply and profoundly grateful for their spirit, inspiration and friendship. Onwards!

Digital Jedis: There Has Been An Awakening…

Live: Crowdsourced Verification Platform for Disaster Response

Earlier this year, Malaysia Airlines Flight 370 suddenly vanished, setting in motion the largest search and rescue operation in history—both on the ground and online. Colleagues at DigitalGlobe uploaded high-resolution satellite imagery to the web and crowdsourced the digital search for signs of Flight 370. An astounding 8 million volunteers rallied online, searching through 775 million images spanning 1,000,000 square kilometers; all this in just 4 days. What if, in addition to mass crowd-searching, we could also mass crowd-verify information during humanitarian disasters? Rumors and unconfirmed reports tend to spread rather quickly on social media during major crises. But what if the crowd were also part of the solution? This is where our new Verily platform comes in.


Verily was inspired by the Red Balloon Challenge, in which competing teams vied for a $40,000 prize by searching for ten weather balloons secretly placed across some 8,000,000 square kilometers (the continental United States). Talk about a needle-in-the-haystack problem. The winning team from MIT found all 10 balloons within 8 hours. How? They used social media to crowdsource the search. The team later noted that the balloons would’ve been found even more quickly had competing teams not posted pictures of fake balloons on social media. Point being: all ten balloons were found astonishingly quickly even with the disinformation campaign.

Verily takes the exact same approach and methodology used by MIT to rapidly crowd-verify information during humanitarian disasters. Why is verification important? Because humanitarians have repeatedly noted that their inability to verify social media content is one of the main reasons why they aren’t making wider use of this medium. So, to test the viability of our proposed solution to this problem, we decided to pilot the Verily platform by running a Verification Challenge. The Verily Team includes researchers from the University of Southampton, the Masdar Institute and QCRI.

During the Challenge, verification questions of varying difficulty were posted on Verily. Users were invited to collect and post evidence justifying their answers to the “Yes or No” verification questions. One question, for example, asked about the location shown in a photograph of a street scene.


Unbeknownst to participants, the photograph actually showed the town of Caltagirone in Sicily. The question was answered correctly within 4 hours by a user who submitted another picture of the same street. The results of the Verily experiment are promising: answers to our questions were coming in so rapidly that we could barely keep up with posting new ones. Users drew on a variety of techniques to collect their evidence and answer the questions we posted.

Verily was designed with the goal of tapping into collective critical thinking; that is, with the goal of encouraging people to think about the question rather than rely on gut feeling alone. In other words, the purpose of Verily is not simply to crowdsource the collection of evidence but also to crowdsource critical thinking. This explains why a user can’t simply submit a “Yes” or “No” to answer a verification question. Instead, they have to justify their answer by providing evidence, either in the form of an image/video or as text. In addition, Verily does not make use of Like buttons or up/down votes to answer questions. While such tools are great for identifying and sharing content on sites like Reddit, they are not the right tools for verification, which requires searching for evidence rather than liking or retweeting.
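That design constraint (no bare votes; evidence plus justification required) is easy to picture as a data model. The sketch below is purely illustrative and not Verily’s actual codebase; all names, fields and the example URL are my own inventions:

```python
# Illustrative data model for the design described above: a Verily answer is
# only accepted if it carries evidence and a justification (not a bare vote).
from dataclasses import dataclass, field

@dataclass
class Evidence:
    kind: str           # "image", "video", or "text"
    source_url: str     # where the clue was found
    justification: str  # why this clue helps answer the question

@dataclass
class VerificationQuestion:
    text: str                                    # phrased as Yes-or-No
    answers: list = field(default_factory=list)

    def submit_answer(self, verdict: bool, evidence: Evidence):
        if not evidence.justification.strip():
            raise ValueError("Bare Yes/No rejected: evidence is required.")
        self.answers.append((verdict, evidence))

q = VerificationQuestion("Has the Brooklyn Bridge been destroyed by the Hurricane?")
q.submit_answer(False, Evidence("image", "https://example.com/bridge.jpg",
                                "Live traffic cam shows the bridge intact."))
```

The point of the design is visible in the code: the platform structurally cannot accept an up/down vote, only a verdict bundled with a clue and a reason.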

Our Verification Challenge confirmed the feasibility of the Verily platform for time-critical, crowdsourced evidence collection and verification. The next step is to deploy Verily during an actual humanitarian disaster. To this end, we invite both news and humanitarian organizations to pilot the Verily platform with us during the next natural disaster. Simply contact me to submit a verification question. In the future, once Verily is fully developed, organizations will be able to post their questions directly.


See Also:

  • Verily: Crowdsourced Verification for Disaster Response [link]
  • Crowdsourcing Critical Thinking to Verify Social Media [link]
  • Six Degrees of Separation: Implications for Verifying Social Media [link]

Got TweetCred? Use it To Automatically Identify Credible Tweets (Updated)

Update: Users have created an astounding one million+ tags over the past few weeks, which will help increase the accuracy of TweetCred in coming months as we use these tags to further train our machine learning classifiers. We will be releasing our Firefox plugin in the next few days. In the meantime, we have just released our paper on TweetCred which describes our methodology & classifiers in more detail.

What if there were a way to automatically identify credible tweets during major events like disasters? Sounds rather far-fetched, right? Think again.

The new field of Digital Information Forensics is increasingly making use of Big Data analytics and techniques from artificial intelligence like machine learning to automatically verify social media. This is how my QCRI colleague ChaTo and his co-authors predicted both credible and non-credible tweets generated after the Chile Earthquake (with an accuracy of 86%). Meanwhile, my colleagues Aditi et al. from IIIT Delhi also used machine learning to automatically rank the credibility of some 35 million tweets generated during a dozen major international events such as the UK Riots and the Libya Crisis. So we teamed up with Aditi et al. to turn those academic findings into TweetCred, a free app that identifies credible tweets automatically.
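The approach behind TweetCred is supervised learning over tweet-level features; the exact features and models are described in the papers linked above. The following is a deliberately toy illustration of that idea, with invented features and labels, using scikit-learn:

```python
# Toy illustration of supervised tweet-credibility scoring (invented features
# and labels; not TweetCred's actual feature set or model).
from sklearn.linear_model import LogisticRegression

# Features per tweet: [has_url, followers_in_thousands, num_hashtags, has_image]
X = [
    [1, 50.0, 1, 1],   # tweets annotators labeled credible...
    [1, 12.0, 0, 1],
    [0,  0.2, 6, 0],   # ...and tweets they labeled non-credible
    [0,  0.1, 8, 0],
]
y = [1, 1, 0, 0]

model = LogisticRegression().fit(X, y)

new_tweet = [1, 3.0, 2, 0]
p = model.predict_proba([new_tweet])[0][1]  # probability of "credible"
score = round(p * 6) + 1                    # map onto a 1-7 dot scale
print(f"TweetCred-style score: {score}/7 (p={p:.2f})")
```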


We’ve just launched the very first version of TweetCred—key word being first. This means that our new app is still experimental. On the plus side, since TweetCred is powered by machine learning, it will become increasingly accurate over time as more users make use of the app and “teach” it the difference between credible and non-credible tweets. Teaching TweetCred is as simple as a click of the mouse. Take the tweet below, for example.

[Screenshot: TweetCred displaying a credibility rating for a tweet from the American Red Cross]

TweetCred scores each tweet on a 7-point scale: the higher the number of blue dots, the more credible the content of the tweet is likely to be. Note that a TweetCred score also takes into account any pictures or videos included in a tweet, along with the reputation and popularity of the Twitter user. Naturally, TweetCred won’t always get it right, which is where the teaching and machine learning come in. The above tweet from the American Red Cross is more credible than three dots would suggest. So you simply hover your mouse over the blue dots and click on the “thumbs down” icon to tell TweetCred it got that tweet wrong. The app will then ask you to tag the correct level of credibility for that tweet.
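The “teaching” step can likewise be sketched as a simple feedback loop on top of the toy model above: user corrections become fresh labeled examples that get folded into the next training run. Again, this is my illustration of the workflow rather than TweetCred’s implementation:

```python
# Illustrative feedback loop: corrections from users become new labeled
# examples for periodic retraining (continues the toy example above).
feedback_log = []

def record_feedback(tweet_features, predicted_score, user_score):
    """Store a user's correction (e.g., after a 'thumbs down') for retraining."""
    if user_score != predicted_score:
        feedback_log.append((tweet_features, user_score))

def retrain(model, X, y):
    """Fold accumulated corrections into the training set and refit."""
    X_extra = [features for features, _ in feedback_log]
    y_extra = [1 if score >= 4 else 0 for _, score in feedback_log]  # binarize 1-7 tags
    feedback_log.clear()
    return model.fit(X + X_extra, y + y_extra)
```

This is why the app should grow more accurate over time: every tag users contribute is, in effect, a new training example.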


That’s all there is to it. As noted above, this is just the first version of TweetCred. The more all of us use (and teach) the app, the more accurate it will be. So please try it out and spread the word. You can download the Chrome Extension for TweetCred here. If you don’t use Chrome, you can still use the browser version here, although the latter has less functionality. We very much welcome any feedback you may have, so simply post it in the comments section below. Keep in mind that TweetCred is specifically designed to rate the credibility of disaster/crisis related tweets rather than any random topic on Twitter.

As I note in my book Digital Humanitarians (forthcoming), empirical studies have shown that we’re less likely to spread rumors on Twitter if false tweets are publicly identified by Twitter users as being non-credible. In fact, these studies show that such public exposure increases the number of Twitter users who then seek to stop the spread of said rumor-related tweets by 150%. But it makes a big difference whether one sees the rumors first or the tweets dismissing said rumors first. So my hope is that TweetCred will help accelerate Twitter’s self-correcting behavior by automatically identifying credible tweets while countering rumor-related tweets in real time.

This project is a joint collaboration between IIIT and QCRI. Big thanks to Aditi and team for their heavy lifting on the coding of TweetCred. If the experiments go well, my QCRI colleagues and I may integrate TweetCred within our AIDR (Artificial Intelligence for Disaster Response) and Verily platforms.


See also:

  • New Insights on How to Verify Social Media [link]
  • Predicting the Credibility of Disaster Tweets Automatically [link]
  • Auto-Ranking Credibility of Tweets During Major Events [link]
  • Auto-Identifying Fake Images on Twitter During Disasters [link]
  • Truth in the Age of Social Media: A Big Data Challenge [link]
  • Analyzing Fake Content on Twitter During Boston Bombings [link]
  • How to Verify Crowdsourced Information from Social Media [link]
  • Crowdsourcing Critical Thinking to Verify Social Media [link]
  • Tweets, Crises and Behavioral Psychology: On Credibility and Information Sharing [link]

New Insights on How To Verify Social Media

The “field” of information forensics has seen some interesting developments in recent weeks. Take the Verification Handbook or Twitter Lie-Detector project, for example. The Social Sensor project is yet another new initiative. In this blog post, I seek to make sense of these new developments and to identify where this new field may be going. In so doing, I highlight key insights from each initiative. 


The co-editors of the Verification Handbook remind us that misinformation and rumors are hardly new during disasters. Chapter 1 opens with the following account from 1934:

“After an 8.1 magnitude earthquake struck northern India, it wasn’t long before word circulated that 4,000 buildings had collapsed in one city, causing ‘innumerable deaths.’ Other reports said a college’s main building, and that of the region’s High Court, had also collapsed.”

These turned out to be false rumors. The BBC’s User Generated Content (UGC) Hub would have been able to debunk these rumors. In their opinion, “The business of verifying and debunking content from the public relies far more on journalistic hunches than snazzy technology.” So they would have been right at home in the technology landscape of 1934. To be sure, they contend that “one does not need to be an IT expert or have special equipment to ask and answer the fundamental questions used to judge whether a scene is staged or not.” In any event, the BBC does not “verify something unless [they] speak to the person that created it, in most cases.” What about the other cases? How many of those cases are there? And how did they ultimately decide whether the information was true or false even though they did not speak to the person that created it?

As this new study argues, big news organizations like the BBC aim to contact the original authors of user generated content (UGC) not only to try and “protect their editorial integrity but also because rights and payments for newsworthy footage are increasingly factors. By 2013, the volume of material and speed with which they were able to verify it [UGC] were becoming significant frustrations and, in most cases, smaller news organizations simply don’t have the manpower to carry out these checks” (Schifferes et al., 2014).


Chapter 3 of the Handbook notes that the BBC’s UGC Hub began operations in early 2005. At the time, “they were reliant on people sending content to one central email address. At that point, Facebook had just over 5 million users, rather than the more than one billion today. YouTube and Twitter hadn’t launched.” Today, more than 100 hours of content is uploaded to YouTube every minute; over 400 million tweets are sent each day and over 1 million pieces of content are posted to Facebook every 30 seconds. Now, as this third chapter rightly notes, “No technology can automatically verify a piece of UGC with 100 percent certainty. However, the human eye or traditional investigations aren’t enough either. It’s the combination of the two.” New York Times journalists concur: “There is a problem with scale… We need algorithms to take more onus off human beings, to pick and understand the best elements” (cited in Schifferes et al., 2014).

People often (mistakenly) see “verification as a simple yes/no action: Something has been verified or not. In practice, […] verification is a process” (Chapter 3). More specifically, this process is one of satisficing. As my colleagues Leysia Palen et al. note in this study, “Information processing during mass emergency can only satisfice because […] the ‘complexity of the environment is immensely greater than the computational powers of the adaptive system.’” To this end, “It is an illusion to believe that anyone has perfectly accurate information in mass emergency and disaster situations to account for the whole event. If someone did, then the situation would not be a disaster or crisis.” This explains why Leysia et al. seek to shift the debate to one focused on the helpfulness of information rather than the problematic true/false dichotomy.


“In highly contextualized situations where time is of the essence, people need support to consider the content across multiple sources of information. In the online arena, this means assessing the credibility and content of information distributed across [the web]” (Leysia et al., 2011). This means that, “Technical support can go a long way to help collate and inject metadata that make explicit many of the inferences that the every day analyst must make to assess credibility and therefore helpfulness” (Leysia et al., 2011). In sum, the human versus computer debate vis-a-vis the verification of social media is somewhat pointless. The challenge moving forward resides in identifying the best ways to combine human cognition with machine computing. As Leysia et al. rightly note, “It is not the job of the […] tools to make decisions but rather to allow their users to reach a decision as quickly and confidently as possible.”
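One concrete pattern for combining the two is confidence-based triage: the machine auto-labels only what it is confident about and queues everything else for human review. The sketch below is illustrative; the threshold, the stand-in classifier and all names are my own assumptions:

```python
# Illustrative hybrid triage: machine computing handles confident cases,
# human cognition handles the rest. Threshold and classifier are stand-ins.
def triage(items, classify, confidence_threshold=0.9):
    auto_decisions, human_queue = [], []
    for item in items:
        label, confidence = classify(item)
        if confidence >= confidence_threshold:
            auto_decisions.append((item, label))
        else:
            human_queue.append(item)  # routed to a human reviewer
    return auto_decisions, human_queue

def toy_classifier(text):
    """Stand-in for a trained model: returns (label, confidence)."""
    confident = "official" in text
    return ("credible" if confident else "unverified",
            0.95 if confident else 0.6)

auto, queue = triage(["official statement issued", "huge explosion downtown??"],
                     toy_classifier)
print(auto)   # machine-labeled with high confidence
print(queue)  # left for digital volunteers to assess
```

The design choice here reflects the quote above: the tool never makes the final call on ambiguous content; it simply narrows down what humans need to look at, as quickly as possible.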

This may explain why Chapter 7 (which I authored) applies both human and advanced computing techniques to the verification challenge. Indeed, I explicitly advocate for a hybrid approach. In contrast, the Twitter Lie-Detector project known as Pheme apparently seeks to use machine learning alone to automatically verify online rumors as they spread on social networks. Overall, this is great news—the more groups that focus on this verification challenge, the better for those of us engaged in digital humanitarian response. It remains to be seen, however, whether machine learning alone will make Pheme a success.


In the meantime, the EU’s Social Sensor project is developing new software tools to help journalists assess the reliability of social media content (Schifferes et al., 2014). A preliminary series of interviews revealed that journalists were most interested in Social Sensor software for:

1. Predicting or alerting breaking news

2. Verifying social media content–quickly identifying who has posted a tweet or video and establishing “truth or lie”

So the Social Sensor project is developing an “Alethiometer” (Aletheia is Greek for ‘truth’) to “meter the credibility of information coming from any source by examining the three Cs—Contributors, Content and Context. These seek to measure three key dimensions of credibility: the reliability of contributors, the nature of the content, and the context in which the information is presented. This reflects the range of considerations that working journalists take into account when trying to verify social media content. Each of these will be measured by multiple metrics based on our research into the steps that journalists go through manually. The results of [these] steps can be weighed and combined [metadata] to provide a sense of credibility to guide journalists” (Schifferes et al., 2014).
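One crude way to picture how such metrics might be “weighed and combined” is a weighted sum across the three dimensions. The weights and metric values below are invented for illustration, since the project’s actual formula isn’t published in the sources cited here:

```python
# Illustrative weighted combination of the "three Cs" (invented weights).
WEIGHTS = {"contributor": 0.4, "content": 0.4, "context": 0.2}

def credibility_guide(metrics):
    """metrics: per-dimension scores in [0, 1]; returns a combined score."""
    return sum(WEIGHTS[dim] * metrics[dim] for dim in WEIGHTS)

score = credibility_guide({
    "contributor": 0.8,  # e.g., account age, posting history
    "content":     0.6,  # e.g., language cues, attached media
    "context":     0.5,  # e.g., corroboration from the same time and place
})
print(f"Credibility guide: {score:.2f}")  # 0.66
```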


On our end, my colleagues and I at QCRI are continuing to collaborate with several partners to experiment with advanced computing methods to address the social media verification challenge. As noted in Chapter 7, Verily, a platform that combines time-critical crowdsourcing and critical thinking, is still in the works. We’re also continuing our collaboration on a Twitter credibility plugin (more in Chapter 7). In addition, we are exploring whether we can microtask the computation of source credibility scores using MicroMappers.

Of course, the above will sound like “snazzy technologies” to seasoned journalists with no background or interest in advanced computing. But this doesn’t seem to stop them from complaining that “Twitter search is very hit and miss;” that what Twitter “produces is not comprehensive and the filters are not comprehensive enough” (BBC social media expert, cited in Schifferes et al., 2014). As one of my PhD dissertation advisors (Clay Shirky) noted a while back, information overflow (Big Data) is due to “Filter Failure”. This is precisely why my colleagues and I are spending so much of our time developing better filters—filters powered by human and machine computing, such as AIDR. These types of filters can scale. BBC journalists on their own do not, unfortunately. But they can act on hunches and intuition based on years of hands-on professional experience.

The “field” of digital information forensics has come a long way since I first wrote about how to verify social media content back in 2011. While I won’t touch on the Handbook’s many other chapters here, the entire report is an absolute must-read for anyone interested and/or working in the verification space. At the very least, have a look at Chapter 9, which combines each chapter’s verification strategies in the form of a simple checklist. Also, Chapter 10 includes a list of tools to aid in the verification process.

In the meantime, I really hope that we end the pointless debate about human versus machine. This is not an either/or issue. As a colleague once noted, what we really need is a way to combine the power of algorithms and the wisdom of the crowd with the instincts of experts.


See also:

  • Predicting the Credibility of Disaster Tweets Automatically [link]
  • Auto-Ranking Credibility of Tweets During Major Events [link]
  • Auto-Identifying Fake Images on Twitter During Disasters [link]
  • Truth in the Age of Social Media: A Big Data Challenge [link]
  • Analyzing Fake Content on Twitter During Boston Bombings [link]
  • How to Verify Crowdsourced Information from Social Media [link]
  • Crowdsourcing Critical Thinking to Verify Social Media [link]

Yes, I’m Writing a Book (on Digital Humanitarians)

I recently signed a book deal with Taylor & Francis Press. The book, which is tentatively titled “Digital Humanitarians: How Big Data is Changing the Face of Disaster Response,” is slated to be published next year. The book will chart the rise of digital humanitarian response from the Haiti Earthquake to 2015, highlighting critical lessons learned and best practices. To this end, the book will draw on real-world examples of digital humanitarians in action to explain how they use new technologies and crowdsourcing to make sense of “Big (Crisis) Data”. In sum, the book will describe how digital humanitarians & humanitarian technologies are together reshaping the humanitarian space and what this means for the future of disaster response. The purpose of this book is to inspire and inform the next generation of (digital) humanitarians while serving as a guide for established humanitarian organizations & emergency management professionals who wish to take advantage of this transformation in humanitarian response.


The book will thus consolidate critical lessons learned in digital humanitarian response (such as the verification of social media during crises) so that members of the public along with professionals in both international humanitarian response and domestic emergency management can improve their own relief efforts in the face of “Big Data” and rapidly evolving technologies. The book will also be of interest to academics and students who wish to better understand methodological issues around the use of social media and user-generated content for disaster response; or how technology is transforming collective action and how “Big Data” is disrupting humanitarian institutions, for example. Finally, this book will also speak to those who want to make a difference; to those of you who may have little to no experience in humanitarian response but who still wish to help others affected by disasters—even if you happen to be thousands of miles away. You are the next wave of digital humanitarians, and this book will explain how you can indeed make a difference.

The book will not be written in a technical or academic style. Instead, I’ll be using a more “storytelling” form of writing combined with a conversational tone. This approach is perfectly compatible with the clear documentation of critical lessons emerging from the rapidly evolving digital humanitarian space. Nor is a conversational style at odds with the need to explain the more technical insights being applied to develop next generation humanitarian technologies. Quite the contrary: I’ll be using intuitive examples & metaphors to make even the most technical details not only understandable but entertaining.

While this journey is just beginning, I’d like to express my sincere thanks to my mentors for their invaluable feedback on my book proposal. I’d also like to express my deep gratitude to my point of contact at Taylor & Francis Press for championing this book from the get-go. Last but certainly not least, I’d like to sincerely thank the Rockefeller Foundation for providing me with a residency fellowship this Spring in order to accelerate my writing.

I’ll be sure to provide an update when the publication date has been set. In the meantime, many thanks for being an iRevolution reader!
