Data Science for Social Good: Not Cognitive Surplus but Cognitive Mismatch

Posted on June 20, 2013 | 20 Comments

I’ve spent the past 12 months working with top-notch data scientists at QCRI et al. The following may thus be biased: I think QCRI got it right. They strive to balance their commitment to positive social change with their primary mission of becoming a world-class institute for advanced computing research. The two are not mutually exclusive. What it takes is a dedicated position, like the one created for me at QCRI. It is high time that other research institutes, academic programs and international computing conferences create comparable focal points to catalyze data science for social good.

Microsoft Research, to name just one company, carries out very interesting research that could have tremendous social impact, but the bridge necessary to transfer much of that research from knowledge to operation to social impact is often not there. And when it is, it is usually by happenstance. So researchers continue to formulate research questions based on what they find interesting rather than identifying equally interesting questions that could have direct social impact if answered by data science. Hundreds of papers get presented at computing conferences every month, and yet few if any of the authors have linked up with organizations like the United Nations, World Bank, Habitat for Humanity etc., to identify and answer questions with social good potential. The same is true for hundreds of computing dissertations that get defended every year. Doctoral students do not realize that a minor reformulation of their research question could perhaps make a world of difference to a community-based organization in India dedicated to fighting corruption, for example.

The challenge here is not one of untapped cognitive surplus (to borrow from Clay Shirky), but rather complete cognitive mismatch. As my QCRI colleague Ihab Ilyas puts it: there are “problem owners” on the one hand and “problem solvers” on the other. The former have problems that prevent them from catalyzing positive social change. The latter know how to solve comparable problems and do so every day. But the two are not talking or even aware of each other. Creating and maintaining this two-way conversation requires more than one dedicated position (like mine at QCRI).

In short, I really want to have dedicated counterparts at Microsoft Research, IBM, SAP, LinkedIn, Bitly, GNIP, etc., as well as leading universities, top-notch computing conferences and challenges; counterparts who have one foot in the world of data science and the other in the social sector; individuals who have a demonstrated track record in bridging communities. There’s a community here waiting to be connected and needing to be formed. Again, carrying out cutting-edge computing R&D is in no way incompatible with generating positive social impact. Moreover, the latter provides an important return on investment in the form of data, reputation, publicity, connections and social capital. In sum, social good challenges need to be formulated into research questions that have scientific as well as social good value. There is definitely a sweet spot here but it takes a dedicated community to bring problem owners and solvers together and hit that social good sweet spot.


20 responses to “Data Science for Social Good: Not Cognitive Surplus but Cognitive Mismatch”

mikelmaron | June 20, 2013 at 9:24 am

I think you just got your counterpart at the World Bank, with Tariq’s new appointment as Data Scientist 🙂 https://twitter.com/tkb/status/347048884620632064
- Patrick Meier | June 20, 2013 at 9:32 am
  
  Yes good news indeed, Tariq and I had a quick chat about this & potential areas of collaboration earlier this month
NoelDickover (@NoelDickover) | June 20, 2013 at 9:34 am

Patrick, it may be that you’re not giving yourself enough credit here. You have deep knowledge about a part of the development space (crisis response and humanitarian work). This level of understanding takes active participation to develop. So yes, it would be great if an IBM person was tasked to start working on this and collaborating with you, but their bigger task would be to engage with some part of the social good space.

These “connectors”, who have feet in both the data world (or mapping world, or mobile world, or other important technology-enabled slices) and some foothold in the social good space, aren’t as plentiful as we would like. I don’t know that the reason, though, is that these organizations aren’t as enlightened as we wish (although that may be the case). I think we are just now seeing a cadre of people that work well in both worlds – in places like the US, the UK, Switzerland, and so forth.

In developing countries, though, there is seldom a workable tech for social good ecosystem. This means that when we parachute into these places, we’re usually stuck working with either the development community OR the tech community in the country, if one does exist. The real challenge, I think, isn’t getting the SAPs of the world to hire folks, but cultivating tech for social good ecosystems on the ground in places we want to impact. If we start working this problem, the community connections you mention will be far more powerful.
- Patrick Meier | June 20, 2013 at 9:42 am
  
  Thanks for reading and sharing, Noel. You read my mind re data science for social good in developing countries. Been talking with a colleague about bringing her on board at QCRI to explore & research possibilities around cultivating data science for social good ecosystems in emerging economies with high potential for social impact.
  
  Re the IBM example, am not asking for an IBM person to collaborate with me, am thinking more of an IBM person hiring someone firmly in the social good space to channel IBM’s data science expertise for social good and link up with others like myself, Jake Porway, Catherine Bracy, etc. IBM is based globally and can thus have the local impact you’re referring to.
Jonathan Talbot | June 20, 2013 at 10:28 am

Isn’t the nut of the challenge that those who consider themselves in the social good sector are largely unaware of what data and tools exist? That they operate within constraints that they could get beyond if they understood what was possible and knew who to ask to team?

I think a clearinghouse of information that would allow one set of people to pose challenges and describe constraints as they see them, and allow others who understand the data and computational side to review and interpret, ask clarifying questions, identify existing tools and researchers/students/organizations looking for projects or with concentrations of relevant expertise, would make for rapid progress.

I would fight the assumption that you need heroic individuals to make progress on this front.
- Patrick Meier | June 20, 2013 at 10:40 am
  
  Thanks for reading and for sharing your thoughts, Jonathan.
  
  While connectors are important (I don’t see them as heroic), fostering a community is one way to create a sustainable space for a clearinghouse of information. This is the purpose of the CrisisMappers Network, for example, which bridges the humanitarian community with the broader technology community. The list-serve, which has been around for 5+ years, is used as a clearinghouse of information, especially during disasters, because we have an active community. There are close to 2,000 list-serve subscribers at CrisisMappers, which means the chances of successfully connecting problem owners and problem solvers are higher, hence my bias towards a community and initial connectors to catalyze connections across disciplines. This explains why I recently launched the Data Science for Social Good list-serve. In short, this is all to say that I agree with you re the clearinghouse 🙂
  - Pietro Michelucci | June 21, 2013 at 12:59 pm
    
    Hi Patrick! Great idea (“connectors”), as per usual. Jonathan Talbot’s comments are also well taken. I wonder if there are different qualities of connection to be made, each with different affordances and implied levels of commitment. For example, designating a “connector” within each provider organization orients an organization toward certain societal objectives and values, and facilitates engagement toward solutions. A listserve helps providers and stakeholders find each other. And perhaps new human computational approaches, such as collaborative problem solving, permit individuals to contribute asynchronously toward larger goals, spending only a few minutes a day while waiting in line at the grocery store (“casual problem solving” rather than “casual gaming” – or make them indistinguishable?). What’s more, I wonder if these and other mechanisms for connecting could, themselves, be connected in various fruitful ways? Anyway, keep up the good work!
  - Patrick Meier | June 21, 2013 at 1:25 pm
    
    Many thanks for sharing, and for reading, Pietro. Yes, I too agree with Jonathan Talbot’s comments. And I also agree with your suggestion that different qualities of connections can be made. This reminds me of the importance of weak and strong ties in social network theory/analysis. Both are needed. But I certainly don’t have all the answers, that’s for sure. Hence my blog post 🙂 So thanks very much for your feedback and follow up questions.
Tapan Parikh | June 20, 2013 at 8:46 pm

Submit to ICTD and ACM DEV! There is a growing community of computer scientists that definitely agrees with you.
- Patrick Meier | June 21, 2013 at 8:46 am
  
  Thanks for reading, Tapan, and for your advice. Do you mean submit a paper to these conferences?
Joy Robson | June 21, 2013 at 9:43 am

Hi Patrick, glad you made note of Jake Porway, Catherine Bracy, et al. – this feels like the tipping point for a long-overdue idea 🙂
Andrej Verity (@andrejverity) | June 21, 2013 at 4:33 pm

It will be really great to see this develop. Of course, part of it is the age-old problem of those with the problem defining it well enough for an external entity (e.g. outsourced) to work on it and produce what is needed. Therefore, we don’t only need dedicated capacity on the big data side, but we need more dedicated focal points (with time!) on the humanitarian/development side. SAS was very keen to help us in the 2010 Pakistan floods, but we were unable to articulate problems or challenges for them to work on [we were not really prepared for such an offer!]

Part of the trick going forward is also going to be making sure that these new partners deliver. There have been countless projects where private firms and/or academics promise to deliver something for the UN but either back away, assign their most junior staff, or simply fail to deliver. These experiences have led many people to be skeptical of such groups, so we need to make sure that we get the setup and the dedication right.

With regard to some of the developing countries, I would suggest that we be careful about what we say. For example, I know that there is a GIS lab at the main university in Bangui (CAR) that is capable [although it faces problems of electricity & internet connection]. How long before a center like this starts to have the skills to handle large sets of data themselves – especially if development organizations start providing funding to ensure that countries can build such a capacity?

Andrej
- Patrick Meier | June 21, 2013 at 6:10 pm
  
  Thanks very much for reading and commenting, Andrej. And thanks for highlighting the need to deliver.
  
  On developing countries, your point about Bangui is music to my ears. “How long before a center like this starts to have the skills to handle large sets of data themselves – especially if development organizations start providing funding to ensure that countries can build such a capacity.” My question: what can groups like QCRI, DataKind, Code for America, etc, do (if anything) to accelerate this process of skills development, particularly vis-a-vis Data Science? Is collaborating with existing Innovation Labs the way to go? With universities? Government institutions? Code for America is just starting to explore this question with their new International Program, which is why I’m looking for ways to collaborate with them. We should absolutely be focusing on adding/building local capacity for data science. I would love to see local data science hacking labs grow, perhaps like the UN GP’s Pulse Labs in Indonesia and Uganda? There are of course many infrastructure challenges when dealing with Big Data, but they are not insurmountable if (as you write) “development organizations start providing funding to ensure that countries can build such a capacity.”
Jess Daniel | June 22, 2013 at 8:23 am

Hi, I’m a social epidemiology doctoral student and have been thinking a lot about what you propose. Do you have any more specific ideas about how this would look in an academic setting? Would it have to be a professor? Or maybe a research assistant that they hire specifically to reach out to relevant people who could put their research findings into practice? I’d love to talk to you more about this – I think this is a huge gap in our field (one that purports to be about helping people, no less).
- Patrick Meier | June 22, 2013 at 2:37 pm
  
  Hi Jess, many thanks for reading and for your follow up. Would be great to brainstorm with you re specifics in an academic setting. In terms of my own focus on humanitarian/development issues, one thought I had was to approach several computer science professors and invite them to have their students get in touch with me when they are thinking through potential course projects, papers, dissertations, etc. Based on these students’ interests, I would put some feelers out through contacts at humanitarian/development organizations and ask whether anyone was looking for problem solvers re current/future projects. So rather informal for now but over time could become more “institutionalized”? Happy to chat further, my email: patrick at iRevolution dot net. Thanks again
- Lorelei Kelly | June 24, 2013 at 12:18 am
  
  Jess and Patrick,
  I’ve been researching how to re-engineer the expert knowledge system of the US Congress for many years – it is now possible to re-locate certain functions of expertise into states and districts because of new transparency rules (during policymaking and oversight, for example). One intriguing set of papers that I found were published by the Kellogg Foundation many years ago… they were commissioned to explore knowledge for the public interest – in the case of the land grant schools. One was called Kellogg Commission on the Future of State and Land-Grant Universities and the other was called Returning to Our Roots: the Engaged Institution… this commission was disbanded, however. I’ve never understood why. It seems they had very little follow-on. Some land grant schools, however, did start up public engagement offices. I need to follow up on that, as they have an office in DC to represent their interests. I interview Hill staff frequently about how they sort and filter information; very few tell me that their universities in district or in state are useful or helpful to them during the policymaking process. That’s not to say that good intentions don’t exist; they do. But they don’t “plug in” in the right way. This is one of the things I look at: how we can help create an evidence-based system that is distributed. Congress’ problem, for example, isn’t in crowdsourcing, it’s in curation. Another role for connectors, indeed! Thanks for the helpful piece, Patrick.
  - Jess Daniel | June 27, 2013 at 7:41 am
    
    Lorelei, I’m not surprised to hear that state/local universities are not helpful during the policymaking process. At least where I am, researchers ask the question, “What is intellectually interesting to me?” and rarely “What is a question that will inform social policy?”. Part of the problem, in my opinion, is that we don’t know what questions are policy relevant today – or at least how the big questions can be broken down into pieces small enough to focus one research project around. I agree wholeheartedly that this is a job for connectors, but I’m not sure exactly how I see it working. Academia is such a different beast, with its own weird conventions. Professors are already so overworked that they definitely won’t be able to do this on their own. Maybe if one day a budget for such a person on each grant from NIH was standard, professors could have a dedicated person on each of their grants whose sole job was to connect the research to people who could use it – however, this would not solve the problem of asking relevant questions, because the grant would already have outlined what the study was researching. Anyway, I’m very interested to brainstorm more ideas about how to do this. Patrick, I’m sorry I haven’t emailed you yet – soon!
Librarian Thinking Outloud | October 8, 2013 at 11:14 am

My father worked in such a research center created by the Alberta (Canada) government to make sure that research was relevant to and communicated to the end-users. The field wasn’t data but agriculture. Perhaps there are models in other industries that we can learn from.