The importance of data validation is unquestioned, but few empirical studies seek to assess the errors that can be incurred during mobile data collection. Authors Somani Patnaik, Emma Brunskill and William Thies thus carried out what is possibly the first quantitative evaluation (PDF) of data entry accuracy on mobile phones in resource-constrained environments. They just presented their findings at ICTD 2009.
Mobile devices have become an increasingly important tool for information collection, which is one reason for my interest in pushing forward the idea of Mobile Crisis Mapping (MCM). While studies on data accuracy exist for personal digital assistants (PDAs), there are very few that focus on mobile phones. This new study thus evaluates three user interfaces for information collection: 1) electronic forms; 2) SMS; and 3) voice.
The results of the study indicate the following associated error rates:
- Electronic forms = 4.2%
- SMS = 4.5%
- Voice = 0.45%
For comparative purposes and context, note that error rates using PDAs have generally been less than 2%. The figures above represent the fraction of questions that were answered incorrectly. However, since “each patient interaction consisted of eleven questions, the probability of error somewhere in a patient report is much higher. For both electronic forms and SMS, 10 out of 26 reports (38%) contained an error; for voice, only 1 out of 20 reports (5%) contained an error (which was due to operator transcription).”
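To see why an error rate of around 4% per question turns into errors in more than a third of reports, here is a rough back-of-the-envelope sketch. It assumes that errors on the eleven questions are independent of one another, which is my simplifying assumption and not something the study states. The function name `report_error_probability` is likewise my own, made up for illustration.

```python
def report_error_probability(per_question_rate: float, questions: int = 11) -> float:
    """Chance that a report contains at least one wrong answer.

    Assumes each question is answered incorrectly independently with the
    same probability (a simplification, not a claim made by the study).
    """
    # A report is error-free only if every single question is correct.
    probability_all_correct = (1.0 - per_question_rate) ** questions
    return 1.0 - probability_all_correct


# Per-question error rates reported in the study.
rates = {"Electronic forms": 0.042, "SMS": 0.045, "Voice": 0.0045}

for interface, rate in rates.items():
    print(f"{interface}: {report_error_probability(rate):.1%}")
```

Under that assumption, the estimates come out at roughly 38% for electronic forms, 40% for SMS and 5% for voice, which is close to the per-report figures the authors observed (38%, 38% and 5%).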
I do hope that the results of this study prompt many others to carry out similar investigations. I think we need a lot more studies like this one, but with a larger survey sample (N) and across multiple sectors (this study drew on just 13 health workers).
The UN Threat and Risk Mapping Analysis (TRMA) project I’m working on in the Sudan right now will be doing a study on data collection accuracy using mobile phones when they roll out their program later this month. The idea is to introduce mobile phones in a number of localities and not in neighboring ones. The team will then compare the data quality of both samples.
I look forward to sharing the results.