I just started a survey (sample size 3,000) where submissions are sent over a 2G/3G network in East Africa.
So far, out of about 1,900 submissions, about 83 are duplicates. I mentioned this issue here, and Mitch replied with the following statement:
The critical question is whether you defined an instanceId in your form.
http://groups.google.com/group/opendatakit/browse_thread/thread/f77bf8942cdfe182/12e7e33119a83172?lnk=gst&q=instanceId#12e7e33119a83172
If you did not define an instanceId, ODK Aggregate cannot de-duplicate your data. I just noticed that the Opendatakit.org form design pages don’t mention this important aspect of form design. I’ll update them next week.
Mitch
I then posed the following question to Mitch (and I’m hoping the KoboCollect developers can add their views too):
I’m still wondering… if ODK Collect (or KoboCollect in our case) has successfully submitted a form AND its status has changed from FINISHED to SUBMITTED, then why would it even re-submit said instance – regardless of whether an instanceid field is defined or not? Just curious.
So, are the duplicates our survey is experiencing due to a bad network? I would think that if a form is already submitted, KoboCollect or ODK Collect shouldn’t even try to submit it again.
Looking at the Aggregate web interface on Appspot, I see a clear duplicate with the following details:
- Mobile ID3 has duplicate data that differs only in the end times. The first submission has an end time of **10:31:16** and the second submission has an end time of **14:07:08**. All the other captured data are the **same** for the two submissions. Concurrent data collection on a single mobile phone is not possible, so these cannot be two separate interviews. There is a gap of roughly three and a half hours between the two end times, yet it is still a duplicate.
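Records like this can be found in an export by comparing submissions on every field except the end time. Here is a minimal sketch, assuming the submissions have been loaded as a list of dictionaries and that the end-time column is called `end`. The field names and sample values are hypothetical; only the two end times come from the survey.

```python
from collections import defaultdict

def find_duplicates(rows, ignore=("end",)):
    """Group rows that are identical on every field except those in `ignore`."""
    groups = defaultdict(list)
    for row in rows:
        # Build a hashable key from all the fields that are expected to match.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in ignore))
        groups[key].append(row)
    # Keep only the groups that contain more than one submission.
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical sample data; only the two end times are from the real case.
rows = [
    {"device": "ID3", "household": "H-017", "answer": "yes", "end": "10:31:16"},
    {"device": "ID3", "household": "H-017", "answer": "yes", "end": "14:07:08"},
    {"device": "ID5", "household": "H-020", "answer": "no", "end": "11:02:40"},
]

duplicates = find_duplicates(rows)
```

Each entry in `duplicates` is a list of submissions that match on everything but the ignored fields, so the two Mobile ID3 rows end up in one group.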
So, is this duplicate problem due to a bad network? Eventually, the forms are successfully submitted! Or is it an issue with the data collector, perhaps not using KoboCollect correctly? We did provide training a few days ago.
Finally, has anyone used this solution to avoid duplicate entries in the future as suggested by Mitch?
For ODK Aggregate, you don’t need to specify the namespace, just have a group in your form.
With ODK Collect 1.1.7 and later, the bind for this element that replicates the instanceID that would otherwise be generated by Aggregate would be:
You can construct your own instanceID expressions. However, you should avoid symbols and punctuation other than colons and dashes since the parsing logic within Aggregate is likely fragile if you go wild with punctuation (and that is used later on when retrieving images, repeat groups, etc.).
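The punctuation advice can be sketched as a small check. The snippet below builds an ID with a `uuid:` prefix followed by a random UUID, and tests whether an ID contains only letters, digits, colons, and dashes. The function names and the exact whitelist are assumptions made for illustration; they are not Aggregate's actual parsing rules.

```python
import re
import uuid

# Assumed whitelist: letters, digits, colons, and dashes only.
SAFE_ID = re.compile(r"[A-Za-z0-9:-]+")

def make_instance_id():
    """Return an ID of the form 'uuid:' followed by a random UUID."""
    return "uuid:" + str(uuid.uuid4())

def is_safe_instance_id(value):
    """True if the whole string uses only the whitelisted characters."""
    return SAFE_ID.fullmatch(value) is not None

example_id = make_instance_id()
```

A custom ID such as `site-A:0001` would pass this check, while one containing slashes, spaces, or `#` would not.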
~DataMax