- We’ve never been able to reproduce this reliably, and that’s why it’s not fixed
- See https://github.com/kobotoolbox/kobocat/issues/470. In 2018, we identified a duplicate rate of 0.5% in a sampling of 50k submissions. Clearly, under the right conditions, the rate is considerably higher.
- Yes (with the caveat of us developers not having reproduced it)
- Yes (same caveat)
- There are a few scenarios that people report as “duplicate submissions”:
  - The first, which describes Diane’s case, is true duplication, where the XML submissions are completely identical. I consider it a bug that KoBoCAT does not reject a submission whose identical XML already exists in another submission belonging to the same project. This problem is best detected by, first, looking for duplicate UUIDs, and then comparing the XML for any submissions that share the same UUID. Note the `_id` of each suspicious submission and retrieve its XML from the API, e.g. at https://kf.kobotoolbox.org/api/v2/assets/aYourProjectUid/data/12345.xml, where `12345` is the `_id` of the submission you want to retrieve.
  - A different scenario consists of submissions that share the same UUID but have different XML contents. UUIDs are generated by the client (Collect, Enketo, someone posting XML to the API, etc.), not the KoBo server. Some OpenRosa implementations reject duplicate UUIDs, but we err on the side of never discarding legitimate data, and we do not plan to change this behavior. This situation can be detected in a similar manner to the previous one: look for the duplicate UUIDs, and then compare the XML. If the XML differs, then the problem lies with the client. Obviously, if you see different responses in submissions that share the same UUID, you don’t need to go to the trouble of comparing the XML.
- There’s a lot of KoBoCAT work in the queue ahead of this. Honestly, we probably won’t begin to address it until the first quarter of 2021.
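
For anyone who wants to script the detection steps described above (find duplicate UUIDs, then compare the XML), here is a minimal sketch. It is not part of KoBoCAT. The function names and the dictionary keys `uuid` and `xml` are hypothetical, and it assumes you have already downloaded each submission's XML yourself. Only the URL pattern comes from the example above.

```python
from collections import defaultdict

# URL prefix taken from the example in this post; adjust for your own server.
BASE_URL = "https://kf.kobotoolbox.org/api/v2/assets"


def submission_xml_url(asset_uid, submission_id):
    """Build the XML URL for one submission, following the pattern quoted above."""
    return f"{BASE_URL}/{asset_uid}/data/{submission_id}.xml"


def classify_shared_uuids(submissions):
    """Group submissions by UUID and classify every group with more than one member.

    Each submission is a dict with the hypothetical keys "_id", "uuid" and "xml"
    (the XML text you already retrieved). Returns a dict mapping each shared
    UUID to a (kind, ids) tuple, where kind is "true_duplicate" when all XML
    in the group is identical and "client_uuid_reuse" otherwise.
    """
    by_uuid = defaultdict(list)
    for sub in submissions:
        by_uuid[sub["uuid"]].append(sub)

    report = {}
    for uuid, group in by_uuid.items():
        if len(group) < 2:
            continue  # UUID is unique, nothing to investigate
        # Compare XML after trimming surrounding whitespace only.
        distinct_xml = {sub["xml"].strip() for sub in group}
        kind = "true_duplicate" if len(distinct_xml) == 1 else "client_uuid_reuse"
        report[uuid] = (kind, sorted(sub["_id"] for sub in group))
    return report
```

A group is labeled `client_uuid_reuse` as soon as any XML in it differs, so a UUID shared by two identical submissions and one different one would need a closer look by hand.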