Submissions being duplicated with same uuid

Hello,

We’ve realized that a form created some duplicate submissions. It is data collected using Enketo on tablets, on the researcher server. The duplicates have exactly the same data and metadata, and even the same _id and _uuid (which should be impossible, right?).

Do you know what’s happening? I’m worried that deleting one duplicate will delete both submissions.

Best

Diane

Would you mind sharing a screenshot of the issue? It would be very helpful. Could you also let us know whether your survey project has any image questions?

Sure, here is a screenshot showing the duplicate submissions:

No image should be collected in this form.

Could you help us by sending your username and project name in a private message? It would help us better understand the situation. TIA!

Sure, will send you the details in private. Thank you!!


Hi @dianedetoeuf
Thanks for sending the information. We have reviewed this with our developers and found a bug that the team will be working on. We apologize for any inconvenience caused. For now, the only option is to delete the duplicate; this will not delete the other copy. It would be a good idea to download your data as it is first, just in case you are worried that both copies might be deleted.

Stephane


Hello @stephanealoo,
As this seems to be a general issue, would you mind providing more details:

  1. When does/can it happen?
  2. Does it happen on both servers (OCHA & HHI)?
  3. Does it happen for KoBoCollect, Enketo & ODK Collect?
  4. How can it best be detected?
  5. When is it likely to be fixed?

Kind regards
  1. We’ve never been able to reproduce this reliably, and that’s why it’s not fixed :frowning:
  2. Yes (with the caveat of us developers not having reproduced it)
  3. Yes (same caveat)
  4. There are a few scenarios that people report as “duplicate submissions”:
    • The first, which describes Diane’s case, is true duplication, where the XML submissions are completely identical. I consider it a bug that KoBoCAT does not reject a submission whose identical XML already exists in another submission belonging to the same project. This problem is best detected by, first, looking for duplicate UUIDs, and then comparing the XML for any submissions that share the same UUID. Note the _id of each suspicious submission and retrieve its XML from the API, e.g. at https://kf.kobotoolbox.org/api/v2/assets/aYourProjectUid/data/12345.xml, where 12345 is the _id of the submission you want to retrieve.
    • A different scenario consists of submissions that share the same UUID but have different XML contents. UUIDs are generated by the client (Collect, Enketo, someone posting XML to the API, etc.), not the KoBo server. Some OpenRosa implementations reject duplicate UUIDs, but we err on the side of never discarding legitimate data—and we do not plan to change this behavior. This situation can be detected in a similar manner to the previous one: look for the duplicate UUIDs, and then compare the XML. If the XML differs, then the problem lies with the client. Obviously, if you see different responses in submissions that share the same UUID, you don’t need to go to the trouble of comparing the XML.
  5. There’s a lot of KoBoCAT work in the queue ahead of this. Honestly, we probably won’t begin to address it until the first quarter of 2021.
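The detection procedure described in item 4 can be sketched in Python. This is a minimal, unofficial sketch, not KoBo code: it assumes you have already downloaded the XML of each suspicious submission (for example from the .xml URL pattern quoted in item 4, which for a private project will typically require authenticating against your server) into hypothetical records with `_id`, `_uuid` and `xml` keys. It groups submissions by `_uuid` and labels each repeated UUID as either true duplication (identical XML, the first scenario) or a client-side UUID clash (different XML, the second scenario). The helper names `submission_xml_url` and `classify_uuid_groups` are made up for illustration.

```python
from collections import defaultdict
from hashlib import sha256


def submission_xml_url(asset_uid, submission_id, base="https://kf.kobotoolbox.org"):
    """Build the per-submission XML URL following the pattern quoted in item 4."""
    return f"{base}/api/v2/assets/{asset_uid}/data/{submission_id}.xml"


def classify_uuid_groups(submissions):
    """Group submissions by _uuid and classify every UUID used more than once.

    Each submission is assumed to be a dict with '_id', '_uuid' and 'xml' keys,
    where 'xml' is the raw submission XML already downloaded as a string.
    Returns {uuid: (kind, sorted _id list)}, where kind is 'identical' when all
    XML documents match exactly and 'different' otherwise.
    """
    groups = defaultdict(list)
    for sub in submissions:
        groups[sub["_uuid"]].append(sub)

    result = {}
    for uuid, members in groups.items():
        if len(members) < 2:
            continue  # UUID used once: nothing suspicious
        # Hash each XML document; a single distinct digest means byte-identical XML.
        digests = {sha256(m["xml"].encode("utf-8")).hexdigest() for m in members}
        kind = "identical" if len(digests) == 1 else "different"
        result[uuid] = (kind, sorted(m["_id"] for m in members))
    return result


subs = [
    {"_id": 101, "_uuid": "a1", "xml": "<data><q1>yes</q1></data>"},
    {"_id": 102, "_uuid": "a1", "xml": "<data><q1>yes</q1></data>"},
    {"_id": 103, "_uuid": "b2", "xml": "<data><q1>no</q1></data>"},
    {"_id": 104, "_uuid": "b2", "xml": "<data><q1>maybe</q1></data>"},
    {"_id": 105, "_uuid": "c3", "xml": "<data><q1>no</q1></data>"},
]
print(classify_uuid_groups(subs))
# {'a1': ('identical', [101, 102]), 'b2': ('different', [103, 104])}
print(submission_xml_url("aYourProjectUid", 12345))
```

Because the comparison is byte-for-byte, even a whitespace difference puts a group in the "different" bucket. The "identical" groups are the true duplicates where keeping one copy loses no data.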

Hi @stephanealoo,

Thanks a lot to you and the team for looking into this! We’ll save the data first, just in case, and then delete the duplicates.

Best

Diane


Hi all,

I have also seen this problem before, and the only thing I could do was clean up the duplicated submissions. This time, your answer to Diane’s request will help us fix the problem.

Best regards
Bernard


Thank you, Kal_Lam.
Can I set _id to be a serial number starting from 1?
If possible, please help me.

Hi @mizanvai,

Could you please elaborate on your issue so that we can understand your requirement and help you, if it is possible through KoBoToolbox?

When I submit new KoBo data, the “_id” field is a 9-digit number such as “134598324”. Is it possible for it to start from “1”?
Thank you.

Hi @mizanvai,

This is not possible: _id is assigned by the KoBoToolbox system and is unique within each server, i.e. HHI, OCHA or a self-hosted server. This previously discussed post may also help you better understand what _id is:

Hello,

We had another issue with duplicated UUIDs, affecting 1% of the data (which is quite high!).
Has anyone managed to reproduce the issue and tried to solve it?

Best
Diane


Hello @dianedetoeuf,
Could you provide more details, please:

  • Where do you see the duplicates? Table view? Export?
  • Could you check several times, with a good server/internet connection, whether you get the same duplicates?
  • In your previous example (Sep 2020), the cases in your screenshot shared only the same _uuid but had different _id values (the same applies to the screenshot from @Bernard_26). Is this the same today?
  • Are all other data of the cases the same, incl. internal data like full submission_time?
  • What happens if you try to view or edit the duplicates? Did you ever use Briefcase on this data set?
  • Do the cases have (multiple) media attachments?
  • Could someone from the KoBo Core team verify for this project that there are really duplicates in the database (or only in the table view)?
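For the question of whether all other data match, including internal fields such as _submission_time, two exported records can be compared field by field. A minimal sketch, assuming each record is a flat dict (for example one row of a JSON export); the helper name `differing_fields` and the example values are made up for illustration.

```python
_MISSING = object()  # sentinel so a missing field differs from an explicit None


def differing_fields(a, b, ignore=("_id",)):
    """Return the sorted names of fields whose values differ between two records.

    Fields named in `ignore` are skipped; a field present in only one record
    counts as differing.
    """
    keys = (set(a) | set(b)) - set(ignore)
    return sorted(k for k in keys if a.get(k, _MISSING) != b.get(k, _MISSING))


first = {"_id": 1, "_uuid": "u1", "_submission_time": "2020-09-14T10:22:31", "q1": "yes"}
second = {"_id": 2, "_uuid": "u1", "_submission_time": "2020-09-14T10:22:33", "q1": "yes"}
print(differing_fields(first, second))  # ['_submission_time']
```

An empty result (apart from the ignored _id) would indicate the true-duplication case; any differing answer fields point to two genuine submissions sharing a UUID.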

As mentioned above by jnm, there is an open GitHub bug report (since July 2018).

I am afraid _uuid duplication is a crucial issue, because it means that the _uuid cannot be trusted to be unique. This contradicts the existing KoBo and ODK documentation, e.g.:
  • ODK XForms Specification
  • Random Numbers for Questionnaire ID
  • Form Operators and Functions (ODK Docs)
  • https://community.kobotoolbox.org/t/what-are-the-relation-between-these-columns-you-get-while-exporting-data-in-excel/9523/2
“The UUID once received will never be duplicated i.e. i have received a UUID of e45577db-085d-47a0-b1d4-0d9799077b5a for my submission as shown in the image above. No one else in the internet should receive this UUID again.”

There is also another ODK thread on duplicates here:

https://docs.getodk.org/aggregate-data-access/#publishing
“Under certain failure conditions, the downstream service can receive multiple copies of a given submission. This is known, expected, behavior.
Duplicates typically occur if the downstream service is slow to respond or acknowledge a request. It is your responsibility to detect and eliminate these duplicates should they occur (they will always have exactly the same information in all fields).”

See also
https://forum.odk-x.org/t/is-it-possible-to-alert-the-user-of-the-duplicate-records/1054/2:
“If you are using ODK-X to create the identifiers (the uuid) you would never end up with two of the same uuid, so duplication would be avoided.”

Universally unique identifier - Wikipedia “The probability to find a duplicate within 103 trillion version-4 UUIDs is one in a billion.”
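The Wikipedia figure follows from the birthday-problem approximation. The sketch below reproduces it and, for comparison, evaluates a project of about 1,000 submissions. It assumes truly random version-4 UUIDs (122 random bits, the remaining 6 bits encoding version and variant); `collision_probability` is a hypothetical helper name. The takeaway is that random UUIDs essentially never collide by chance, so repeated UUIDs within one project point to an identifier being reused, not to bad luck.

```python
import math


def collision_probability(n, random_bits=122):
    """Approximate probability that at least two of n random UUIDs are equal.

    Birthday-problem approximation: p ~= 1 - exp(-n^2 / (2 * 2^random_bits)).
    """
    space = 2.0 ** random_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-(n * n) / (2.0 * space))


print(collision_probability(103e12))  # about 1e-9, i.e. one in a billion
print(collision_probability(1024))    # vanishingly small for ~1,000 submissions
```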

cc @jnm and @Xiphware : Any new info on this, please?


Hi all, I’ve got a similar problem. Like @dianedetoeuf, I see duplicated _uuids in about 2% (22 out of 1024) of my submitted forms (11 duplicated UUIDs across 22 submissions). Their _ids are unique, and the submissions themselves are not duplicates: they are genuine, distinct submissions.

I don’t know if it helps but here are the answers to the questions that @wroos asked:

  • UUID duplicates are both in table view and excel export.
  • Can’t reproduce it.
  • ID’s are different, but the uuid’s are duplicated.
  • Other than the username, most of the data is different. (The form has 134 questions, some of which are expected to have the same answers; that is not a duplication issue.)
  • Viewing the submission works as expected, BUT when I try to edit it I get this error:
    “The data server for your form or the Enketo server is down. Please try again later or contact support@kobotoolbox.org. (500)”
    (This might be due to today’s server issue; I will check again when the server is behaving a bit better.)
    (By the way, could this issue be related to the recent problems with editing?)

  • Forms don’t have any type of media attachment.

More information:

  • All the duplicated UUID’s occur within the relevant account. See screenshot for details:
    (In other words, there is no duplicated UUID between different accounts)
    (I censored the usernames and most of the UUIDs for security reasons.)
    UUID Replication-3

  • All of them are submitted via web.

One thing I noticed is the submission time. In @dianedetoeuf’s, @Bernard_26’s and my cases, submissions sharing a UUID were submitted very close to each other.
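To check this observation across a whole export, one option is to compute, for each repeated _uuid, the time between its earliest and latest _submission_time. A minimal sketch, assuming rows with `_uuid` and `_submission_time` keys where the timestamp is an ISO 8601 string such as "2020-09-14T10:22:31" (the exact format in your export may differ); `uuid_time_spreads` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime


def uuid_time_spreads(rows):
    """For each _uuid seen more than once, return the seconds between its
    earliest and latest _submission_time."""
    times = defaultdict(list)
    for row in rows:
        times[row["_uuid"]].append(datetime.fromisoformat(row["_submission_time"]))
    return {
        uuid: (max(ts) - min(ts)).total_seconds()
        for uuid, ts in times.items()
        if len(ts) > 1
    }


rows = [
    {"_uuid": "u1", "_submission_time": "2020-09-14T10:22:31"},
    {"_uuid": "u1", "_submission_time": "2020-09-14T10:22:34"},
    {"_uuid": "u2", "_submission_time": "2020-09-14T11:00:00"},
]
print(uuid_time_spreads(rows))  # {'u1': 3.0}
```

Consistently small spreads would support the pattern described above; large spreads would suggest the same UUID being reused much later.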

Will test more and update here.
Best,


Hello @Bernard_26,
did you use Enketo (web forms) when the duplicates problem happened? (Or Collect?)

Hello @hakan_cetinkaya,
thanks for the details and research!
Did I understand correctly that the _uuid duplicates only happened with submissions from the same device and username?

  1. Do these users have different project/server accounts?
  2. Are the duplicates directly sequential cases/submissions from the same user?
  3. Are all cases from a common time slot?
  4. Are there cases/submissions without duplicates (from another or the same device/user) between a pair of duplicates? Consider both the timestamp and the _id.
  5. What about the end timestamp (metadata) of the duplicates, also in relation to other cases?
  6. Which server did you use?
  7. Which browser(s) did you use?
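Questions 2 and 4 can be checked mechanically by counting how many other submissions fall between a pair of duplicates in _id order. A minimal sketch, assuming rows with `_id` and `_uuid` keys and that _id increases in the order submissions reach the server (an assumption, not something confirmed in this thread); the optional user field `_submitted_by` and the helper name `submissions_between` are also assumptions for illustration.

```python
def submissions_between(rows, uuid, same_user_field=None):
    """Count other submissions whose _id falls strictly between the lowest and
    highest _id that share `uuid`.

    If `same_user_field` is given (e.g. '_submitted_by', assumed here), only rows
    from the same user as the duplicates are counted. A result of 0 means the
    duplicates are directly sequential in _id order.
    """
    members = [r for r in rows if r["_uuid"] == uuid]
    if len(members) < 2:
        return 0
    lo = min(r["_id"] for r in members)
    hi = max(r["_id"] for r in members)
    user = members[0].get(same_user_field) if same_user_field else None
    return sum(
        1
        for r in rows
        if lo < r["_id"] < hi
        and r["_uuid"] != uuid
        and (same_user_field is None or r.get(same_user_field) == user)
    )


rows = [
    {"_id": 10, "_uuid": "u1", "_submitted_by": "alice"},
    {"_id": 11, "_uuid": "x9", "_submitted_by": "bob"},
    {"_id": 12, "_uuid": "u1", "_submitted_by": "alice"},
    {"_id": 13, "_uuid": "y7", "_submitted_by": "alice"},
]
print(submissions_between(rows, "u1"))                   # 1 (bob's submission 11)
print(submissions_between(rows, "u1", "_submitted_by"))  # 0
```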

Could someone from the Core Team please explain when exactly the _uuid is generated, and how a duplicate might happen?


@dianedetoeuf, @wroos, @hakan_cetinkaya, we do have a GitHub issue for this. You should be able to track it through this link:
