Multiple submissions randomly appeared all at once on server days to weeks after submitting

Hey guys,

In one of our active studies (kf.kobotoolbox.org), where we are enrolling 20-40 patients a day using a Kobo enrollment form, a large group of submissions from different days appeared on the server all at once yesterday. It appears they never made it to the server before then. Looking at the data, the start times for the forms are from the various days the forms were filled in within the past few weeks, but all of the end times are yesterday, July 14 (see screenshot below). The submission times for the forms all fall on the same day as the start time, when the form was filled in. It looks like the enumerators submitted them on the correct day and never reported any issues.

I have no idea how this happened, and it is a serious problem: we are providing remote home-based COVID care to all our patients, and now it’s too late to provide care to many of the patients whose records didn’t appear on the server until yesterday.

Probably not likely, but I’m curious whether it could be related to the newest release coming? Release Notes - version 2.022.24. The notes mention the removal of Simserials, and all of the records in our data that just appeared yesterday have “Simserial not found” in the simserial field, whereas none of the other data has that message. @jnm ?

Hi @mike.destaubin. Unlike _submitted_by, which is set by the server, start and end are provided solely by the client. In other words, whatever values Enketo or Collect send are the values that get reported.

I’ve just double checked this by manually creating and posting a submission that has an end date in 2055. Kobo dutifully reports this date in the table view:

The submissions shown in your screenshot, therefore, were either not finalized until today (they could’ve been saved as drafts long ago), or there is a bug with Enketo or Collect [It appears these submissions came from Enketo; see below].

You’re right that simserial and subscriberid are not related to any delay receiving data: removing those fields’ special metadata status simply changed their column sort order in the table view. They used to be shown all the way at the end, on the right-hand side, and now they’re shown wherever they are in the form definition, which is often near the beginning. They aren’t visible in the form builder, but you can download the XLSForm and look for the simserial and subscriberid rows.

Since you mentioned a difference in simserial values coinciding with delayed submissions:

…I did some quick testing and found that the message “simserial not found” appears only when I submit via Enketo. When I submit using Collect, the simserial column is simply blank. We can conclude, then, that your delayed submissions came from Enketo.

If you have more information about a possible Enketo problem after following up with your enumerators, please let us know.

Another possible explanation (credit to @tinok) is submission editing. Enketo doesn’t modify the start date when editing a submission, but it does modify the end date. Your submissions could’ve been submitted on June 29 but then edited on July 14. Whether or not a submission has been edited can be determined by looking for the presence of <deprecatedID> in the submission XML. To see that, go to https://kf.kobotoolbox.org/api/v2/assets/[your asset uid]/data/[your submission _id].xml.
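
A minimal Python sketch of that check, assuming the submission XML has already been downloaded as a string. The function name `was_edited` and the two trimmed sample documents are hypothetical and only for illustration:

```python
import xml.etree.ElementTree as ET


def was_edited(submission_xml: str) -> bool:
    """Return True if the submission XML contains a non-empty <deprecatedID>."""
    root = ET.fromstring(submission_xml)
    for element in root.iter():
        # Strip any "{namespace}" prefix so the check works with or without namespaces.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "deprecatedID" and (element.text or "").strip():
            return True
    return False


# Hypothetical, heavily trimmed submissions for illustration only.
original = "<data><meta><instanceID>uuid:aaa</instanceID></meta></data>"
edited = (
    "<data><meta><instanceID>uuid:bbb</instanceID>"
    "<deprecatedID>uuid:aaa</deprecatedID></meta></data>"
)

print(was_edited(original))  # False
print(was_edited(edited))    # True
```

The search walks the whole tree rather than assuming a fixed path, since the root element name differs per form.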

Example where I made the original submission at 22:07 EDT (02:07 UTC), began editing at 22:12 EDT, and submitted the edit at 22:13 EDT:

Edit: as requested by @wroos, the _submission_time for this instance (looking at an XLS export instead of the table view, due to a time zone bug) is 2022-07-16 02:07:10. How is it exactly the same as <start>? Because I used the “duplicate” feature to create this submission. See my post below for a comparison of <start>, <end>, and _submission_time in an edited, but not cloned, submission.

Hello,
Small add-on. An end date could even be before a start date if the local date/time on the device was set back in between (manually or automatically), or if the case was started with a wrong local date/time setting that was then corrected (manually or automatically). It can also happen as an effect of a wrong or changed time zone setting.

We encountered a few cases with a wrong (delayed) local date/time after a mobile device with automatic date/time turned off had been fully discharged and left unused for a long time.

Thank you for the responses @jnm @wroos

Unfortunately, I can confirm that Enketo has never been used for this project, only Kobocollect. The data for these records were never edited either, nor was the time zone changed on the tablets in the middle of the project. I can confirm that they never reached the server until July 14, even though most were filled in and submitted through Kobocollect in the last week of June.

Can you explain how the submission time is earlier than the end time if the form was never edited?

I will circle back and look more into this on Monday.

It looks like it’s your word against mine, sorry to say:

Did you look at the submission XML to see if deprecatedID was present?

I’m not really sure how to proceed if we cannot entertain the possibility that someone used Enketo for either an edit or a submission in your project.

Collect removed simserial from its code base altogether in 2020: Removed simserial · kobotoolbox/collect@88ae424 · GitHub.

Here’s the Enketo code that produces the simserial not found message:

I went ahead and tried to identify your project. If its name is “Inscripción” and it has an asset UID beginning with aZbWp, then I see 12 submissions there that have start dates in June and end dates in July. All 12 of those also have deprecatedIDs, indicating that they were edited with Enketo.

Please check with your team to find out who did the editing.

For reference, here’s an example from this project. Note that 10:12 in UTC-7 is 1:12 pm in UTC-4, which is where I’m guessing you were when you created the screenshot above.

<aZbWpxxxxxxxxxxxxxxxxx xmlns:jr="http://openrosa.org/javarosa" xmlns:orx="http://openrosa.org/xforms" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:odk="http://www.opendatakit.org/xforms" id="aZbWpxxxxxxxxxxxxxxxxx" version="81 (2022-06-15 21:18:21)">
          <start>2022-06-28T14:30:42.409-06:00</start>
          <end>2022-07-14T10:12:49.744-07:00</end>
          <today>2022-06-28</today>
          <username>cxxxxxxxxxx_enum</username>
          <simserial>simserial not found</simserial>
          <!-- SNIP -->
          <__version__>vTTPQxxxxxxxxxxxxxxxxx</__version__>
          <_version_>v6oHxxxxxxxxxxxxxxxxxx</_version_>
          <_version__001>vrHgxxxxxxxxxxxxxxxxxx</_version__001>
          <_version__002>vcr4xxxxxxxxxxxxxxxxxx</_version__002>
          <meta>
            <instanceID>uuid:bb9ae8f3-xxxxxxxxxxxxxxxxxxxxxxxxxxx</instanceID>
          <deprecatedID>uuid:4b6cdbba-xxxxxxxxxxxxxxxxxxxxxxxxxxx</deprecatedID></meta>
        <formhub><uuid>7d2exxxxxxxxxxxxxxxxxxxxxxxxxxxx</uuid></formhub></aZbWpxxxxxxxxxxxxxxxxx>

Edit: as requested by @wroos, the _submission_time for this instance (looking at an XLS export instead of the table view, due to a time zone bug) is 2022-06-29T00:56:10. That is a UTC timestamp, and it tells us that about 4h25m elapsed between the enumerator first beginning the submission and the receipt of the completed submission by the server.
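
The arithmetic above can be double-checked with a short Python sketch. The three timestamps are copied from this example; the variable names are just for illustration, and `_submission_time` is treated as UTC as described:

```python
from datetime import datetime, timedelta, timezone

# Values copied from the example submission above.
start = datetime.fromisoformat("2022-06-28T14:30:42.409-06:00")
end = datetime.fromisoformat("2022-07-14T10:12:49.744-07:00")
# _submission_time carries no offset; it is UTC, so attach that explicitly.
submission_time = datetime.fromisoformat("2022-06-29T00:56:10").replace(
    tzinfo=timezone.utc
)

# Time from opening the form to the server receiving the submission.
elapsed = submission_time - start
print(elapsed)  # 4:25:27.591000

# The <end> value expressed as a wall-clock time in UTC-4.
utc_minus_4 = timezone(timedelta(hours=-4))
print(end.astimezone(utc_minus_4).strftime("%Y-%m-%d %H:%M"))  # 2022-07-14 13:12
```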

Thanks for the details @jnm,
could we also get the _submission_time for this case, please? And those data for the deprecatedID case?

It is also visible in the screenshot above that start and end came from different timezones (local device and server). - Text updated.

How are these dates/times set if a case is uploaded manually? (saved case? and finalised case?)

Kind regards

No, no. This never comes from the server unless the server itself creates the submission. So far, the only way that happens is when someone clicks the “DUPLICATE” button to clone a submission.

Apart from submission cloning, the explanation for time zone differences is that the submitting client was in one time zone while the editing client was in another.

It doesn’t matter how the submission is uploaded. The timestamps are read from <start> and <end> in the submission XML. The question about saved drafts vs. finalized submissions is a good one. I don’t know the answer—it’s totally dependent on client behavior and may vary between Collect and Enketo—but if you end up testing it, I’d be curious to know the results.

_submission_time is not present in the XML sent by the client. It is added by the server, like _id, when the submission is first created [1], [2], and is not updated by submission editing. I’ll edit my posts above to include _submission_time since it seems like a helpful reference.

Also, hey, you found a bug! :tada: The table view incorrectly considers _submission_time to be local time instead of UTC.

Hello @jnm,
sorry, my reference was not clear:
It is also visible in the screenshot above that start and end came from different time zones (local device and server).
I just wanted to underline that this also indicates that the case was edited.

With the new default mode (for numbers and dates) of data export (not Legacy) it seems that the xlsx cuts off the time zones, see your GitHub link.

Could you explain where you can see the first beginning of the submission, please? Do you mean the start datetime here? Is there an internal form datetime stored when the user clicks submit on the case?

Hint:
If the KoBo export converts start and end to the local time zone (cutting off the zone suffix), we may get different downloaded data from 2 downloads made in different time zones, e.g. in international projects.
The “end” after an edit on server level will often have a different time zone than start, as in your edit example.
Also, the table view shows different formatting at the moment, e.g. with French configuration:
start / end: Jul 5, 2022 2:23 PM (but in modal view: 2022-04-19T20:33:19.115+02:00)
_submission_time: 2022-07-05T12:23:24 (not even shown in modal view).
Maybe it would be preferable to get the timezone suffix (back) in the xlsx export/download (as it was in the past) and in the table view. And have the same format for all 3 in the view modes (table and modal view). But a disadvantage would be that Excel will not show it as date.

Alas, I concede that it is “server” time because I made the regrettable mistake of using the duplicate (clone) submission feature in my attempts to try to debug this support case efficiently. I fear that describing <start> (or <end>) as “server” time really does nothing to help the original poster resolve his issue and will make for confusing reference material when encountered by future viewers. I must reiterate that these times always come from the client making the submission, even in the odd case (cloning) where the client is the server itself :face_with_spiral_eyes: If you have further questions about this, please be so kind as to open a separate topic.

Yes. Maybe I’m wrong, though. I haven’t tested all scenarios (drafts, etc.) in all clients.

No, there is nothing internal or hidden. In most cases, this is what <end> contains. Just from testing for this discussion, I know that:

  • For a simple submission, done in one sitting without saving or loading any drafts, <end> reflects the time I click “Submit”.
  • For an edit, as we’ve seen, <start> remains unchanged, but <end> gets set to the time when I click “Submit” on the edit.

For other scenarios, though, as I wrote earlier:

Next question:

XLS exports do not convert timestamps to different time zones. They do cut off the time zone offsets, sadly, when “Store date and number responses as text” is not checked—for the reason you mentioned elsewhere: “a disadvantage [to including the offset] would be that Excel will not show it as date.” The “Store date and number responses as text” option is available to cope with this limitation: it brings back the offsets. I’ve included an example below that illustrates how +00:00 and -04:00 offsets are truncated but not converted:
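
To spell out the difference between truncating and converting, here is a minimal Python sketch. The timestamp is a hypothetical value, not taken from any export, and this is not the KoboToolbox export code:

```python
from datetime import datetime, timezone

# Hypothetical <end> value with a -04:00 offset.
raw = "2022-07-15T22:13:00-04:00"
parsed = datetime.fromisoformat(raw)

# Truncation: drop the offset and keep the wall-clock digits unchanged.
truncated = parsed.replace(tzinfo=None)

# Conversion: shift to UTC first, then drop the offset.
converted = parsed.astimezone(timezone.utc).replace(tzinfo=None)

print(truncated.isoformat())  # 2022-07-15T22:13:00
print(converted.isoformat())  # 2022-07-16T02:13:00
```

With truncation, two timestamps that carry different offsets keep their local digits, so they are no longer directly comparable once the offset is gone.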

I could see it being helpful that the submission modal, being a detail view, shows the verbatim response including the time zone offset, while the table view—meant for viewing many records simultaneously—standardizes them to a single time zone (local time). I agree with you that it’d be better if the table view and XLS exports showed the same thing, but I’m not sure which approach is the best. I’d be tempted to convert everything to UTC, and then possibly provide options for viewing and exporting in local time. I’ve opened a new issue: How should time zones be handled? · Issue #3943 · kobotoolbox/kpi · GitHub.

I noticed that as well and made an issue for it before my last post: Reduce discrepancies between table view and single submission modal · Issue #3942 · kobotoolbox/kpi · GitHub.

Ok sorry @jnm, I should’ve been more specific - there is no way anyone filled out an Enketo form during data collection, because our enumerators have never been given an Enketo link to use; they’ve only been trained to use Kobocollect. However, it’s definitely the case that, from the project page online, we sometimes use Enketo to edit the forms if there’s been a data mistake reported or identified.

At the end of each day, we check all the new enrollments so we can start providing them home-based care the following day. If these forms were edited online after being submitted, it would have had to be on the same day they were submitted; otherwise we would’ve seen them on the server that day and we would have follow-up data on them starting the next.

It seems like it’s an Enketo error that prevented a batch of edited forms from multiple different days from reaching the server until July 14, because even if they were opened from the project page to be edited and the edits were not submitted until July 14, we still would’ve seen the original submissions on the server.

Hello @mike.destaubin,
maybe we should do some forensics now?
I think it might be helpful if you could provide, for some cases, the data in full format including the time zone: start, end, submission time. We can get this e.g. from an XLSX download with the “as text” option.

  • If we find start and end with different time zones, this indicates an edit on the server. (Otherwise, the time zone was changed on the enumerator’s device after start.)
  • Similarly, if we find an end datetime after _submission_time, this would also indicate later editing. (Similar exception as above: the editor’s browser or device has a bad local datetime setting.)
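
These two checks could be sketched in Python roughly as follows. The function name `edit_indicators` is hypothetical, `_submission_time` is assumed to be UTC without an offset, and the results are only hints, given the exceptions noted above:

```python
from datetime import datetime, timezone
from typing import List


def edit_indicators(start: str, end: str, submission_time: str) -> List[str]:
    """Return hints that a submission may have been edited after first upload.

    start and end are ISO 8601 strings with offsets (as exported with the
    "as text" option); submission_time is assumed to be UTC without an offset.
    """
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    received = datetime.fromisoformat(submission_time).replace(tzinfo=timezone.utc)

    hints = []
    if start_dt.utcoffset() != end_dt.utcoffset():
        hints.append("start and end have different UTC offsets")
    if end_dt > received:
        hints.append("end is later than _submission_time")
    if end_dt < start_dt:
        hints.append("end is earlier than start (device clock may have changed)")
    return hints


# Timestamps from the example submission posted earlier in this thread.
hints = edit_indicators(
    "2022-06-28T14:30:42.409-06:00",
    "2022-07-14T10:12:49.744-07:00",
    "2022-06-29T00:56:10",
)
print(hints)
```

For that example, the first two hints are reported, which matches the presence of deprecatedID in its XML.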

Maybe there is also an option to download as XML, so we could also check for deprecatedID in the cases.

@jnm @wroos

I have the issue figured out, and it’s not really an Enketo-to-server issue.

We have a REST service set up to send submissions to a Dropbox folder as a backup. I just went through the JSON files and found some of the records in question. Their end times were correct and proved they were submitted and showed up on the server correctly on the right day.

What happened is that our unique patient ID, which is generated in the form, relies on two questions: a date and a time. When the forms were edited on July 14, the time question must’ve automatically updated in the Enketo form and then changed the person’s unique ID. The date question did not auto-update, just the time question.
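
To illustrate why a changed time response produces a different ID, here is a minimal Python sketch. The ID format and the values are entirely hypothetical (the form's actual calculation is not shown in this thread); the only point is that an ID derived from an editable time changes whenever that time does:

```python
from datetime import date, time


def patient_id(enroll_date: date, enroll_time: time) -> str:
    """Hypothetical ID format: date digits and time digits joined by a dash."""
    return f"{enroll_date:%Y%m%d}-{enroll_time:%H%M}"


# Hypothetical values: same date, but the time response shifted during an edit.
original = patient_id(date(2022, 6, 28), time(14, 30))
after_edit = patient_id(date(2022, 6, 28), time(13, 30))

print(original)                # 20220628-1430
print(after_edit)              # 20220628-1330
print(original == after_edit)  # False
```

Any dashboard keyed on this ID would then treat the edited record as a new patient, with the follow-up data still attached to the old ID.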

On our end, we monitor the data through a Power BI dashboard, and after someone on our team made the edits, it appeared to us that they were new enrollments, but from days or weeks ago with no follow-up data. It turns out we do have all the follow-up data on these records, but under a different unique patient ID.

It’s a bit frustrating that Enketo does that, but all in all it’s good to know we actually did follow up with those patients. I know there’s a secondary option to edit the data without opening Enketo, and I will probably recommend our team always use that option instead.

Thanks for your attention and time on this, apologies it’s not the issue I originally thought it was.

Thanks @mike.destaubin,
Another option might be to work with stable date/time elements, using start or today (metadata), or once(now()) with read-only, to avoid the recalculation.

Hi @mike.destaubin, I’m glad you figured it out! What I originally wrote no longer applies to your case, but I’ll post it anyway with the hope that it may be a good reference for future issues.

Assuming that “Hora” (whose question name is time, right?) is a regular time question—and it’s not metadata if it’s showing up in the form builder like that—its response should never change automatically during an edit. Regular questions like this are totally distinct from <start> and <end>. I would bet that any unexpected change is related to the time zone of the original submission being different from the time zone of the edit. Perhaps it’d be best to create a new topic, though, since this one is so long already (and is mostly addressing a separate issue).

Original response

Okay, thanks. This makes sense.

Are you saying, then, that these problematic submissions look like completely new enrollments that nobody ever saw before? In other words, the submission content is too new or unfamiliar to be the product of an edit alone?

I will look into the edit history for these particular submissions. [no longer necessary] It’s a strange situation because Enketo’s offline functionality isn’t supposed to be involved with editing at all: if you’re editing a submission and close the tab before your edits are sent to the server, your edits are gone. They’re not queued somewhere to be re-sent later.

It’s also, as you know, impossible to edit something that hasn’t been received by the server. The original submission first has to appear in the data view before anyone can begin editing it. Also, the original submission does not disappear or otherwise enter some kind of suspended state while being edited: it remains visible in the data view all along.

So, they have been EDITED.
Which explains the story and the screenshot.

Kind regards

Yeah, thanks. I thought that by using questions we created rather than the metadata, it would be more stable, since the metadata can change during edits.

Yes, that’s correct, and it’s why we chose not to rely on metadata when initially building the forms. I can create a new topic for this separate issue.

Yes, it appears they were edited on July 14. Still tracking down who did the edits and what they were.
