Issue with Disappearing Data in Exported Raw Data Files

Greetings,

Description

I am experiencing a bug where certain submissions dating back to 2023 lose data when exported as an Excel raw data file. However, the data is still visible when editing the submission via Data > Table > Edit (pen icon) next to the submission in question. The information missing from the exported raw data file mainly pertains to repeat questions.

Steps to Reproduce

  1. Navigate to Data > Table.
  2. Select a submission and click the Edit (pen icon).
  3. Note that the data appears correctly.
  4. Export the data as an Excel raw data file.
  5. Observe that the data for repeat questions is missing in the exported file.

Expected Behavior

I expected all data, including repeat questions, to be visible in the exported raw data Excel file, as it appears when editing the submission in the Data Table.

Actual Behavior

The data for repeat questions is missing in the exported raw data Excel file, despite being visible when editing the submission in the Data Table.

Additional Details

  • Server: EU
  • Username: xxxxxxx

Example Projects and Submission IDs

Project #1: Community-Based Nutrition Screening Intake Form - إستِمَارَة تَسْجِيل مَسْحِ الْحَالَةِ الْغِذَائِيَّةِ فِي الْمُجْتَمَعِ

  • Submission ID ; UUID Examples:
    • 430817419 ; 3a82f98f-76e0-4dfa-afaf-043748217f46
    • 480626846 ; 20191bd3-1614-4bbe-bd73-5b7ed0587e85

Project #2: Community Health Campaign Intake Form - إستمارة تسجيل في حَمْلَة التوعية الصِحِّيَّة المُجْتَمَعِيَّة

  • Submission ID ; UUID Examples:
    • 408129951 ; b12a6e24-8ede-483b-91cc-e810181b3a5c
    • 449147405 ; 622e20fc-0635-40c0-a7bb-d4e940151896

Project #3: Boys and Men Family Planning Session Intake Form - إستمارَة تَسْجِيل جَلْسَة ِتَنْظِيم الأُسْرَة لِلشَّبَاب والرِّجالِ

  • Submission ID ; UUID Examples:
    • 440151046 ; 30b00f5d-ed5e-4662-a51c-f319b4936d9d
    • 472427703 ; d576d931-cad6-4406-83c8-ef2a6275aadd
    • 426746419 ; a8f32a45-96e7-45d6-b469-d23c145c820f
    • 472524212 ; b73b6bef-c199-417a-861d-b20ed28a57bc
    • 453261345 ; 54f4a416-ebab-4681-9c54-47a4746ab90e
    • 407357719 ; 191e5360-13cc-499d-b5fc-87fb859d808e

I have tried re-exporting the data multiple times and using different browsers and Excel versions, but the issue persists. Your assistance in resolving this matter is highly appreciated.

@Kal_Lam

Hi,
I trust you know that data for repeats are exported in separate Excel sheets in the same file. See Exporting and Downloading Your Data — KoboToolbox documentation.
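To check this systematically, rather than by eye, here is a minimal sketch that lists the submissions that have no rows in a given repeat sheet. It assumes the main sheet carries an `_index` column and a `_uuid` column, and that each repeat sheet carries a `_parent_index` column pointing back to the main sheet. Please verify these column names against your own export. The function name and the sample rows are hypothetical. The sheets could be loaded, for example, with pandas `read_excel(path, sheet_name=None)` and converted with `DataFrame.to_dict("records")`; the sketch itself only needs plain lists of dicts.

```python
from collections import Counter


def submissions_without_repeat_rows(main_rows, repeat_rows,
                                    index_col="_index",
                                    parent_col="_parent_index",
                                    uuid_col="_uuid"):
    """Return the UUIDs of main-sheet rows with no matching row in the repeat sheet.

    Column names are assumptions about the export layout; adjust as needed.
    """
    # Count how many repeat rows point at each main-sheet row.
    children = Counter(row[parent_col] for row in repeat_rows)
    return [row[uuid_col] for row in main_rows if children[row[index_col]] == 0]


# Hypothetical sample data, not taken from a real export.
main_sheet = [
    {"_index": 1, "_uuid": "uuid-a"},
    {"_index": 2, "_uuid": "uuid-b"},
]
repeat_sheet = [
    {"_parent_index": 1, "member_name": "X"},
]

print(submissions_without_repeat_rows(main_sheet, repeat_sheet))
```

If a submission that shows repeat data in the edit view appears in this list, the rows are genuinely absent from the export and not just located on another sheet.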

Can you also share your settings/options for the data export, please (e.g. include all versions?)

Do the submissions with missing data have common characteristics, e.g. date, enumerator/device?

Did you export these data without problems before (when)?

Hi @wroos

Thank you for your response.

To address your points:

  1. Separate Sheets for Repeats: Yes, I am aware that data for repeat questions are exported in separate sheets within the same Excel file. I have been working with KoBoToolbox for almost three years and have consistently managed repeat data through over a million exports without issues until now, particularly with submissions made prior to 2024.

  2. Export Settings: Please find my export settings in the attached screenshot. I have configured the export to include fields from all versions and selected all questions. These settings have been in place since early 2023 when the forms were first deployed.

  3. Missing Data Characteristics: The missing data are not specific to one question type. They include demographic and intake data of beneficiaries, such as name, sex, nationality, and birthdate, among other select_one and select_multiple questions. The missing data are not linked to a specific enumerator or device, as the forms are used by the entire field team across various governorates.

  4. Previous Exports: Yes, I exported these data weekly without any issues from early 2023 until recently. The data used to appear correctly in the exported files and were also accessible through API connections in Power Query in Microsoft Excel and Microsoft Power BI, which refresh the data every 2 hours. The issue started occurring in the last month or two, during the weekly data exports I run as part of my routine data cleaning.

Additional Details:

  • As part of my routine data cleaning process, I export the raw data Excel file weekly and send it to the field team for cleaning. The API connection refreshes data every 2 hours. The team edits the data as needed, but upon re-exporting, the data disappears, although it remains visible in the Data Table when opening the submission for editing via the pen icon. To retain the data, I have had to ask the team to delete and re-enter the submissions, which is time-consuming. However, I kept some submissions as examples (shared above with their IDs and UUIDs) should you wish to inspect them for further investigation. Similar cases keep appearing on a rolling basis as the team edits old submissions for retrospective data correction.

  • When I sent out the exported raw data Excel files before 2023, I did not experience any missing data when the field team edited their submissions to correct data entry errors. This issue started to occur in the last one or two months, particularly with submissions that date back to 2023. It does not occur with submissions from 2024.

Given this situation, I am reaching out to flag this potential bug and seek a solution to mitigate the problem. Your assistance in resolving this issue is highly appreciated.

@omarmakki, is this an issue for only the selected cases that you have outlined above? Could you also see if:

  • … these submissions were edited at some point in time?
  • … the questions do not have mandatory responses checked.
  • … the submissions were forcefully submitted to the server?

If any of these is the case, the submission could be missing records for those questions.

Thanks for the detailed information.
In addition to Kal_Lam's questions:
- Have you used different form versions for collecting this dataset?
- Did you change your API programme (since 2023)?
- Can you reproduce any data loss if you edit directly on server level (instead of using the API)?
- Could you check the current form successfully with the Online validator, please?

Hi @Kal_Lam & @wroos,

Thank you for your responses and the follow-up questions. Here are the answers:

  1. Issue with Only Selected Cases:

    • The issue is not isolated to the selected cases outlined above; it has been observed across multiple submissions that were made before 2024, especially when they are edited by the field team for data correction purposes within the last 1 to 2 months.
  2. Edited Submissions:

    • Yes, these submissions were edited at various points in time as part of our routine data cleaning process. We never had an issue like this before when editing already submitted forms until 1 to 2 months ago.
  3. Mandatory Responses:

    • All questions in these forms are mandatory. Thus, a form is never submitted without ensuring that all fields and questions are properly filled out.
  4. Forcefully Submitted Submissions:

    • There is no indication that the submissions were forcefully submitted to the server. The submission process has remained consistent throughout. The field team does not have the capabilities or knowledge to do such a thing as they have limited administrative privileges which merely include filling out a form, viewing it, and editing it.
  5. Different Form Versions:

    • Yes, we have used different form versions for collecting this dataset. The forms have been updated occasionally to improve data collection by incorporating certain constraints to reduce data cleaning. Thus, the main skeleton and structure of the coded XLSForm is similar across all versions, with varying constraints and certain choices being added for select_one and/or select_multiple questions. The export settings have remained the same by including all versions of the form and ensuring that all questions/columns/fields are selected.
  6. Changes to API Program:

    • The API program used for connecting to Power Query and Microsoft Power BI has not been changed since 2023. The API connection refreshes data every 2 hours in real-time. However, submissions made prior to 2024, when edited by the field team in the last 1 to 2 months for data entry error correction purposes, have had their data missing in both the exported raw data Excel file as well as when viewed on Excel through Power Query and Microsoft Power BI via the API connection.
  7. Data Loss When Editing Directly on Server:

    • The field team and I perform our edits to submissions made prior to 2024 directly on the server (i.e., via the Data Table). Once that is done, some data in the submission ends up missing when re-exporting the raw data Excel file, despite still being visible when I view the submission in the Data Table or open it for editing by clicking the pen icon, which redirects me to a new webpage where I can view/edit all responses of that submission.
  8. Online Validator Check:

    • Before I deploy any XLSForm I code, it is my habit to validate it through the ODK XLSForm Validator to ensure that the code works properly without bugs or errors. It is one of my usual best practices. So, to answer your question, yes, the current version of the XLSForm successfully validates through the ODK XLSForm validator check.
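To narrow down whether the values are absent from the API record itself or only from the Excel export (points 6 and 7), a comparison along the following lines may help. This is a minimal sketch under stated assumptions: the API returns each submission as a JSON object in which repeat groups are lists of objects, and the export uses the same question names (including group paths) as column headers, which depends on the export settings. The function names and sample data are hypothetical.

```python
def answered_fields(record):
    """Collect names of all questions with a non-empty answer, including those inside repeats."""
    names = set()
    for key, value in record.items():
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # Assumed shape of a repeat group: a list of dicts, one per repetition.
            for item in value:
                names |= answered_fields(item)
        elif value not in (None, "", []):
            names.add(key)
    return names


def missing_in_export(api_record, export_rows):
    """Return questions answered in the API record but empty or absent in every exported row."""
    exported = set()
    for row in export_rows:
        exported |= {k for k, v in row.items() if v not in (None, "")}
    # Keys starting with "_" are treated as system metadata and skipped.
    return sorted(name for name in answered_fields(api_record)
                  if not name.startswith("_") and name not in exported)


# Hypothetical sample data, not taken from a real submission.
api_record = {
    "_id": 1,
    "name": "A",
    "children": [{"children/child_name": "X"}, {"children/child_name": "Y"}],
}
export_rows = [
    {"_id": 1, "name": "A"},        # row from the main sheet
    {"children/child_name": ""},    # row from the repeat sheet, value empty
]

print(missing_in_export(api_record, export_rows))
```

An empty result for an affected submission would suggest that the API record and the export agree, in which case the difference lies between the stored submission shown in the edit view and what the API and export both return.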

Given this additional information, I hope it helps in diagnosing the issue.

Thank you for your assistance.

Could you please share:

  • a related extract of your form (in XLSForm format)
  • screenshot examples for the missing data issue (incl. table view and Excel).

My guess is that the problem may be related to form changes in the last months.

You may have a look at the form version history. Can you try to reproduce and better locate the issue, please, through cloning the form/project, entering and editing data, stepwise for your sequential updates/versions in the critical period?

Hi @wroos ,

Thank you for your response and suggestions. I would like to clarify some points based on my observations and provide the requested information.

Observations:

  1. Issue Specificity: The problem only occurs when a submission made before 2024 is edited recently (i.e., within the last 1 to 2 months), which leads to missing data. This was not the case when such a submission was edited before then. Moreover, when editing a submission on the server through the Data Table by clicking the pen icon, I am always redirected to the latest version of the form, even though the submission was made in 2023 using an earlier version. This is how KoBoToolbox appears to work, applying the latest form version during edits.

  2. Form Versions: The multiple versions of the project form are mainly due to the addition of new Community Health Worker (CHW) names to the choice list, which happens approximately every month. Thus, these updates do not involve structural changes to the form but merely add choices to select_one and select_multiple questions as explained previously.

  3. Consistency Across Projects: This issue has been observed across multiple projects, not just one, and these projects do not have as many versions. The example projects and Submission IDs - UUIDs shared in my initial post reflect this.
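One way to test observation 1 is to tabulate affected and unaffected submissions by submission year and form version. The sketch below assumes each record has a `_submission_time` string starting with the four-digit year, a `_uuid`, and optionally a `__version__` field; these field names should be checked against the actual data. The function name and the sample records are hypothetical.

```python
from collections import Counter


def tally_by_year_and_version(records, affected_uuids):
    """Count submissions per (year, form version, status) to look for a pattern."""
    counts = Counter()
    for rec in records:
        year = rec["_submission_time"][:4]           # e.g. "2023-05-01T10:00:00" -> "2023"
        version = rec.get("__version__", "unknown")  # field may be absent
        status = "affected" if rec["_uuid"] in affected_uuids else "ok"
        counts[(year, version, status)] += 1
    return counts


# Hypothetical sample data, not taken from real submissions.
records = [
    {"_uuid": "uuid-a", "_submission_time": "2023-05-01T10:00:00", "__version__": "v1"},
    {"_uuid": "uuid-b", "_submission_time": "2023-06-01T10:00:00", "__version__": "v1"},
    {"_uuid": "uuid-c", "_submission_time": "2024-02-01T10:00:00", "__version__": "v2"},
]

for key, n in sorted(tally_by_year_and_version(records, {"uuid-a"}).items()):
    print(key, n)
```

If affected submissions cluster on particular versions rather than purely on the submission year, that would point toward a form-version effect during editing.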

Next Steps:
To help diagnose the issue further, I will provide:

  • A related extract of my form in XLSForm format.
  • Screenshot examples illustrating the missing data issue, including the table view, the raw data in Excel, and the edit view of the submission showing the presence of the data.

Attachments:

  1. XLSForm Extract: CH FP for Boys & Men Intake Form (source code).xlsx (114.9 KB)

  2. Screenshots for “Project #3: Boys and Men Family Planning Session Intake Form - إستمارَة تَسْجِيل جَلْسَة ِتَنْظِيم الأُسْرَة لِلشَّبَاب والرِّجالِ”, using Submission ID #440151046:

Given this information, I believe the issue may not be related to recent form changes but rather how KoBoToolbox handles edits to older submissions using the latest form version.

Thank you @wroos & @Kal_Lam for your continued assistance. I look forward to your insights.

Would you mind testing this, please, by cloning form versions from the history list: save them, start with one version, enter data, upload a later version, re-edit the data, etc.?

I remember this is an old bug. Surprised that it hasn’t been fixed yet.

@omarmakki can you try disabling the include media URLs option? If I remember correctly this solved the problem.