Data split across multiple columns due to multiple form versions: best practices for migration and prevention?

I manage a patient identification form for an NGO operating in Jordan and Gaza. The form is actively used by field staff for real data collection while I simultaneously make iterative changes requested by management — fixing constraints, adjusting group structures, adding fields, etc.

Over time, this workflow resulted in 89 deployed versions of the same form. When I export the data, fields that were modified across versions appear as duplicate columns (e.g., “Age | Age”, “Gender | Gender | Gender”, “Beneficiary - Country Code × 3”), with submissions scattered across these columns depending on which version was active at the time of entry.

To get a complete export, I must check “Include fields from all 89 versions”, which produces an unmanageable dataset for analysis — duplicate columns, inconsistent naming, and blank cells where a field didn’t exist yet in earlier versions.

Questions:

  1. Is there a supported way to migrate existing submissions from the fragmented project into a freshly cloned clean project (via the API or any other method), after consolidating the duplicate columns externally?

  2. What is the recommended workflow when a form is actively collecting live data while the form designer still needs to make ongoing structural changes? Is a DEV/LIVE project split the standard practice in the community?

  3. Are there specific types of changes (e.g., relabeling vs. renaming fields) that are safe to deploy on a live project without causing column fragmentation?

Any experience, scripts, or documented workflows from the community would be greatly appreciated.

form-versioning data-export column-duplication submission-migration xlsform best-practices

Welcome to the community, @Gharaibeh! Let me try responding to your queries:

Maybe try linking your KoboToolbox data to Excel (see the support article Connecting KoboToolbox to Microsoft Excel) and consolidating the duplicate columns there as needed, in a separate sheet.
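If you would rather script the consolidation than do it by hand in Excel, here is a minimal sketch in plain Python. It assumes the export was saved as CSV with the repeated labels as headers (for example "Age,Age"), that blank cells are empty strings, and that each submission has a value in at most one of the duplicate columns. The function name `coalesce_duplicate_columns` and the sample data are made up for illustration; this is not a KoboToolbox feature.

```python
import csv
import io
from collections import defaultdict


def coalesce_duplicate_columns(header, rows):
    """Merge columns that share a header, keeping the first non-blank value per row.

    Returns (new_header, merged_rows, conflicts). A conflict is recorded when a
    row holds different non-blank values in columns with the same header.
    """
    # Map each header name to every column position where it appears.
    positions = defaultdict(list)
    for idx, name in enumerate(header):
        positions[name].append(idx)

    new_header = list(positions)  # unique names, in first-seen order
    merged_rows, conflicts = [], []
    for row_num, row in enumerate(rows):
        merged = []
        for name in new_header:
            values = [
                row[i].strip()
                for i in positions[name]
                if i < len(row) and row[i].strip()
            ]
            if len(set(values)) > 1:
                conflicts.append((row_num, name, values))
            merged.append(values[0] if values else "")
        merged_rows.append(merged)
    return new_header, merged_rows, conflicts


# Hypothetical sample standing in for an "all versions" export.
sample = "Age,Age,Gender,Gender,Gender\n34,,,F,\n,27,M,,\n"
reader = csv.reader(io.StringIO(sample))
header = next(reader)
new_header, merged_rows, conflicts = coalesce_duplicate_columns(header, list(reader))
print(new_header)
print(merged_rows)
print(conflicts)
```

Anything that ends up in `conflicts` is worth reviewing by hand before you trust the merged file, since it means one submission carried two different answers for what looks like the same field.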

You should never change a variable name once your project is deployed and your team is collecting data in the field. You can update the label, but it is always advised to leave the variable name untouched: renaming it distorts the data stored in your project.

Please refer to the response provided above for this too.
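One way to catch an accidental rename before redeploying is to compare the question names in the old and new versions of the form. The snippet below is only a sketch: it assumes you have already loaded the `name` and `label` columns of each XLSForm survey sheet into lists of dicts, and `diff_survey_versions` is a hypothetical helper, not part of KoboToolbox or any XLSForm library.

```python
def diff_survey_versions(old_rows, new_rows):
    """Compare two versions of a survey sheet by variable name.

    A renamed variable shows up as one removed name plus one added name;
    a label-only edit shows up under "relabelled".
    """
    old = {row["name"]: row["label"] for row in old_rows}
    new = {row["name"]: row["label"] for row in new_rows}
    return {
        "removed_names": sorted(old.keys() - new.keys()),
        "added_names": sorted(new.keys() - old.keys()),
        "relabelled": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


# Hypothetical survey rows: "age" is relabelled, "gender" is renamed to "sex".
old_version = [
    {"name": "age", "label": "Age"},
    {"name": "gender", "label": "Gender"},
]
new_version = [
    {"name": "age", "label": "Age (years)"},
    {"name": "sex", "label": "Gender"},
]

report = diff_survey_versions(old_version, new_version)
print(report)
```

If `removed_names` is not empty for a form that is already collecting data, that is the kind of change to reconsider before deploying; entries that appear only under `relabelled` are label edits, which is the case described above as fine to update.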

Keeping this topic open for our valued members to chime in and share their knowledge on managing scenarios like yours.
