Error caused deletion of some projects; recovery complete

Update

We have recovered what we believe is all data affected by this error, i.e. projects that were lost after 07:00 UTC on 22 January. More specifically, our logs show the first effects of this problem on HHI at 07:28 and on OCHA at 07:32.

Edit: I’ve decommissioned the temporary email address as we’ve had no reports that we missed anything in the restoration completed over a month ago. You may still respond here if you believe you’re affected, although full recovery may not be possible after such an extended period of time.

Original Message

Dear community,

Thank you for making us aware of a serious issue today with the KoBoToolbox platform. We have fixed the issue and no further projects should be impacted. Recovering all missing data is now our top priority, and our technical team is dedicating their full attention to this right now.

  1. The 2.020.52b release at the end of 2020 included a small change intended to fix an issue where attempting to create a new project from an invalid XLSForm upload resulted in an empty “Untitled” form being added to the list of drafts. We were not aware of this change having any negative side-effects.
  2. The 2.021.03 release that was deployed this morning around 07:20 UTC suffered from a server problem that caused many more imports to fail than normally would have, including many imports with a valid XLSForm. The technical reason for this is: imports are handled by a pool of separate worker processes, and some of them failed to update to the latest version of our code. The previous version of the code expected a database column to exist that had been removed, so the workers running that old version could not complete any imports.
  3. This increased failure rate made it obvious that the change in 2.020.52b did have a serious side-effect: existing projects could be deleted if an attempt to replace them with a new XLSForm failed.
  4. Thanks to reports from this community, we identified the problem and, at 13:31 UTC today, deployed a fix (2.021.03a) to stop these deletions from happening. Some imports were still failing (without any deletion resulting from that) up through 14:41.
  5. We are now in the process of recovering all deleted data from backups. It’s likely this process will take several hours to complete. We will post updates here as we make our way through the work.
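
To illustrate the failure mode described in points 1 and 3, here is a minimal sketch in Python. It is not the actual KoBoToolbox code: `ProjectStore`, `parse_xlsform`, `run_import`, and `ImportFailed` are all hypothetical names invented for this example. The idea it shows is that cleanup after a failed import should remove only the placeholder that the same import attempt created, and should never touch an existing project that was being replaced.

```python
from dataclasses import dataclass, field
from typing import Optional


class ImportFailed(Exception):
    """Raised when an XLSForm upload cannot be processed (hypothetical)."""


@dataclass
class ProjectStore:
    """In-memory stand-in for the projects table (hypothetical)."""
    projects: dict = field(default_factory=dict)
    next_id: int = 1

    def create_empty(self) -> int:
        # Mimics the empty "Untitled" draft created before parsing starts.
        project_id = self.next_id
        self.next_id += 1
        self.projects[project_id] = "Untitled"
        return project_id


def parse_xlsform(upload: str) -> str:
    """Pretend parser: any upload whose name starts with 'bad' is invalid."""
    if upload.startswith("bad"):
        raise ImportFailed(upload)
    return upload


def run_import(store: ProjectStore, upload: str,
               replace_id: Optional[int] = None) -> Optional[int]:
    """Import an upload, cleaning up only what this attempt created."""
    created_here = replace_id is None
    project_id = store.create_empty() if created_here else replace_id
    try:
        store.projects[project_id] = parse_xlsform(upload)
        return project_id
    except ImportFailed:
        # Remove the placeholder only if this import created it.
        # An existing project being replaced must be left untouched.
        if created_here:
            del store.projects[project_id]
        return None
```

In this sketch, a cleanup step that deleted `project_id` unconditionally would reproduce the kind of side-effect described above: a failed replacement would remove the existing project along with its contents.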

Thank you, as always, for your patience and support as we endeavor to provide a useful data-collection tool for those who need it most. As the lead developer of KoBoToolbox, I personally apologize for the stress and disruption caused by this failure.

Thanks for your work rectifying this. Any update on the timeframes for when the data will be recovered?

I think there’s still a solid day’s work ahead of us, so I will guess that the process could be complete by 03:00 UTC on Sunday the 24th.

Here’s the progress so far:

  • Identified all missing projects;
  • Recovered, in a raw format, all missing submissions;
  • Started temporary recovery servers for both HHI and OCHA;
  • Restored the most recent full backups of HHI and OCHA databases to their respective recovery servers;
  • Downloaded from cloud storage, decompressed, and decrypted all incremental database backup files between the latest full backups and the time of the first project’s deletion;
  • Began making a copy of all relevant database backups for safety.

The next steps will involve multiple point-in-time recoveries of the database so that we can restore the most recent state of each form just before it was deleted, as opposed to simply recovering the state of all affected forms before the problem first occurred. Finally, with both forms and submissions restored to the production databases, all projects will function normally again: they will appear in the UI, receive submissions, allow data exports, etc.
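
The reason so many separate point-in-time recoveries are needed can be sketched as follows. This is only an illustration in Python under simplifying assumptions, not the tooling actually used: the function names, the one-second margin, and the idea that each incremental backup is identified by a single timestamp are all hypothetical. Each project is restored from the database state just before its own deletion, so every distinct deletion time yields its own recovery target.

```python
from datetime import datetime, timedelta, timezone


def recovery_targets(deletions, margin=timedelta(seconds=1)):
    """Group projects by the point in time they should be recovered from.

    `deletions` maps a project id to the UTC time it was deleted. Each
    project is recovered from the state `margin` before that time
    (the margin is an assumption made for this sketch).
    """
    targets = {}
    for project_id, deleted_at in deletions.items():
        targets.setdefault(deleted_at - margin, []).append(project_id)
    # Sort by time so recoveries can be processed chronologically.
    return dict(sorted(targets.items()))


def backups_needed(full_backup_at, incrementals, target):
    """List the incremental backups to replay on top of the full backup.

    Simplification: each incremental is represented only by its timestamp,
    and every one between the full backup and the target is required.
    """
    return [t for t in sorted(incrementals) if full_backup_at <= t <= target]
```

For example, two projects deleted at the same moment share one recovery target, while a project deleted later needs another, each replaying a different set of incremental backups.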

Thanks again for your patience!

This work is still in progress. We’re staying up through the night (Eastern Time) to get it done.

The 250+ point-in-time recoveries needed for HHI are complete, and the 400+ needed for OCHA are in progress. While OCHA finishes, we will proceed with restoring the recovered data to the HHI production server.

We believe that all deleted projects on the HHI server (kobotoolbox.org) have been recovered to KoBoCAT only. They will accept new submissions and appear in the “Projects (legacy)” interface. However, viewing or exporting old data may not work yet as we resynchronize the MongoDB read-only replica used for these tasks.

These projects, as well as undeployed draft forms, will be restored to the main HHI interface shortly.

The OCHA server is still processing point-in-time recoveries.

This process is now complete.

Thanks for all your work trying to recover the data. My project still isn’t back. Should I be worried?

This process is now complete. Hopefully, all missing projects on the HHI (kobotoolbox.org) instance are now recovered. (@pcranley, please check again now)

While the OCHA server continues to process recovery points, we will perform some additional checks to verify that the HHI restoration is complete.

OCHA point-in-time recoveries are complete. Projects will be restored to KoBoCAT first, then synchronized to MongoDB, and then restored to the main interface (KPI, kobo.humanitarianresponse.info).

OCHA KoBoCAT restoration complete; MongoDB sync underway.

Your projects will not yet appear in the main interface and submissions may not appear when viewing or exporting from the legacy interface.

The MongoDB resynchronization for OCHA is complete. Now restoring to KPI (the main interface of kobo.humanitarianresponse.info).

The restoration of all data on OCHA / kobo.humanitarianresponse.info should now be complete. We will continue to run tests verifying the recovery on both OCHA and HHI servers, but you may use KoBoToolbox normally.

If you find that your project is still missing, please respond here. Alternatively, you may email 202103.recovery@kobotoolbox.org, but we’re able to respond more quickly to replies on this forum than to emails. Thank you!

Thank you for your efforts and hard work. My 2 projects are back.

Hello @jnm,
Is all related work, including your verification tests, done already?

Thanks for the hard work of fixing these problems. Must have been a tough few days for y’all.

And more importantly: thanks for keeping kobo.humanitarianresponse.info running smoothly most of the time. Not having to run our own servers saves us a huge amount of time and money. I think we don’t express this enough, so I wanted to take this opportunity to say thanks.

Thank you, @Sjlver. We appreciate your compliments and, most importantly, your patience. We will strive to continue providing support to all of you who do critical work within the development sector and the wider research areas.

Stephane

Appreciate the great work!

This kind of thing makes me very nervous, because I’m running Kobo on my own server and I have limited technical skills… I am running an older version of Kobo, and would like to update it to sort out some bugs. And of course avoiding data loss is the topmost priority.

After going through the GitHub docs, I didn’t find a documented update process to ensure that things go smoothly. Is there any documentation anywhere?

@ks_1, you won’t encounter this issue on your own server, but you should still take care to avoid data loss when maintaining your own instance.

Which version of kobo-install are you running? Note that there is a word of caution when upgrading from releases before 2.020.18.

@Josh
How can I check the version running on the server? We had installed it back in July '20 and I believe it was the latest version at the time.

Hi @ks_1, I’ve just moved this discussion here so that we don’t clutter the announcement.
