We have recovered what we believe is all data affected by this error, i.e. projects that were lost after 07:00 UTC on 22 January. More specifically, our logs show the first effects of this problem on HHI at 07:28 and on OCHA at 07:32.
Edit: I’ve decommissioned the temporary email address as we’ve had no reports that we missed anything in the restoration completed over a month ago. You may still respond here if you believe you’re affected, although full recovery may not be possible after such an extended period of time.
Thank you for making us aware of a serious issue today with the KoBoToolbox platform. We have fixed the issue and no further projects should be impacted. Recovering all missing data is now our top priority, and our technical team is dedicating their full attention to this right now.
The 2.020.52b release at the end of 2020 included a small change intended to fix an issue where attempting to create a new project from an invalid XLSForm upload resulted in an empty “Untitled” form being added to the list of drafts. We were not aware of this change having any negative side-effects.
The 2.021.03 release that was deployed this morning around 07:20 UTC suffered from a server problem that caused many more imports to fail than normal, including many imports of valid XLSForms. The technical reason is that imports are handled by a pool of separate worker processes, and some of them failed to update to the latest version of our code. The previous version of the code expected a database column that had since been removed, so the workers running that old version could not complete any imports.
This increased failure rate made it obvious that the change in 2.020.52b did have a serious side-effect: existing projects could be deleted if an attempt to replace them with a new XLSForm failed.
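To illustrate this class of bug, here is a minimal sketch of a cleanup-on-failure step that is safe for replacements. It is not the actual KoBoToolbox code: `ProjectStore`, `parse_xlsform`, and `import_xlsform` are hypothetical names, and the "database" is just an in-memory dictionary. The point is only that cleanup after a failed import should remove a draft that the same import created, and never a project that already existed.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectStore:
    """Hypothetical in-memory stand-in for the project database."""
    projects: dict = field(default_factory=dict)
    next_id: int = 1

    def create(self, content):
        pid = self.next_id
        self.next_id += 1
        self.projects[pid] = content
        return pid


def parse_xlsform(upload):
    """Hypothetical parser: accepts only a non-empty string."""
    if not isinstance(upload, str) or not upload:
        raise ValueError("invalid XLSForm")
    return upload


def import_xlsform(store, upload, replace_id=None):
    """Import an upload as a new project, or replace an existing one.

    On failure, only a project created by this call is removed;
    a project that existed before the import is left untouched.
    """
    created_here = replace_id is None
    target_id = store.create("Untitled") if created_here else replace_id
    try:
        store.projects[target_id] = parse_xlsform(upload)
    except ValueError:
        if created_here:
            # Remove the empty draft that this failed import just created.
            del store.projects[target_id]
        return None
    return target_id
```

In this sketch, a failed replacement returns `None` and the existing project keeps its previous content, while a failed new import leaves no empty “Untitled” draft behind.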
Thanks to reports from this community, we identified the problem and deployed a fix (2.021.03a) to stop these deletions from happening at 13:31 UTC today. Some imports were still failing—without any deletion resulting from that—up through 14:41.
We are now in the process of recovering all deleted data from backups. It’s likely this process will take several hours to complete. We will post updates here as we make our way through the work.
Thank you, as always, for your patience and support as we endeavor to provide a useful data-collection tool for those who need it most. As the lead developer of KoBoToolbox, I personally apologize for the stress and disruption caused by this failure.
I think there’s still a solid day’s work ahead of us, so I will guess that the process could be complete by 03:00 UTC on Sunday the 24th.
Here’s the progress so far:
Identified all missing projects;
Recovered, in a raw format, all missing submissions;
Started temporary recovery servers for both HHI and OCHA;
Restored the most recent full backups of HHI and OCHA databases to their respective recovery servers;
Downloaded from cloud storage, decompressed, and decrypted all incremental database backup files between the latest full backups and the time of the first project’s deletion;
Began making a copy of all relevant database backups for safety.
The next steps will involve multiple point-in-time recoveries of the database so that we can restore the most recent state of each form just before it was deleted, as opposed to simply recovering the state of all affected forms before the problem first occurred. Finally, with both forms and submissions restored to the production databases, all projects will function normally again: they will appear in the UI, receive submissions, allow data exports, etc.
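As a rough sketch of how the per-form recovery points can be planned, the snippet below groups forms by the moment just before each was deleted. This is an illustration only, not our actual tooling: the function name `recovery_targets`, the form IDs, and the one-second margin are all assumptions, and the deletion times would come from the logs.

```python
from datetime import datetime, timedelta, timezone


def recovery_targets(deletions, margin=timedelta(seconds=1)):
    """Map each recovery target time to the forms to extract at that time.

    `deletions` maps a form ID to the UTC time at which it was deleted.
    The target is `margin` before the deletion (an assumed safety margin).
    Forms deleted at the same instant share a single recovery.
    """
    targets = {}
    for form_id, deleted_at in deletions.items():
        targets.setdefault(deleted_at - margin, []).append(form_id)
    # Sort chronologically so recoveries can be replayed in order.
    return dict(sorted(targets.items()))


# Hypothetical deletion times, for illustration only.
example = {
    "form_a": datetime(2021, 1, 22, 7, 28, 10, tzinfo=timezone.utc),
    "form_b": datetime(2021, 1, 22, 7, 28, 10, tzinfo=timezone.utc),
    "form_c": datetime(2021, 1, 22, 9, 15, 0, tzinfo=timezone.utc),
}
plan = recovery_targets(example)
```

Each key in `plan` is one point-in-time recovery to run, and its value lists the forms whose latest state should be extracted from that recovery. Recovering each form at its own target, rather than all forms at one early point, is what preserves edits made between the start of the problem and the form's deletion.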
This work is still in progress. We’re staying up through the night (Eastern Time) to get it done.
The 250+ point-in-time recoveries needed for HHI are complete, and the 400+ needed for OCHA are in progress. While OCHA finishes, we will proceed with restoring the recovered data to the HHI production server.
We believe that all deleted projects on the HHI server (kobotoolbox.org) have been recovered to KoBoCAT only. They will accept new submissions and appear in the “Projects (legacy)” interface. However, viewing or exporting old data may not work yet as we resynchronize the MongoDB read-only replica used for these tasks.
These projects, as well as undeployed draft forms, will be restored to the main HHI interface shortly.
The OCHA server is still processing point-in-time recoveries.
The restoration of all data on OCHA / kobo.humanitarianresponse.info should now be complete. We will continue to run tests verifying the recovery on both OCHA and HHI servers, but you may use KoBoToolbox normally.
If you find that your project is still missing, please respond here. Alternatively, you may email email@example.com, but we’re able to respond more quickly to replies on this forum than to emails. Thank you!
Thanks for the hard work of fixing these problems. Must have been a tough few days for y’all.
And more importantly: thanks for keeping kobo.humanitarianresponse.info running smoothly most of the time. Not having to run our own servers saves us a huge amount of time and money. I think we don’t express this enough, so I wanted to take this opportunity to say thanks.
Thank you, @Sjlver. We appreciate your compliments and, most importantly, your patience. We will strive to continue providing support to all of you who do critical work within the development sector and the wider research areas.
This kind of thing makes me very nervous, because I’m running Kobo on my own server and I have limited technical skills… I am running an older version of Kobo, and would like to update it to sort out some bugs. And of course avoiding data loss is the topmost priority.
After going through the GitHub docs, I didn’t find any documented update process to ensure that things go smoothly. Is there any documentation anywhere?