Hello, everyone, and thank you for your patience. My colleague @nolive and I have been working (and continue to work) this morning to resolve the issue on kf.kobotoolbox.org that has been preventing most imports and exports from completing successfully. This has also delayed the POSTing of most submissions to external REST Services servers.
A brief explanation: the server needs to respond to each browser request within a relatively short amount of time, but not all tasks can complete that quickly. Some examples include:
- Clicking the export button, where the server needs to acknowledge quickly that the export request was received, but the export itself may take up to half an hour for a large data set.
- Receiving a submission, where the server needs to store the data and acknowledge its receipt immediately, but also needs to send a copy to the appropriate REST Services external servers, which may be slow or require multiple retries.
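
To illustrate the "acknowledge now, finish later" pattern from the two examples above, here is a minimal Python sketch. It is not our actual code: `task_queue`, `handle_export_request`, and `worker` are hypothetical names, and the `time.sleep` call merely stands in for the slow work.

```python
import queue
import threading
import time

# Hypothetical in-process queue standing in for a real task queue.
task_queue = queue.Queue()
completed = []

def handle_export_request(project_id):
    """Acknowledge right away; the slow export is deferred to a worker."""
    task_queue.put(("export", project_id))
    return {"status": "accepted", "project": project_id}

def worker():
    """Pull tasks off the queue until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.task_done()
            break
        kind, project_id = task
        time.sleep(0.01)  # stand-in for the slow part (assumed duration)
        completed.append((kind, project_id))
        task_queue.task_done()

thread = threading.Thread(target=worker, daemon=True)
thread.start()

# The request handler returns before the export has actually run.
ack = handle_export_request("demo-project")

task_queue.join()     # wait for the background work to finish
task_queue.put(None)  # tell the worker to stop
thread.join()
```

The browser only ever waits for `handle_export_request`; everything slow happens in `worker`.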
Our infrastructure currently has one queue for all such tasks, with many worker programs running simultaneously to complete any jobs that appear in that queue. The issue, in this case, was a very large project (over 7 million submissions) with a very slow (or even unresponsive) REST Services external server overwhelming the workers and causing the queue to be filled with over 70,000 tasks. This effectively crowded out everyone else’s tasks, manifesting as the import and export failures that you’ve likely seen if you use kf.kobotoolbox.org.
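
To give a rough sense of why a shared queue behaves this way, the sketch below simulates a single first-in, first-out queue served by a pool of workers. Only the 70,000-task backlog comes from the incident; the worker count (50), the 30 seconds each stuck task occupies a worker, and the names `start_times`, `rest-N`, and `export` are assumptions made up for illustration.

```python
import heapq

def start_times(tasks, workers):
    """Simulate a FIFO queue: tasks is a list of (name, seconds) in
    arrival order; returns the time at which each task starts."""
    free_at = [0.0] * workers  # when each worker next becomes free
    heapq.heapify(free_at)
    starts = {}
    for name, seconds in tasks:
        t = heapq.heappop(free_at)  # earliest available worker
        starts[name] = t
        heapq.heappush(free_at, t + seconds)
    return starts

# Assumed numbers: 50 workers, 30 s per stuck REST Services task.
backlog = [(f"rest-{i}", 30.0) for i in range(70_000)]
tasks = backlog + [("export", 60.0)]

wait = start_times(tasks, workers=50)["export"]
print(wait)  # 42000.0 seconds, roughly 11.7 hours under these assumptions
```

Under those assumed numbers, an export queued behind the backlog would not even start for more than eleven hours, which is why it looks like a failure from the user's side.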
We are working first to clear the backlog of tasks, which should be done in the next 15-20 minutes (current time is 15:13 UTC on 8 September) [update: this completed on schedule; all services should now be operating normally]. Following that, we will urgently work to separate the queues so that any future REST Services issue does not affect other tasks like imports and exports. Once that is complete, we will work on a fairness algorithm to make sure that no single REST Service can dominate the queue and prevent all other REST Services from working properly.
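
The fairness algorithm has not been designed yet, so the following is only one possible approach, sketched in Python: a round-robin queue that takes one task per REST Service in turn. `FairQueue` and the service names are hypothetical. The same structure, keyed by task type instead of by service, would also illustrate the idea of separate queues.

```python
from collections import OrderedDict, deque

class FairQueue:
    """Round-robin across keys so one key cannot monopolise the workers."""

    def __init__(self):
        self._queues = OrderedDict()  # key -> deque of pending tasks

    def put(self, key, task):
        self._queues.setdefault(key, deque()).append(task)

    def get(self):
        if not self._queues:
            raise IndexError("get from an empty FairQueue")
        # Take the key at the front of the rotation...
        key, tasks = self._queues.popitem(last=False)
        task = tasks.popleft()
        # ...and send it to the back if it still has work waiting.
        if tasks:
            self._queues[key] = tasks
        return key, task

    def __len__(self):
        return sum(len(tasks) for tasks in self._queues.values())

fair = FairQueue()
for i in range(5):
    fair.put("slow-service", f"submission-{i}")  # the flood arrives first
fair.put("service-b", "submission-0")
fair.put("service-c", "submission-0")

order = [fair.get()[0] for _ in range(4)]
```

Here `order` is `['slow-service', 'service-b', 'service-c', 'slow-service']`: the other two services are served after a single task from the flooding service, rather than after all five.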