I have had KoBo running for quite some time now on our own servers and it has been wonderful. This community and all of its supporting members have been great throughout the process. I have come across some interesting issues over the past year, and I'm hoping someone can help with this one.
Periodically, our servers will “go down.” I actually have three instances running: prod and dev, both in an Azure environment, and another in AWS Lightsail. Each of the three will periodically throw a 502 or 504 error from nginx, and the whole site will go down until I restart all the containers using `python3 ./kobo-install/run.py`. The production instance goes down far more often, and it MIGHT be load-related. I haven't actually tried restarting just the nginx container instead of the whole container set to see if that helps. Has anybody come across this on their own servers?
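If restarting only the proxy turns out to be enough, this is roughly what I plan to try. A minimal sketch, assuming the front-end service is named `nginx` in the kobo-docker compose file (names may differ by version, so check `docker ps` first):

```sh
# Restart only the nginx proxy instead of the whole stack
cd kobo-docker
docker-compose restart nginx

# Or target the container directly; find its actual name with `docker ps`
docker restart <nginx-container-name>
```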
My first lead was that this was somehow load-based. We had a user unknowingly hitting the API with massive simultaneous redundant queries that would end up breaking the server, requiring us to restart the KoBo containers. They were pulling ~100,000 records before they (or we) knew that there was a 30,000-row limit. I'm guessing that was a timeout issue, though an error message from the API would have been useful.
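For anyone else hitting that limit: I believe the data endpoints accept `start` and `limit` query parameters, so the pull can be paged instead of requested all at once. A rough sketch with curl, where the server URL, asset UID, and token are all placeholders:

```sh
# Page through ~100,000 records 30,000 at a time (all identifiers are placeholders)
for START in 0 30000 60000 90000; do
  curl -s -H "Authorization: Token $KOBO_API_TOKEN" \
    "https://kf.example.org/api/v2/assets/aBcDeFgHiJkL/data/?format=json&start=${START}&limit=30000" \
    -o "page_${START}.json"
done
```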
Then the same thing happened on the Lightsail instance, which has MAYBE 10 users, none of whom know what an API is, so the excessive-load theory was called into question.
I saw a post on GitHub about the uWSGI workers and noticed UNOCHA upped their worker count to 24. I don't think we can go that high, as our VM only has 4 cores, but since the default is 2, I decided to up ours to 3 to see what would happen. I think standard practice is 2 workers per core?
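For anyone wanting to try the same: rather than editing the generated config files by hand, I changed the worker count by re-running the installer, which I believe exposes the uWSGI settings in its advanced options (the exact prompts may vary by kobo-install version):

```sh
# Re-run setup; choose advanced options to adjust the uWSGI workers
python3 ./kobo-install/run.py --setup
```

On the rule of thumb: the heuristic I've mostly seen is 2×cores+1 (from the Gunicorn docs) rather than a hard uWSGI rule, which would suggest up to ~9 workers on a 4-core VM, though memory is usually the real ceiling.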
Do I really need to set up a load balancer to address these types of issues?
If anyone has any ideas on how we can avoid these 502/504 errors, that would be very helpful!