Just an update on my last message: it appears that after deactivating the cheaper algorithm and restarting KoBoCAT + KPI with ./run.py -cf up -d --force-recreate kpi kobocat nginx, the surveys are not loading anymore.
Then, restarting Kobo with ./run.py makes it freeze at the last step when waiting for environment to be ready:
Just want to echo that our setup also crashes very frequently with no useful logs to investigate (like ~10 times/day - our traffic is pretty high).
Evidently some uWSGI tuning might be required, but we don't really have much expertise to track this down, so we haven't tried the tuning parameters advised above.
The impact is very limited: containers are automatically respawned by our infrastructure (we run on Kubernetes), and we run 2 nodes anyway, so the traffic is instantly rebalanced. It's not a huge deal, but it is still something we'd like to fix at some point.
We'll give it a try and report back, but a tuning guide for self-hosters would be a useful thing to have.
Are your screenshots with cheaper deactivated or activated?
All screenshots were taken with cheaper activated, as deactivating it resulted in the weird behaviour mentioned in my previous message (i.e. surveys not loading, impossible to restart with ./run.py).
Can you confirm which container needs to be restarted to make the app work again?
I'll need to wait for the next 504 error to confirm that, but I think restarting KPI solves the issue (I'm now restarting regularly with ./run.py -cf restart kpi kobocat nginx, which does the trick).
Thanks also @yjouanique for your message. I agree that having a tuning guide would be great. Your load-balanced infrastructure looks like a great workaround, but in our case we would definitely like to resolve the underlying issue first, and use this kind of architecture to scale Kobo's usage rather than to mask random 504 errors.
In case it helps, Kobo is running on a Debian 11 server behind an Nginx proxy which forwards traffic to the Docker containers.
@nolive do you think having Kobo running on its own dedicated server could help? (I mean, if any sort of conflict between Kobo and our proxy or whatever could be the origin of this issue?).
Otherwise, as we are really struggling with the issue, is there any way to "hire" an expert Kobo consultant to help our team with that specific issue?
I assume that if Kobo runs and scales well everywhere except on a few instances such as @yjouanique's and mine, maybe the issue is related to our server, configuration, etc.?
do you think having Kobo running on its own dedicated server could help? (I mean, if any sort of conflict between Kobo and our proxy or whatever could be the origin of this issue?).
One of our servers is running exactly the same setup: one nginx proxy which forwards traffic to many kobo-docker/kobo-install containers. It's using Ubuntu 20.04 but I'm pretty sure it does not matter. It could be because of the host ulimit, but I think Ubuntu and Debian share the same default settings.
Just in case, can you show us your ulimit settings?
ulimit -Sa ## Show soft limit ##
ulimit -Ha ## Show hard limit ##
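If the full listing is too noisy, the limit that most often matters for a busy web server is the number of open file descriptors. A minimal sketch that prints only that one, for the shell it runs in (the variable names are just for illustration; limits inside a container can differ from the host's, so it may be worth running in both places):

```shell
#!/usr/bin/env bash
# Soft limit: the value currently enforced for this shell and its children.
soft_nofile=$(ulimit -Sn)
# Hard limit: the ceiling up to which the soft limit may be raised.
hard_nofile=$(ulimit -Hn)

printf 'open files: soft=%s hard=%s\n' "$soft_nofile" "$hard_nofile"
```

Either value may be a number or the word "unlimited".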
How many requests do you receive per minute?
BTW, the app should work with cheaper deactivated. I'll have a look on my side.
@boris, can you manually upgrade uwsgi inside the KPI container? pip install --upgrade uwsgi.
I see that its version is 2.0.18, whereas the KoBoCAT container runs 2.0.19.1.
I'm still trying to reproduce your problem on my side, but it has not happened after a few days.
Just updated it on both KPI and KoBoCAT so that they both run the same version, uwsgi-2.0.20.
I will keep you posted if it seems to solve the issue.
@boris, I don't think it will help. I ran some tests on my side and could reproduce the problem even with the latest version of uWSGI.
It seems to be a race condition in Python which makes uWSGI hang.
Somebody opened an issue on GitHub: "uWSGI hangs sometimes" (Issue #3566, kobotoolbox/kpi). They suggest using lazy-apps = True. According to my tests, it seems to work with this option enabled.
Unfortunately, the downside of lazy-apps=True is that no resources are shared between the workers, so with many workers memory usage can grow significantly.
We'll try to find a proper fix for this, but as a short-term workaround you can try the lazy-apps=True option.
I am glad to hear that you have been able to reproduce the issue! That's a great step towards a long-term fix, I guess.
I have added lazy-apps=True in the uwsgi.ini configuration of both KPI and KoBoCAT. I will let you know in a couple of days whether it seems to solve the issue.
In the meantime, can I ask you if there is a straightforward way to save this configuration parameter for future restarts of Kobo?
For now I basically added this lazy-apps parameter by opening a bash session in both the KPI and KoBoCAT containers, editing the uwsgi.ini file, and finally restarting the containers individually.
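Until the setting is persisted, the manual edit can at least be made repeatable. A minimal sketch, not an official kobo-install tool: the helper name enable_lazy_apps is hypothetical, and it assumes [uwsgi] is the last (or only) section of the ini file, so that a line appended at the end lands in that section. The demo runs on a throwaway file rather than a real uwsgi.ini.

```shell
#!/usr/bin/env bash
# Append "lazy-apps = True" to a uWSGI ini file unless it is already set.
# Assumption: [uwsgi] is the last (or only) section in the file.
enable_lazy_apps() {
    ini_file="$1"
    if grep -Eq '^[[:space:]]*lazy-apps[[:space:]]*=' "$ini_file"; then
        return 0  # already configured; leave the file untouched
    fi
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$ini_file")" ]; then
        printf '\n' >> "$ini_file"
    fi
    printf 'lazy-apps = True\n' >> "$ini_file"
}

# Demo on a throwaway file standing in for a copy of uwsgi.ini.
demo_ini="$(mktemp)"
printf '[uwsgi]\nmaster = true' > "$demo_ini"
enable_lazy_apps "$demo_ini"
enable_lazy_apps "$demo_ini"   # second call is a no-op
cat "$demo_ini"
```

Because the function checks for an existing lazy-apps line first, running it again after a container is recreated does not duplicate the option.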
That's a great step towards a long-term fix, I guess.
Indeed.
In the meantime, can I ask you if there is a straightforward way to save this configuration parameter for future restarts of Kobo?
The easiest solution would be to use the new custom feature of kobo-install.
Make a copy of the KPI and KoBoCAT uWSGI ini files (outside of the containers, i.e. on the host drive).
Add a docker-compose.frontend.custom.yml file to your kobo-docker folder containing the following content.