Self-hosted KoBoToolbox randomly "shutting down" (504 error)

Hi,

I am creating this post following an issue with a self-hosted KoBoToolbox instance.
While Kobo usually runs fine, it sometimes starts returning Error 504: Gateway Time-out for no apparent reason.

Kobo has been installed using the “kobo-install” Python script and I cannot find any error in Kobo’s logs. Furthermore, none of Kobo’s Docker containers seems to be down.
After a proper restart of Kobo (still using the Python script from kobo-install), everything works fine again.
This error occurs quite randomly, up to twice a day (although not every day).

Below is the Kobo configuration I am using (passwords and domains hidden).

╔════════════════════════════════════════════════════════════════════╗
║                                                                    ║
║ Welcome to kobo-install.                                           ║
║                                                                    ║
║ You are going to be asked some questions that will determine how   ║
║ to build the configuration of `KoBoToolBox`.                       ║
║                                                                    ║
║ Some questions already have default values (within brackets).      ║
║ Just press `enter` to accept the default value or enter `-` to     ║
║ remove previously entered value.                                   ║
║ Otherwise choose between choices or type your answer.              ║
║                                                                    ║
╚════════════════════════════════════════════════════════════════════╝
Where do you want to install?
[/root/apps/kobo-docker]:
Please confirm path [/root/apps/kobo-docker]
	1) Yes
	2) No
[1]:
Do you want to see advanced options?
	1) Yes
	2) No
[2]: 1
What kind of installation do you need?
	1) On your workstation
	2) On a server
[2]:
Please choose which network interface you want to use?
	eno1) XX.XXX.XX.XX
	other) Other
[eno1]:
Do you want to use separate servers for front end and back end?
	1) Yes
	2) No
[2]:
Public domain name? [kobo.local]: kobocollect.mydomain.tld
KPI sub domain? [kf]:
KoBoCat sub domain? [kc]:
Enketo Express sub domain name? [ee]:
Do you want to use HTTPS?
	1) Yes
	2) No
[1]:
╔════════════════════════════════════════════════════════════════════╗
║                                                                    ║
║ Please note that certificates must be installed on a reverse-proxy ║
║ or a load balancer.kobo-install can install one, if needed.        ║
║                                                                    ║
╚════════════════════════════════════════════════════════════════════╝
Auto-install HTTPS certificates with Let's Encrypt?
	1) Yes
	2) No - Use my own reverse-proxy/load-balancer
[1]: 2
Is your reverse-proxy/load-balancer installed on this server?
	1) Yes
	2) No
[1]:
Internal port used by reverse proxy?
[8080]: 8442
SMTP server? []: smtp.my-server.tld
SMTP port? [25]: 587
SMTP user? []: no-reply@mydomain.tld
SMTP password []: ************
Use TLS?
	1) Yes
	2) No
[2]:
From email address? [support@kobocollect.mydomain.tld]:
Super user's username? [super_user_username]:
Super user's password? [************]:
Docker Compose prefix? (leave empty for default) []:
Use staging mode?
	1) Yes
	2) No
[2]:
KoBoCat PostgreSQL database name?
[kobocat]:
KPI PostgreSQL database name?
[koboform]:
PostgreSQL user's username?
[kobo]: kobo_username
PostgreSQL user's password?
[************]:
Do you want to tweak PostgreSQL settings?
	1) Yes
	2) No
[2]:
MongoDB root's username?
[root]:
MongoDB root's password?
[************]:
MongoDB user's username?
[kobo]: kobo_username
MongoDB user's password?
[************]:
Do you want to run the Redis containers from this server?
	1) Yes
	2) No
[1]:
Redis password?
[************]:
Max memory (MB) for Redis cache container?
Leave empty for no limits
[]:
Do you want to expose back-end container ports (`PostgreSQL`, `MongoDB`, `Redis`)?
	1) Yes
	2) No
[2]:
Do you want to customize the application secret keys?
	1) Yes
	2) No
[2]:
Do you want to use AWS S3 storage?
	1) Yes
	2) No
[2]:
Google Analytics Identifier []:
Google API Key []:
Do you want to use Sentry?
	1) Yes
	2) No
[2]:
Do you want to tweak uWSGI settings?
	1) Yes
	2) No
[2]:
Do you want to add additional settings to the front-end docker containers?
	1) Yes
	2) No
[2]:
Do you want to add additional settings to the back-end docker containers?
	1) Yes
	2) No
[2]:
Do you want to activate backups?
	1) Yes
	2) No
[2]:

If you have any idea where this error may come from (server requirements, memory or CPU usage, a particular feature that may trigger it, etc.), please let me know! For example, I left the Redis memory limit empty (no limit) in the config; could that be the cause of the problem? (I didn’t see particularly high memory usage from this container, though.)

Thank you in advance for your help,

Kind regards,
Boris

Following our conversation, and as @stephenoduor suggested, I am pinging you on this detailed topic about our problem: @Kal_Lam, @stephenoduor, @ks_1

Here is a screenshot of the message displayed when we connect to KoBoToolbox or Enketo while the problem occurs.

Screenshot 2022-05-06 at 10.30.53

This problem could compromise our use of Kobo for our project :’(
Thanks in advance for your help !

Does restarting the kobocat container resolve the issue?

This has happened to us, and we changed our kpi and kc uwsgi.ini files.
Primarily, we removed the cheaper configuration and added lazy-apps = true. I also reported this on GitHub: “uWSGI hangs sometimes” (Issue #3566, kobotoolbox/kpi).

Our kobo-docker/uwsgi/kpi_uwsgi.ini has this added

# For fixing uwsgi workers hanging 
lazy-apps = true

# From [Configuring uWSGI for Production Deployment | Tech At Bloomberg](https://www.techatbloomberg.com/blog/configuring-uwsgi-production-deployment/)
# Restart workers after this many seconds
max-worker-lifetime = 3600

single-interpreter = true
need-app = true
# kobocat fails with this
#strict = true
disable-logging = true
log-4xx = true
log-5xx = true

# This is not being set correctly above
listen = 2048

I’m actually running the run.py command from kobo-install to automatically restart all Kobo’s containers.
I’ll try to restart the containers one by one next time it crashes to check if the error comes from a particular container and I’ll keep you posted about it :wink:
Thanks!

That sounds great thank you! I’m not quite familiar with uWSGI but I’ll definitely have a look at it as it may explain the crashes we encounter…

AFAIK lazy-apps is usually used to avoid delays during app reloads when changing the code.
If your config doesn’t work with strict = true, it means there’s some incorrect config parameter in the ini.

You might also be able to solve the above problem by increasing the number of workers.

The config provided by kobo-docker by default doesn’t work with strict=true from what I remember.

Hello @boris,

We’ve noticed in some circumstances that it happens with default uWSGI settings.
It seems that the cheaper algorithm does not work well when starting with one worker and a maximum of two. You can try changing the uWSGI settings through the advanced options of kobo-install.
Moreover, increase the maximum RAM that uWSGI may use (it’s 128 MB by default). Depending on your RAM size, you can increase it to somewhere between a quarter and half of your RAM.

Do you want to tweak uWSGI settings?
	1) Yes
	2) No
[2]: 1
Number of uWSGI workers to start?
[1]: 2
Maximum uWSGI workers?
[2]: 4
Maximum number of requests per worker?
[512]: 1024
Stop spawning workers if uWSGI memory use exceeds this many MB:
[128]: 2048

It should help.
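
To make the quarter-to-half rule of thumb concrete, here is a small Python sketch of the arithmetic. The function name and the 8 GB figure are illustrative assumptions only; substitute the actual RAM of your server.

```python
import sys


def uwsgi_memory_range_mb(total_ram_gb):
    """Return (quarter, half) of the total RAM, expressed in MB."""
    total_mb = int(total_ram_gb * 1024)
    return total_mb // 4, total_mb // 2


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python uwsgi_memory.py <total RAM in GB>, e.g. 8
    low, high = uwsgi_memory_range_mb(float(sys.argv[1]))
    print(f"uWSGI memory limit candidates: {low} to {high} MB")
```

For an 8 GB server this gives 2048 to 4096 MB, so the 2048 MB entered in the answers above would sit at the low end of that range.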

FYI: If you want to avoid taking down all containers and recreating them, you can restart only kobocat with the command ./run.py -cf restart kobocat

Hi @nolive thanks a lot for your answer and for providing some detailed configuration hints :slight_smile:
I have applied these to the server; let’s see how that behaves in the upcoming days…

Thanks also for the ./run.py -cf restart kobocat command. However, I figured out today that manually restarting the Docker container kobofe_kpi_1 solves the issue (at least temporarily). Does ./run.py -cf restart kobocat also restart this kpi container?

Finally, do you have any idea where I could find some logs / details about the error encountered? Looking at the Docker containers’ logs did unfortunately not provide me with any relevant information so far :confused:

Anyway I applied the uWSGI configuration you have recommended, I’ll keep you posted!

Regards,
Boris

Yes. After that, it worked again.

No, if you want to restart kpi, you need to run ./run.py -cf restart kpi. You can restart both at once with ./run.py -cf restart kobocat kpi

For the logs, you can have a look at nginx, uwsgi logs in kobo-docker/log/

Hi all,

Thank you again for your support and help regarding this issue.

The uWSGI parameter changes seem to help but have not solved the problem so far. Kobo used to return a 504 error once or twice a day; after the changes it stayed up for 4 days. However, the same error happened again a few minutes ago.

Thank you @nolive for the clarification on the restart command and for the log location, which I hadn’t noticed earlier. I have been able to retrieve the uWSGI logs, which show all requests being processed properly until a worker shuts down (and respawns?) for no apparent reason:

// [...] Classic requests showing up (GET, POST, etc.), then:
...The work of process 2036 is done. Seeya!
worker 2 killed successfully (pid: 2036)
Respawned uWSGI worker 2 (new pid: 2038)

I have also checked the other files within the kobo-docker/log/kpi folder, without identifying any particular error output.

Also, not related to the 504 error, but when running ./run.py -cf restart kobocat kpi the following error shows up (a full restart with ./run.py solves this problem):
Screenshot 2022-05-16 at 18.17.08

Regarding the 504 error, we are currently restarting the KPI Docker container manually every 2-3 days. However, we are definitely looking forward to finding a safer/cleaner solution for using Kobo on a long-term basis :grin:
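
As a stopgap while the root cause is unknown, the manual restart could in principle be automated. Below is a rough Python sketch under several assumptions: the URL to probe, the polling interval, the "three consecutive 504s" rule, and the function names are all hypothetical choices, and the script is assumed to run as a user allowed to execute ./run.py in the kobo-install folder. It only hides the symptom; it does not fix anything.

```python
import subprocess
import sys
import time
import urllib.error
import urllib.request


def http_status(url, timeout=130):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx answers, including 504
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, client-side timeout


def needs_restart(recent_statuses, failures_required=3):
    """True when the last `failures_required` probes all returned 504."""
    tail = recent_statuses[-failures_required:]
    return len(tail) == failures_required and all(s == 504 for s in tail)


def watch(url, kobo_install_dir, interval_s=60):
    """Poll url forever and restart the front-end containers on repeated 504s."""
    statuses = []
    while True:
        statuses = (statuses + [http_status(url)])[-10:]
        if needs_restart(statuses):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "repeated 504, restarting")
            # Same command as suggested in this thread, run from kobo-install.
            subprocess.run(
                ["./run.py", "-cf", "restart", "kpi", "kobocat", "nginx"],
                cwd=kobo_install_dir,
                check=False,
            )
            statuses = []
        time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python kobo_watchdog.py <URL to probe> <path to kobo-install>
    watch(sys.argv[1], sys.argv[2])
```

The timestamps it prints also give a record of when each 504 episode started, which may help correlate the crashes with the logs in kobo-docker/log/.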

Thanks again for your help!

This is completely expected, as uWSGI respawns workers based on their max-worker-lifetime and max-requests parameters.

The System configuration error is a separate issue. It happens when KPI and KoBoCAT are restarted without restarting NGINX. NGINX caches the IP addresses of the KPI and KoBoCAT containers, and when they are restarted, Docker sometimes swaps their IP addresses. The NGINX cache is then invalid, and NGINX tries to reach KoBoCAT at KPI’s address and vice versa. We are still investigating this error. (To avoid it, run ./run.py -cf restart kpi kobocat nginx)
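
If you want to check whether the addresses really changed on your server, a minimal Python sketch could compare `docker inspect` output before and after a restart. The helper names are hypothetical, and the container names are passed on the command line: kobofe_kpi_1 appears earlier in this thread, while the KoBoCAT name (for example kobofe_kobocat_1) is an assumption to verify with `docker ps`.

```python
import json
import subprocess
import sys


def container_ips(names):
    """Return {container name: [IP addresses]} using `docker inspect`."""
    result = subprocess.run(
        ["docker", "inspect", *names],
        check=True,
        capture_output=True,
        text=True,
    )
    ips = {}
    for info in json.loads(result.stdout):
        networks = info["NetworkSettings"]["Networks"]
        ips[info["Name"].lstrip("/")] = sorted(
            net["IPAddress"] for net in networks.values()
        )
    return ips


def changed_ips(before, after):
    """Return the containers whose addresses differ between two snapshots."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ip_swap.py kobofe_kpi_1 <kobocat container name>
    names = sys.argv[1:]
    before = container_ips(names)
    input("Restart the containers in another terminal, then press Enter ")
    print(changed_ips(before, container_ips(names)) or "No address changed")
```

An empty result after a restart would suggest the System configuration error has another cause in your case.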

As @ks_1 told you, the logs you see are expected. You may have to tweak the uWSGI parameters to find the sweet spot for your app depending on your server configuration (e.g. RAM, CPUs, and whether everything runs on a single instance or the front-end and back-end containers run on different servers).

Can you try to deactivate the cheaper algorithm to see if it helps?
By default, the timeout settings are 120s. Do you know if any of your requests could last longer than that?
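
One rough way to answer the second question is to scan a uWSGI log file for slow requests. The Python sketch below assumes request logging is enabled and that lines follow uWSGI's default request log format ("... => generated N bytes in M msecs ..."); the function name and the way the log path is passed are hypothetical.

```python
import re
import sys

# Matches the request part of a default-format uWSGI log line.
REQUEST_RE = re.compile(
    r"\] (?P<method>[A-Z]+) (?P<path>\S+) => generated \d+ bytes "
    r"in (?P<msecs>\d+) msecs"
)


def slow_requests(lines, threshold_s=120):
    """Yield (seconds, method, path) for requests at or above the threshold."""
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request line (startup messages, respawns, ...)
        seconds = int(match.group("msecs")) / 1000
        if seconds >= threshold_s:
            yield seconds, match.group("method"), match.group("path")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slow_requests.py <path to a uWSGI log file>
    with open(sys.argv[1], errors="replace") as handle:
        for seconds, method, path in slow_requests(handle):
            print(f"{seconds:8.1f}s  {method} {path}")
```

Note that a request killed before completion may never produce such a line, so an empty result does not fully rule out long-running requests.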

Hi all!

Thanks @ks_1 for pointing out that the uWSGI logs are expected; I am not quite familiar with uWSGI.
Thank you also @nolive for your help; the ./run.py -cf restart kpi kobocat nginx command does indeed solve the subsidiary problem.

As mentioned in my previous post I have tweaked the uWSGI parameters as you suggested:

Increasing values seems to delay the problem (which happens every 3-4 days now instead of once a day), although it doesn’t solve it :confused:
We have quite good server specs (Intel Xeon 3.10 GHz, 15 GB of memory, SSD storage, etc.), as detailed below:

We could therefore push the uWSGI tweaking further, but I am afraid it would just delay the problem again, e.g. to ~10 days or so.

> Can you try to deactivate the cheaper algorithm to see if it helps?

Sorry could you elaborate on that point? I am not sure what you mean by “the cheaper algorithm” :sweat_smile:

> By default, the timeout settings are 120s. Do you know if any of your requests could last longer than that?

None of the requests last longer :slight_smile: I have actually encountered errors caused by too many or too heavy requests to Kobo’s API, but that wasn’t related to the “shutdown” problem we are experiencing every few days.

Thanks again for your precious help,

Kind regards,
Boris

Hello @boris,

As I said you can try to deactivate uWSGI cheaper algorithm.
To do so, you have to manually update docker-compose.frontend.override.yml in the kobo-docker folder.
Set KPI_UWSGI_CHEAPER_WORKERS_COUNT and KC_UWSGI_CHEAPER_WORKERS_COUNT to empty strings.

...
- KC_UWSGI_WORKERS_COUNT=4
- KC_UWSGI_CHEAPER_WORKERS_COUNT=
...
- KPI_UWSGI_WORKERS_COUNT=4
- KPI_UWSGI_CHEAPER_WORKERS_COUNT=
...

You can restart the KoBoCAT and KPI containers with ./run.py -cf up -d --force-recreate kpi kobocat nginx

Do NOT run ./run.py --setup or ./run.py --update. Otherwise, it will overwrite what you just did.

One more thing: before deactivating the cheaper algorithm, you can enter the KPI and KoBoCAT containers and install uwsgitop (pip install uwsgitop).
Then run uwsgitop :1717 and please report what you see.
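
If a text summary is easier to paste here than a screenshot, the same data can be read directly from the stats socket that uwsgitop connects to. This is a minimal Python sketch, assuming the stats server on port 1717 serves plain JSON over TCP and that the script runs inside the container; the function names are hypothetical.

```python
import json
import socket
import sys


def read_uwsgi_stats(host="127.0.0.1", port=1717, timeout=5.0):
    """Read the JSON document the uWSGI stats server sends on connect."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def summarize(stats):
    """Count workers per status and report the listen queue figures."""
    by_status = {}
    for worker in stats.get("workers", []):
        status = worker.get("status", "unknown")
        by_status[status] = by_status.get(status, 0) + 1
    return {
        "listen_queue": stats.get("listen_queue"),
        "listen_queue_errors": stats.get("listen_queue_errors"),
        "workers": by_status,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (inside the container): python uwsgi_stats.py 1717
    print(json.dumps(summarize(read_uwsgi_stats(port=int(sys.argv[1]))), indent=2))
```

During a 504 episode, all workers reported as busy together with a growing listen queue would point to hung workers rather than to NGINX.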

Hello,

@nolive thank you for these details on deactivating the cheaper algorithm. Unfortunately, it didn’t solve the problem.
Even sadder, we’re experiencing this “crash/shutdown” at random intervals again from once a day to once every 3-4 days (i.e. it seems that the frequency of the issue increased).

Below are the outputs of uwsgitop :1717, taken just a few minutes after Kobo crashed:

Same screenshots, after restarting Kobo:

Please let me know if there is any other relevant information that I could share to help solve this issue!

Regards,
Boris