500 Error, request urgent assistance

I’m sorry for the direct tags, @jnm and @nolive, but this is a critical issue that is holding up some really important work, and I would appreciate your quick support.

It’s strange, because this installation has been running on my server for months without a problem. Today I restarted the containers with run.py --setup, and I immediately started getting 500 errors when trying to log in. The login page itself loads, but after entering the credentials the login request fails with a 500:

Notes:

  • kc is working fine
  • enketo is NOT working (see logs below)
  • APIs for kpi are working fine
  1. Logs from /log/kpi/uwsgi.log show nothing helpful, just the 500 response on the login POST. As you can see, the API calls return 200.
{address space usage: 421433344 bytes/401MB} {rss usage: 156475392 bytes/149MB} [pid: 194|app: 0|req: 111/416] 178.62.244.159 () {44 vars in 554 bytes} [Mon May  8 01:21:14 2023] GET /api/v2 => generated 0 bytes in 2 msecs (HTTP/1.0 301) 5 headers in 174 bytes (1 switches on core 0)
{address space usage: 410095616 bytes/391MB} {rss usage: 144490496 bytes/137MB} [pid: 195|app: 0|req: 168/417] 178.62.244.159 () {44 vars in 556 bytes} [Mon May  8 01:21:14 2023] GET /api/v2/ => generated 626 bytes in 3 msecs (HTTP/1.0 200) 6 headers in 197 bytes (2 switches on core 0)
{address space usage: 410095616 bytes/391MB} {rss usage: 144490496 bytes/137MB} [pid: 195|app: 0|req: 169/418] 178.62.244.159 () {44 vars in 554 bytes} [Mon May  8 01:21:59 2023] GET /api/v2 => generated 0 bytes in 2 msecs (HTTP/1.0 301) 5 headers in 174 bytes (2 switches on core 0)
{address space usage: 421433344 bytes/401MB} {rss usage: 156475392 bytes/149MB} [pid: 194|app: 0|req: 112/419] 178.62.244.159 () {44 vars in 556 bytes} [Mon May  8 01:21:59 2023] GET /api/v2/ => generated 626 bytes in 5 msecs (HTTP/1.0 200) 6 headers in 197 bytes (1 switches on core 0)
{address space usage: 410095616 bytes/391MB} {rss usage: 144490496 bytes/137MB} [pid: 195|app: 0|req: 170/420] 45.129.56.140 () {64 vars in 1525 bytes} [Mon May  8 01:22:08 2023] GET /accounts/login/ => generated 3216 bytes in 22 msecs (HTTP/1.0 200) 8 headers in 450 bytes (1 switches on core 0)
{address space usage: 410095616 bytes/391MB} {rss usage: 144490496 bytes/137MB} [pid: 195|app: 0|req: 171/421] 45.129.56.140 () {64 vars in 1456 bytes} [Mon May  8 01:22:09 2023] GET /me/ => generated 35 bytes in 2 msecs (HTTP/1.0 200) 6 headers in 203 bytes (1 switches on core 0)
{address space usage: 421433344 bytes/401MB} {rss usage: 156565504 bytes/149MB} [pid: 192|app: 0|req: 52/422] 45.129.56.140 () {64 vars in 1474 bytes} [Mon May  8 01:22:09 2023] GET /environment/ => generated 11365 bytes in 23 msecs (HTTP/1.0 200) 6 headers in 199 bytes (1 switches on core 0)
{address space usage: 410619904 bytes/391MB} {rss usage: 145805312 bytes/139MB} [pid: 195|app: 0|req: 172/423] 45.129.56.140 () {74 vars in 1793 bytes} [Mon May  8 01:22:10 2023] POST /accounts/login/ => generated 27 bytes in 236 msecs (HTTP/1.0 500) 5 headers in 173 bytes (2 switches on core 0)
  2. Logs in log/kpi/celery.log show something interesting:
[2023-05-08 01:00:05,669: ERROR/MainProcess] consumer: Cannot connect to redis://:**@redis-main.<domain>.private:6379/1: Client sent AUTH, but no password is set.
Trying again in 32.00 seconds...

[2023-05-08 01:00:37,703: ERROR/MainProcess] consumer: Cannot connect to redis://:**@redis-main.<domain>.private:6379/1: Client sent AUTH, but no password is set.
Trying again in 32.00 seconds...

[2023-05-08 01:01:09,738: ERROR/MainProcess] consumer: Cannot connect to redis://:**@redis-main.<domain>.private:6379/1: Client sent AUTH, but no password is set.
Trying again in 32.00 seconds...

[2023-05-08 01:01:41,774: ERROR/MainProcess] consumer: Cannot connect to redis://:**@redis-main.<domain>.private:6379/1: Client sent AUTH, but no password is set.
Trying again in 32.00 seconds...

[2023-05-08 01:02:13,807: ERROR/MainProcess] consumer: Cannot connect to redis://:**@redis-main.<domain>.private:6379/1: Client sent AUTH, but no password is set.
Trying again in 32.00 seconds...
  3. Logs inside the enketo container are also interesting (a few quick checks based on these errors are sketched right after the logs):
| Error: Redis connection to redis-cache.<domain>.private:6380 failed - connect ECONNREFUSED 172.19.0.2:6380
enketo_express_1  | 00:20:05 0|enketo  |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16) {
enketo_express_1  | 00:20:05 0|enketo  |   errno: 'ECONNREFUSED',
enketo_express_1  | 00:20:05 0|enketo  |   code: 'ECONNREFUSED',
enketo_express_1  | 00:20:05 0|enketo  |   syscall: 'connect',
enketo_express_1  | 00:20:05 0|enketo  |   address: '172.19.0.2',
enketo_express_1  | 00:20:05 0|enketo  |   port: 6380
enketo_express_1  | 00:20:05 0|enketo  | }
enketo_express_1  | 00:20:05 0|enketo  | Worker 19416 sadly passed away. It will be reincarnated.
enketo_express_1  | 00:20:05 0|enketo  | Error: Redis connection to redis-cache.<domain>.private:6380 failed - connect ECONNREFUSED 172.19.0.2:6380
enketo_express_1  | 00:20:05 0|enketo  |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16) {
enketo_express_1  | 00:20:05 0|enketo  |   errno: 'ECONNREFUSED',
enketo_express_1  | 00:20:05 0|enketo  |   code: 'ECONNREFUSED',
enketo_express_1  | 00:20:05 0|enketo  |   syscall: 'connect',
enketo_express_1  | 00:20:05 0|enketo  |   address: '172.19.0.2',
enketo_express_1  | 00:20:05 0|enketo  |   port: 6380
enketo_express_1  | 00:20:05 0|enketo  | }
enketo_express_1  | 00:20:05 0|enketo  | Worker 19410 sadly passed away. It will be reincarnated.
enketo_express_1  | 00:20:05 0|enketo  | Worker 1294 ready for duty at port 8005! (environment: production)
enketo_express_1  | 00:20:05 0|enketo  | Error: Redis connection to redis-cache.<domain>.private:6380 failed - connect ECONNREFUSED 172.19.0.2:6380
enketo_express_1  | 00:20:05 0|enketo  |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16) {
enketo_express_1  | 00:20:05 0|enketo  |   errno: 'ECONNREFUSED',
enketo_express_1  | 00:20:05 0|enketo  |   code: 'ECONNREFUSED',
enketo_express_1  | 00:20:05 0|enketo  |   syscall: 'connect',
enketo_express_1  | 00:20:05 0|enketo  |   address: '172.19.0.2',
enketo_express_1  | 00:20:05 0|enketo  |   port: 6380
enketo_express_1  | 00:20:05 0|enketo  | }
enketo_express_1  | 00:20:05 0|enketo  | Worker 19424 sadly passed away. It will be reincarnated.
enketo_express_1  | 00:20:05 0|enketo  | Worker 1295 ready for duty at port 8005! (environment: production)
enketo_express_1  | 00:20:06 0|enketo  | Error: Redis connection to redis-cache.<domain>.private:6380 failed - connect ECONNREFUSED 172.19.0.2:6380
enketo_express_1  | 00:20:06 0|enketo  |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16) {
enketo_express_1  | 00:20:06 0|enketo  |   errno: 'ECONNREFUSED',
enketo_express_1  | 00:20:06 0|enketo  |   code: 'ECONNREFUSED',
enketo_express_1  | 00:20:06 0|enketo  |   syscall: 'connect',
enketo_express_1  | 00:20:06 0|enketo  |   address: '172.19.0.2',
enketo_express_1  | 00:20:06 0|enketo  |   port: 6380
enketo_express_1  | 00:20:06 0|enketo  | }
enketo_express_1  | 00:20:06 0|enketo  | Worker 19425 sadly passed away. It will be reincarnated.
enketo_express_1  | 00:20:06 0|enketo  | Worker 1296 ready for duty at port 8005! (environment: production)
enketo_express_1  | 00:20:06 0|enketo  | Error: Redis connection to redis-cache.<domain>.private:6380 failed - connect ECONNREFUSED 172.19.0.2:6380
enketo_express_1  | 00:20:06 0|enketo  |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16) {
enketo_express_1  | 00:20:06 0|enketo  |   errno: 'ECONNREFUSED',
enketo_express_1  | 00:20:06 0|enketo  |   code: 'ECONNREFUSED',
enketo_express_1  | 00:20:06 0|enketo  |   syscall: 'connect',
enketo_express_1  | 00:20:06 0|enketo  |   address: '172.19.0.2',
enketo_express_1  | 00:20:06 0|enketo  |   port: 6380
enketo_express_1  | 00:20:06 0|enketo  | }
enketo_express_1  | 00:20:06 0|enketo  | Worker 19433 sadly passed away. It will be reincarnated.
enketo_express_1  | 00:20:06 0|enketo  | Worker 1297 ready for duty at port 8005! (environment: production)
enketo_express_1  | 00:20:06 0|enketo  | Error: Redis connection to redis-cache.<domain>.private:6380 failed - connect ECONNREFUSED 172.19.0.2:6380
enketo_express_1  | 00:20:06 0|enketo  |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16) {
enketo_express_1  | 00:20:06 0|enketo  |   errno: 'ECONNREFUSED',
enketo_express_1  | 00:20:06 0|enketo  |
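
Taken together, the celery error (“Client sent AUTH, but no password is set”) suggests that kpi/celery is configured with a Redis password that redis_main was not actually started with, and the Enketo error suggests that nothing is listening on redis-cache.<domain>.private:6380. A few quick checks, sketched under the assumption of a standard kobo-install/docker setup (<redis_main>, <redis_cache> and <kpi_container> are placeholders for whatever container names docker ps shows):

# which redis containers exist and which ports they publish
docker ps --format '{{.Names}}  {{.Ports}}' | grep -i redis
# does redis_main have a password configured? (an empty value means no password;
# a NOAUTH error means one is set, so re-run with -a <password>)
docker exec -it <redis_main> redis-cli -p 6379 config get requirepass
# is redis_cache really serving on 6380? (should answer PONG)
docker exec -it <redis_cache> redis-cli -p 6380 ping
# the Django traceback behind the login 500 may also show up in the kpi container's own output
docker logs --tail 200 <kpi_container>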

What I’ve tried so far:

  1. I did a fresh install on another VM with my version (2.021.45b) by cloning kobo-install and checking out that tag. To my surprise, the new installation has the same issue.
  2. I set the password inside both the redis_main and redis_cache containers (one way to do this is sketched further below). That fixed the celery errors:
[2023-05-08 01:02:45,859: INFO/MainProcess] Connected to redis://:**@redis-main.<domain>.private:6379/1
[2023-05-08 01:02:45,897: INFO/MainProcess] mingle: searching for neighbors
[2023-05-08 01:02:46,945: INFO/MainProcess] mingle: all alone
[2023-05-08 01:02:46,995: INFO/MainProcess] kpi_main_worker@kpi ready.
[2023-05-08 01:02:47,341: INFO/MainProcess] Received task: kobo.apps.hook.tasks.failures_reports[e6a45019-dda6-4c8c-b23b-1939d96c3a30]
[2023-05-08 01:02:47,558: INFO/ForkPoolWorker-2] Task kobo.apps.hook.tasks.failures_reports[e6a45019-dda6-4c8c-b23b-1939d96c3a30] succeeded in 0.2050587823614478s: True
[2023-05-08 01:05:02,690: INFO/MainProcess] Connected to redis://:**@redis-main.<domain>.private:6379/1
[2023-05-08 01:05:02,713: INFO/MainProcess] mingle: searching for neighbors
[2023-05-08 01:05:03,726: INFO/MainProcess] mingle: all alone
[2023-05-08 01:05:03,758: INFO/MainProcess] kpi_main_worker@kpi ready.
[2023-05-08 01:05:04,237: INFO/MainProcess] Received task: kobo.celery.update_concurrency_from_constance[0d80cf9d-9e0f-4a4d-a8d0-3d8234d6bb09]
[2023-05-08 01:05:04,240: INFO/MainProcess] Received task: kobo.celery.update_concurrency_from_constance[404d5eaa-3eda-45cb-bc70-8403784d19fa]

However, the issue persists: changing the password configuration did nothing for Enketo, and the login still returns a 500.
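
For anyone retracing step 2: one way to set the password at runtime is sketched below. The container names and <password> are placeholders; the password should match whatever Redis password kobo-install put in its envfiles. Note that a CONFIG SET change is not persistent: it is lost the next time the container restarts unless the same requirepass is also written into the redis config file the container loads.

# runtime-only change: lost on container restart unless also written to the redis config file
docker exec -it <redis_main> redis-cli -p 6379 config set requirepass '<password>'
docker exec -it <redis_cache> redis-cli -p 6380 config set requirepass '<password>'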

Since the fresh install shows the same problem, I’m now looking for even a temporary workaround. I have several enumerators working in the field right now and my validators are unable to check the data, so this is critical. I would appreciate your quick help.

@ks_1, what do you get when you run

ps -ef | grep redis

Issue might be related to
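
For reference, on a working kobo-install setup that command should show two separate redis-server processes bound to different ports. The lines below only illustrate the expected shape, they are not real output:

ps -ef | grep [r]edis
# expected shape (illustrative):
#   ... redis-server *:6379    <- redis_main
#   ... redis-server *:6380    <- redis_cache
# if both processes show the same port, the cache instance is starting with the wrong config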

Thanks a LOT, @stephenoduor! :handshake::handshake:
You literally saved a ton of our rural work. Our field personnel are in a particularly unreachable spot.

@ks_1 :pray:, I am happy that it worked for you

@stephenoduor, how do you end up with redis_main and redis_cache running on the same port (with kobo-install)? Are you using a private DNS?

@ks_1, I’m glad you’ve been able to resolve your issue.

Hello @nolive,

Redis main and Redis cache should run on different ports, but for some reason they ended up on the same port, which leaves Enketo looping as it tries to reconnect. This causes the error and blocks KoboToolbox from starting.

That’s what we are solving in
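
For anyone who wants to confirm the same symptom on their own setup, a quick check (the container name is a placeholder; use whatever docker ps shows for the cache container):

# is the cache instance really answering on 6380? (should reply PONG)
docker exec -it <redis_cache> redis-cli -p 6380 ping
# which ports does the container actually publish to the host?
docker port <redis_cache>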
