Kobocat move_media_to_s3 wrong url for files

Hello,

I am experiencing strange behavior from the uploaded media in Kobo after moving our files to S3.
Self-hosted, currently on:
KC 2.022.08b
KPI 2.022.08a
Enketo 2.8.1

Background: We were running out of disk space on our server and decided to move the media files attached to form submissions to S3. We found the management command that moves the files to an S3 bucket, which we had set up. In total we had to move about 100,000 files, and the operation appeared to complete successfully (see the other thread for a note on a minor modification to the command to avoid running out of memory while transferring so many files).

Over the next several days, some users began to notice some of their media files from the submissions were not accessible. I checked the server filesystem and found that for at least one project, ~150 folders did not copy over for some reason (curiosity #1). I was able to compare and move those manually to S3.
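For anyone doing the same comparison, here is a minimal sketch in Python. It assumes the S3 object keys mirror the file paths relative to the local media root, which may not match your storage settings. The helper names (`local_keys`, `s3_keys`, `missing_by_folder`) are hypothetical and not part of KoBoCAT. `s3_keys` needs `boto3` and working AWS credentials.

```python
import os
from collections import Counter


def local_keys(media_root):
    """Return the relative path ('/'-separated) of every file under media_root."""
    keys = set()
    for dirpath, _dirnames, filenames in os.walk(media_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), media_root)
            keys.add(rel.replace(os.sep, "/"))
    return keys


def s3_keys(bucket, prefix=""):
    """Return every object key in the bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the other helpers work without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys = set()
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])
    return keys


def missing_by_folder(local, remote):
    """Count files present locally but absent remotely, grouped by parent folder."""
    counts = Counter()
    for key in local - remote:
        folder = key.rsplit("/", 1)[0] if "/" in key else ""
        counts[folder] += 1
    return dict(counts)
```

Calling `missing_by_folder(local_keys(media_root), s3_keys(bucket))` gives a per-folder count of files that never reached the bucket, which makes it easier to copy only those folders instead of re-running the whole transfer.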

Then we noticed that some media files could not be downloaded from the UI on the “View” submission screen under Data. Some files were available only when viewing the submission via the “Edit” view. Some media files were not accessible in either view, even though we verified that they all exist in S3. We then exported the XLS with media URLs included; those URLs worked for some files but not for others (which we also confirmed exist in S3). Some URLs in the XLS appear as “…com/media/original?media_file=…”, and the resulting download is named “original”, whereas others appear as expected (…com/api/v2/assets/<form_id>/data/222000/attachments/222000/).
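To see how many exported URLs fall into each of the two shapes described above, a small sketch like the following could be run over the URL column of the XLS export. The labels and the regular expression are my own assumptions based only on the two URL forms quoted here; they are not taken from the KPI or KoBoCAT code.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Matches paths like /api/v2/assets/<form_id>/data/<id>/attachments/<id>/
API_PATH = re.compile(r"^/api/v2/assets/[^/]+/data/\d+/attachments/\d+/?$")


def classify(url):
    """Label a media URL by which of the two observed shapes it has."""
    parsed = urlparse(url)
    if API_PATH.match(parsed.path):
        return "api_v2_attachment"
    has_media_file = "media_file" in parse_qs(parsed.query)
    if parsed.path.rstrip("/").endswith("/media/original") and has_media_file:
        return "media_original"
    return "other"


def summarize(urls):
    """Count how many URLs fall into each category."""
    return dict(Counter(classify(u) for u in urls))
```

Cross-referencing the category of each URL with whether its download works should show whether the broken links are tied to one URL form or spread across both.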

I suspect there is some inconsistency with the URLs that are stored in the KC/KPI databases?

#1: Is there a known reason why some folders may not have been transferred to S3 via this command? I will likely run the command again, but it takes some time given the sheer number of files.

#2: Is there another management command that updates the stored URLs for the media uploaded with submissions? I could not find anything obvious in the codebase. Have you come across issues like this before, or do you have any recommendations on how to get the media file URLs and download links back in sync?

Regards


Were you able to solve this issue?

I have not found a good solution, and you are the only response so far :)

Hi,
I faced a similar problem; you can see the GitHub issues I opened here:
