Problem with index values changing with new submissions

botani · March 4, 2020, 9:28am

Dear All,

I have a problem: the index values change whenever enumerators send new submissions, which happens every day during the survey. What should I do about this, and how can I solve it? It matters because my data has both an index and a parent index. Please help me with this issue.

Kal_Lam · March 4, 2020, 12:06pm

Hi @botani,

Welcome back to the community! Would you mind sharing a screenshot of the issue so that the community can better understand your situation?

Have a great day!

botani · March 4, 2020, 1:38pm

Hi Kal_lam,

Here is the data collected on 29-2-2020 and 1-3-2020. As you can see in the screenshot, I downloaded the data on the 29th and each record had an index. The next day (1-3-2020) I downloaded the data again, and the index values seem to have changed because of the new submissions. I do not want the index values to change when new interviews are submitted; I want the new interviews to appear at the end of the Excel file with new index values.

Kal_Lam · March 4, 2020, 2:28pm

Hi @botani,

In this case, you could always use the _id or _uuid, which should always remain constant (for a submission that has already been submitted to the server). I personally prefer the _id, as it is shorter and more convenient than the _uuid.

As a reference, please see the _id and _uuid from my dummy project:

Have a great day!

botani · March 4, 2020, 2:49pm

Thanks Kal_Lam,
I know that we can use the _id, and thanks for reminding me about that. But is there any way to fix the index value, or stop it from changing, when new submissions are sent?

stephanealoo · March 4, 2020, 6:52pm

Hi @botani
Unfortunately there is no way to stop the index value from changing.

Stephane

botani · March 7, 2020, 2:57pm

Thanks a lot for your reply.

Rusti · March 10, 2020, 10:59am

Hi All,

What is the best way to deal with the changing index field and related tables for ongoing surveys?

If you are downloading data periodically and importing it into a database, the index fields of the related tables will change on each download. So you cannot use them in a relational database unless you clear all previous data and import the complete dataset each time (which is not ideal), or assign your own index/parent index values on each download before importing into the database…

Is this correct? Or is there a way around this?

Thanks

Kal_Lam · March 10, 2020, 11:03am

Hi @Rusti,

Welcome back to the community! I would advise using the _id or the _uuid instead of the _index, as the _id and _uuid are constant.

Have a great day!

Rusti · March 10, 2020, 11:27am

Thanks Kal!

I get that the _id or _uuid is constant, but those fields do not carry through into the linked tables (repeats); in the downloaded data, the _index and _parent_index are the linking fields.

Is my form not set up correctly if I am not getting the _id and _uuid fields on the linked tables? They are always blank.

Alternatively, do I use VLOOKUP to get the _uuid field manually in Excel once downloaded? But this would rely on the index/parent index fields, which would be changing each time anyway?

Thanks

Kal_Lam · March 10, 2020, 11:31am

Hi @Rusti,

Yes, you got that correct! You should use the _index and the _parent_index for merging/linking datasets with repeated groups. So, as said earlier, if you wish to use these, it’s always safe to use them at the end of the survey (i.e. when the data collection is over).

Have a great day!

Rusti · March 10, 2020, 11:54am

Thanks again Kal,
I am working with resource use surveys where data is collected monthly and imported into a database each month for overview analysis, so I need to link the tables each month and make sure the links are correct.
Is the only way around this, then, to create my own index field each time to link the related tables? Each month I would need to take the ‘new records’ and create an index field which is based on the ODK index/parent index but unique to my database. I could probably do this by adding the month/time to the index field.

Any other easier ideas for ongoing survey data collection?

Thanks

Kal_Lam · March 10, 2020, 3:27pm

Hi @Rusti,

Let’s see if the community has any new ideas for this.

Have a great day!

stephanealoo · March 10, 2020, 6:42pm

Hi @Rusti
To start with, we can take it as given that the index will always change, as I indicated earlier.

Now to the issues you raised, I will split them into specific steps:

It is important to understand the relational nature of your database, and a few principles about how the repeat data sits relative to it. For more clarity: in a previous discussion I described workarounds that allow you to change the data. You will probably have two or more files, one for the main data and one for each repeat. My approach has been to use commands in SPSS, where I do the following:

Transform the repeat data from long to wide format using the cases-to-variables command (CASESTOVARS), with the parent key as the identifier variable in the transformation.
I then merge the two sets using the primary key in the main file and the parent key in the wide format of the repeat. I use the Add Variables option under Merge Files for this.
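
For readers without SPSS, the same two steps can be sketched in plain Python. This is only an illustration under assumptions: the rows are held as lists of dicts (as `csv.DictReader` would give), the key columns are assumed to be named `_index` and `_parent_index`, and the fields `village`, `member_name` and `age` as well as both function names are made up for the example.

```python
from collections import defaultdict


def repeats_to_wide(repeat_rows, parent_key, value_fields):
    """Collapse the repeat rows of each parent into one wide dict per parent."""
    counters = defaultdict(int)  # how many repeat rows seen so far per parent
    wide = defaultdict(dict)
    for row in repeat_rows:
        parent = row[parent_key]
        counters[parent] += 1
        n = counters[parent]
        for field in value_fields:
            # First repeat becomes "age.1", second "age.2", and so on.
            wide[parent][f"{field}.{n}"] = row[field]
    return dict(wide)


def merge_main_with_wide(main_rows, wide, primary_key):
    """Attach the wide repeat columns to the matching main record."""
    merged = []
    for row in main_rows:
        combined = dict(row)  # copy, so the input rows stay untouched
        combined.update(wide.get(row[primary_key], {}))
        merged.append(combined)
    return merged


# Hypothetical data: one main sheet and one repeat sheet from the same download.
main_rows = [
    {"_index": 1, "_id": 501, "village": "A"},
    {"_index": 2, "_id": 502, "village": "B"},
]
repeat_rows = [
    {"_parent_index": 1, "member_name": "Ali", "age": 30},
    {"_parent_index": 1, "member_name": "Sara", "age": 28},
    {"_parent_index": 2, "member_name": "Omar", "age": 41},
]

wide = repeats_to_wide(repeat_rows, "_parent_index", ["member_name", "age"])
merged = merge_main_with_wide(main_rows, wide, "_index")
```

Both files must come from the same download, because the index values are only consistent within a single export.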

Action needed from you: share a schematic of your data management flow within the app so that I can look at it.

You do not need to do this; it is cumbersome and this has already been done. I would suggest the following workaround, which should be done before you import the data into your system:

Download your XLS as usual, with all the sheets in it.
You will find the _id or _uuid in the main sheet, but probably not in the sheets representing the repeat data.
Create columns for the _id or _uuid in the repeat data sheets and use VLOOKUP in Excel to fill these columns using data from the main sheet.
You can record this as a macro and save it, so that you can repeat the process every time you download the data.
Build your data management protocol based on this data set. Note the IDs won't change…
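
The VLOOKUP step above can also be scripted. Below is a minimal Python sketch of the idea, not an official Kobo tool. It assumes the main sheet has `_index` and `_id` columns, the repeat sheet has a `_parent_index` column pointing at the main sheet (a single level of repeats), and both sheets come from the same download. The function name, the `_submission_id` column and the sample data are hypothetical.

```python
def add_submission_ids(main_rows, repeat_rows):
    """Copy the permanent _id from the main sheet onto each repeat row."""
    # Lookup table valid for this download only: _index -> _id.
    id_by_index = {row["_index"]: row["_id"] for row in main_rows}
    tagged = []
    for row in repeat_rows:
        tagged_row = dict(row)  # copy, so the input rows stay untouched
        # None marks a repeat row whose parent was not found in the main sheet.
        tagged_row["_submission_id"] = id_by_index.get(row["_parent_index"])
        tagged.append(tagged_row)
    return tagged


# Hypothetical rows, as they might look after reading the two sheets.
main_rows = [
    {"_index": 1, "_id": 501},
    {"_index": 2, "_id": 502},
]
repeat_rows = [
    {"_index": 1, "_parent_index": 1, "item": "firewood"},
    {"_index": 2, "_parent_index": 1, "item": "water"},
    {"_index": 3, "_parent_index": 2, "item": "charcoal"},
]

tagged = add_submission_ids(main_rows, repeat_rows)
```

The lookup is safe even though the index values change between downloads, because it only matches `_index` against `_parent_index` inside one export. After that, the database can link on the copied `_id` alone.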

Stephane

Rusti · March 11, 2020, 7:03am

Hi Stephane,

Thanks for your input, that really helps. I do not have SPSS, and that is probably beyond my level at this stage.

Your second approach will work, and it is similar to what I was thinking in my badly written post above (sorry). I will look up the _id values using the index each time and use that as the primary key in the database.

I somehow seem to have overlooked the changing index field when I started with this, but each time I have done a full import, so it has not affected my data. Going forward I would like to import only the latest data, so I will follow your suggestion to do this.

Thanks again, the support on this forum is great. Regards

botani · March 11, 2020, 7:58am

Hi Rusti and everyone,

I think it would be better for the Kobo staff to fix this problem. Why should we take the long way around if the Kobo staff can solve it?

Delshad Botani

mealcoopiiraq · June 30, 2021, 7:03am

Hi Kal.

I am facing the same issue. I need to download the database every week, correct the submitted data and include it in a “master” database.

I understand I can’t rely on the index number for this, hence my question: is the submission _id progressive? Meaning, can I sort by _id from smallest to largest and expect next week’s submissions to appear “after” the last (largest) _id that I had in the previous week’s database?

Kal_Lam · June 30, 2021, 7:52am

Welcome to the community, @mealcoopiiraq! Yes, you got that correct.
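
Given that answer, a weekly update of a master database could be sketched roughly as follows in Python. The sketch assumes what was confirmed above, namely that every new submission gets a larger _id than all earlier ones, and that the _id values have been read as integers rather than text. The function name and the sample rows are hypothetical.

```python
def new_submissions(downloaded_rows, last_imported_id):
    """Return only the rows submitted after the last import, ordered by _id."""
    fresh = [row for row in downloaded_rows if row["_id"] > last_imported_id]
    return sorted(fresh, key=lambda row: row["_id"])


# Hypothetical data: the master database and this week's full download.
master = [{"_id": 501}, {"_id": 502}]
download = [{"_id": 503}, {"_id": 501}, {"_id": 502}, {"_id": 504}]

# The largest _id already in the master database marks where to resume.
last_imported_id = max(row["_id"] for row in master)
to_append = new_submissions(download, last_imported_id)
```

Only the rows in `to_append` are added to the master database, so corrections already made to earlier records are left alone.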