Introducing kobo2stata (new on SSC)

Great. FYI, as mentioned in the help file, the 32-character limit on variable names I mentioned above is slightly stricter (29-30 characters) for select_multiple variables, because a prefix/suffix has to be added to the derived variables. Perhaps this had something to do with it. In any case, glad to hear you managed to label your dataset, and sorry to hear a bit of manual work was required.
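To spot select_multiple questions that may run into this stricter limit before running kobo2stata, something like the following could be used. It is a minimal sketch: the demo rows stand in for the survey tab of an XLSForm (the question names are hypothetical), and 29 characters is used as a conservative cutoff based on the 29-30 range above.

```stata
* Demo rows standing in for the survey tab of an XLSForm (hypothetical names)
clear
input str40 type str40 name
"select_one yesno"       "consent"
"select_multiple crops"  "crops_grown_during_last_season"
"select_multiple assets" "assets_owned"
end

* In practice, load your own form instead, e.g.:
* import excel using "myform.xlsx", sheet("survey") firstrow clear

* Keep select_multiple questions only
keep if strpos(type, "select_multiple") == 1

* Flag names longer than 29 characters (conservative cutoff)
generate namelen = strlen(name)
list name namelen if namelen > 29
```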

Hi Felix,
Thanks for this update.
I will send you the exact error message once I review these datasets again.
Thanks
Jackie.

Good day,

I’m very happy using this module. It makes creating do-files very easy.
I’d like to propose one useful feature. We usually run questionnaires in several languages (2, 3 or even 4 label columns in the XLSForm).

kobo2stata’s surveylabel and choiceslabel options each accept only one parameter, i.e. one language.

I know that Stata variables and values can carry labels in several languages:

[D] label language – Labels for variables and values in multiple languages

It would be nice if kobo2stata could compile several label columns into the dataset.

For example:

kobo2stata using “C:/mydata/kobosurveydata.xlsx”, xlsform(“C:/mydata/aDyQEvcRVs9re5L.xls”) surveylabel(“Label::English” “Label::Uzbek” “Label::Russian”) choiceslabel(“Label::English” “Label::Uzbek” “Label::Russian”)

I would appreciate it if you could implement this option, or advise how to achieve this with kobo2stata.

Best wishes and good luck with everything.
Please excuse my poor English.

Hi Sergey,
Many thanks for using kobo2stata, and for the interesting suggestion. This would indeed be a nice added feature, although it would require a major rewrite of the command and I’m afraid I can’t find the time right now.
As a workaround, you might be able to run kobo2stata three times on the same dataset, once for each label language. Then specify the label language in Stata for each of the three resulting Stata datasets, using the above-mentioned “label language” command, and save. Finally, run a “merge 1:1 _all” to bring the three back together into a single dataset.
Please note that preserving labels when merging multilingual datasets in Stata is a bit tricky, and I haven’t verified that the above can indeed work. Here’s a useful article on the topic: https://journals.sagepub.com/doi/pdf/10.1177/1536867X1001000113
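The Stata side of this can be tried out independently of kobo2stata. Below is a minimal sketch, on Stata’s bundled auto dataset rather than KoBo output, of how the label language command keeps several sets of variable and value labels in one dataset; the Russian labels and the value label name origin_ru are illustrative only, and this does not implement the merge step described above.

```stata
* Demo on Stata's bundled auto data, not on KoBo output
sysuse auto, clear

* The labels already present live in the "default" language; rename it
label language English, rename

* Create a second, initially empty, set of labels (it becomes the current one)
label language Russian, new
label variable price "Цена"
label define origin_ru 0 "Отечественный" 1 "Иностранный"
label values foreign origin_ru

* Switch between the two label sets
label language English
describe price foreign
label language Russian
describe price foreign
```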
Best,
Felix

I have data in KoboCollect and want to use Stata to analyze it. I downloaded an Excel sheet of the data and converted it to .csv to import into Stata. The select_multiple questions have helpfully been handled by generating additional numeric variables, one for each possible choice.

I am struggling with how to deal with select_one questions. The variable in Stata is stored as a string based on the ‘name’ I defined on the ‘choices’ sheet. I used kobo2stata to generate my dataset, but this only helps where the ‘name’ I gave the ‘choices’ was numeric. Many of my choice ‘names’ are text, as I wasn’t aware that kobo2stata needed numeric values. Please advise on the best way to convert these variables to numeric as a group, instead of re-formatting each variable one by one.

I have briefly looked at ODKmeta, but I am not sure how to use it on data files from Kobo, or whether that is even possible.

Thank you

Welcome back to the community, @richards_8. I have moved your post here, as the discussion is more relevant to this thread.

Stata’s “encode” command should do the trick.
See e.g. How can I convert string variables to numeric variables in Stata? | Stata FAQ
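encode can also be applied to a whole group of variables with a foreach loop. A minimal sketch, assuming the select_one answers are string variables named q1, q2 and q3 (hypothetical names; replace them with your own list). Note that `ds, has(type string)` can list every string variable, but that would also catch free-text questions. Each numeric version replaces its string original and keeps the original name; variable names already at the 32-character limit would need a shorter temporary suffix than `_n`.

```stata
* Demo data: select_one answers stored as strings (hypothetical names)
clear
input str10 q1 str10 q2 str10 q3
"yes" "often"     "north"
"no"  "never"     "south"
"yes" "sometimes" "north"
end

foreach v of varlist q1 q2 q3 {
    * Numeric copy; value label named after the original variable
    encode `v', generate(`v'_n) label(`v')
    * Keep the new variable next to the original
    order `v'_n, after(`v')
    * Replace the string original, keeping its name
    drop `v'
    rename `v'_n `v'
}
describe
```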

Yes, I am aware of encode; I was hoping there was a way to apply it to a group of variables instead of to each variable individually.

Does ODKmeta offer this?

Hello,
I am getting the error “label ambiguous abbreviation”, even though I have checked that the label in my command matches the one in the label column headers of the XLSForm Excel file.

Also, I understand that the program only accepts numeric values in the “name” column of the XLSForm, but my original data was not coded that way. I manually replaced the names in the Excel file with numbers. Is that okay?

Hi Rachael,

Thanks for getting in touch.

On non-numeric values, you would need to systematically replace any string values with numbers in both the xlsform and the dataset, in order for kobo2stata to match them across these two input files. This may or may not be feasible in your case, but given the amount of manual work usually involved in this (and how prone to errors it will be) I cannot recommend it. As a fallback, you can still run your files through kobo2stata to apply at least the variable labels (but not value labels), and then run your categorical string variables through a crude “encode” in Stata for some rudimentary value labels.
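As a slightly less crude variant of that encode fallback, sketched under assumptions: if several questions share one choice list, defining the value label once and passing it to encode keeps the numeric codes identical across those questions. The choice list name yesno, its codes and the variables q1/q2 are hypothetical. With noextend, encode stops with an error instead of silently adding codes when a value is missing from the label.

```stata
* Demo data: two questions sharing a hypothetical yes/no choice list
clear
input str3 q1 str3 q2
"yes" "no"
"no"  "no"
"yes" "yes"
end

* Fix the codes up front so every variable using this list gets the same numbers
label define yesno 1 "yes" 0 "no"

foreach v of varlist q1 q2 {
    * noextend: fail if a string value is not already in the yesno label
    encode `v', generate(`v'_n) label(yesno) noextend
}
list
```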

On the label-related error message, first please make sure you are using the latest version of kobo2stata (as of today, this is v1.06; see the version number at the very bottom of the kobo2stata help file). If you have an outdated version, please run Stata’s “adoupdate” command. If you do have the latest version of kobo2stata and the problem persists, you might be using non-standard notation in your label column headers, which can make it hard for kobo2stata to identify the correct label language in a multilingual setup. As a workaround, try removing any label columns for secondary languages (in both the survey and the choices tabs of your XLSForm).
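The version check and update can be done from the Stata command line; a minimal sketch using Stata’s built-in which and adoupdate commands (running `ssc install kobo2stata, replace` would also fetch the current SSC release):

```stata
* Show where kobo2stata is installed, plus any version lines in the ado-file header
which kobo2stata

* Check SSC for a newer release of kobo2stata and install it if there is one
adoupdate kobo2stata, update
```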

Okay!! I was able to create the dataset. Thank you!

Hi All,
I am having issues again: the syntax is not working to export to Stata.
I’ve tried the following kobo2stata syntax:

“$dir/datafile.xlsx”, xlsform("$dir/templete.xlsx") choicelabel(“label::English”) surveylabel(“label::English”) dropnotes usenotsave (see picture)

I made sure that the files and directories in the command syntax match.

The output displays:

Reading choices tab of XLSForm…
Reading survey tab of XLSForm…
Reading raw data file…

) required
r(100);

What does this mean?
Can anyone help explain?

Many thanks

@dy10, I have moved your post here so that @FSg could help you solve your issue.

@dy10, did you get any help on this? I’m facing a similar challenge.

@fjiva, could you kindly explain your issue so that @FSg can help you with it?

When I run this command in Stata: kobo2stata using “C:\Users\windos 10\Downloads\pic1.xlsx”, xlsform(“C:\Users\windos 10\Downloads\pic2.xlsx”), I get an error that says: ) required
r(100)

Hi @dy10 and @fjiva, thanks for using kobo2stata. It’s a bit difficult to say without further details on your data, but if the screenshot from Diana above is any indication, my guess is that there is a mismatch between the column headers in your XLSForm (“label:English (en)”) and what you have specified in the kobo2stata command (“label::English”). They need to match exactly, for both the choiceslabel and the surveylabel options. If that doesn’t work, try removing any blank spaces from those column headers, so that the specification becomes “label::English(en)” (note the removed blank space between “label::English” and “(en)”).
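For illustration, a matching command might look like the sketch below. It assumes the label column header in both the survey and the choices tabs reads exactly `label::English (en)` (copy your own header verbatim), reuses the `$dir` paths and the dropnotes/usenotsave options from the post above, spells the option choiceslabel as in Sergey’s post, and uses plain straight double quotes, since quotes copied from a web page may have been turned into curly ones that Stata does not treat as string delimiters.

```stata
* Label strings must match the XLSForm column headers character for character
kobo2stata using "$dir/datafile.xlsx", xlsform("$dir/templete.xlsx") ///
    surveylabel("label::English (en)") choiceslabel("label::English (en)") ///
    dropnotes usenotsave
```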

Hello,

I am using the kobo2stata command to label variables and values from the XLSForm. However, the original XLSForm has lengthy variable names. Is it possible to edit the variable names in the ‘name’ column of the XLSForm and then use kobo2stata to apply these new names? Unfortunately, when I do so, the original, longer names are applied rather than the updated ones. Please help.

Welcome back to the community, @Ashwini! I have merged your post back to the main topic so that @FSg can keep track of all the details and upgrade as needed.