Introducing kobo2stata (new on SSC)

Hello Stephane.
Thank you for the contribution.
I am having a challenge with kobo2stata.
It generates an error and insists that the worksheet is not found. I have two types of questionnaires in Kobo. The first set links well with its corresponding XLSForm.
The second one fails completely and reports that no worksheet was found. This one had a number of versions deployed; does this cause a problem?
Kindly awaiting your help and guidance.
Thanks, Jackie

Hi @jackiemph19,

Welcome to the community! I have migrated your post here so that you can talk directly with @FSg, the developer of kobo2stata.

Have a great day!

Hi Jackie,
With apologies for the delay in responding, if you are still encountering these problems, feel free to reach out to me via the email stated at the bottom of the kobo2stata help file (type “help kobo2stata” in your Stata), providing me with the exact error message you are getting, and I’ll be happy to look into it.
Deployment of different versions should not affect kobo2stata’s ability to run.
Best,
Felix

Hallo Felix,
Thank you so much for providing kobo2stata.
I have run the syntax according to the guidelines, but a warning appears: “label ambiguous abbreviation”. What should I do?

Hi Dyana,
Thanks for your interest in kobo2stata.
Sounds like your specification of the label column headers in your kobo2stata command (i.e. the “surveylabel()” and “choiceslabel()” options) might not match the label column headers in your XLSForm.
If after checking this you are still encountering the problem, feel free to reach out to me via the email stated at the bottom of the kobo2stata help file (type “help kobo2stata” in your Stata), providing me with the exact specification of your Stata code line and the error message you are getting, as well as your XLSForm - and I’ll be happy to look into it.
Finally, please always make sure you are running the latest version of kobo2stata. Type “adoupdate” in Stata to check this.
Best,
Felix
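
One way to check the header match described above is to read the XLSForm sheets into Stata without treating the first row as variable names, so the raw column headers can be listed exactly as typed. This is only a minimal sketch: the file paths and the “Label::English” header are placeholders, and the kobo2stata option syntax is taken from commands quoted in this thread, so please treat “help kobo2stata” as the authoritative reference.

```stata
* Placeholder paths: replace with your own files
local form "C:/mydata/myform.xls"
local data "C:/mydata/kobosurveydata.xlsx"

* Read each sheet WITHOUT the firstrow option, so observation 1 holds the raw headers
import excel using "`form'", sheet("survey") allstring clear
list in 1
import excel using "`form'", sheet("choices") allstring clear
list in 1

* Copy the label header exactly as listed above (yours may differ from this one)
kobo2stata using "`data'", xlsform("`form'") surveylabel("Label::English") choiceslabel("Label::English")
```

Run the block as a single do-file so that the local macros stay defined for every line.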

Hi Felix,
Thanks so much for this add-on!
One question - my value labels are only importing for some of my variables. Do you have any idea why? For example, I get labels for my enumerator names, village, and some apparently random choice-variables throughout the set, but for most variables I just get the choice numbers from the “name” column on the choices tab. As far as I can tell, all my choices are coded in the exact same way in the original Excel form, so I’m not sure why it’s working for some and not others. This is not a big deal as I can manually attach the labels, but I was just wondering if you have any idea why it’s happening!
Thanks,
Violet

Hi Violet,
Thanks for using kobo2stata.
There could be a range of different explanations for this, most of which are documented in the help file. For example, this may happen if a certain value label name exceeds 32 characters, if your “name” column contains non-numeric values for a given value label, or if your value description is surrounded by chevrons / angle brackets.
Feel free to reach out to me via the email stated at the bottom of the kobo2stata help file (type “help kobo2stata” in your Stata), sharing your XLSForm - and I’ll be happy to look into it.
Finally, please always make sure you are running the latest version of kobo2stata. Type “adoupdate” in Stata to check this.
Best,
Felix
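
Two of the causes listed above can be screened for directly in Stata by importing the choices sheet of the XLSForm. This is a rough sketch, not part of kobo2stata: the path is a placeholder, it assumes the sheet uses the standard “list_name” and “name” column headers, and it assumes the value label name is derived from “list_name” (the help file is the place to confirm that).

```stata
* Placeholder path; assumes standard "list_name" and "name" headers on the choices sheet
import excel using "C:/mydata/myform.xls", sheet("choices") firstrow allstring clear

* Choice lists whose name is longer than 32 characters
list list_name if strlen(list_name) > 32

* Choices whose "name" is not numeric (real() returns missing for non-numeric text)
list list_name name if missing(real(name)) & name != ""
```

If both listings come back empty, the remaining documented causes (such as labels wrapped in angle brackets) would need to be checked by eye in the label column.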

Thank you! I looked into each of those possible errors but none of them seem to apply. I’ll play with it a bit more.

Sorry to hear. Does kobo2stata issue any error message when you try to run it?

Nope! it all works perfectly, just missing most value labels as well as variable labels for the variables generated by multiple select option questions (although again, only in some - seemingly arbitrary - cases). Anyway I’ve gone through and patched in the labels manually, so not a big deal!

Great. FYI and as mentioned in the help file, the 32 character limit on variable names I mentioned above is slightly stricter (29-30 characters) on select_multiple variables, due to the need to add a prefix/suffix on the derived variables - perhaps this had something to do with it. In any case, glad to hear you managed to label your dataset, and sorry to hear a bit of manual work was required.
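
For anyone wanting to screen their form for this, a small sketch along the following lines could flag select_multiple questions whose names exceed the stricter limit. The path is a placeholder, the standard “type” and “name” headers on the survey sheet are assumed, and the cutoff of 29 is simply the lower bound of the range mentioned above.

```stata
* Placeholder path; assumes standard "type" and "name" headers on the survey sheet
import excel using "C:/mydata/myform.xls", sheet("survey") firstrow allstring clear

* select_multiple questions whose variable name is longer than 29 characters
list type name if strpos(type, "select_multiple") == 1 & strlen(name) > 29
```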

Hi Felix,
Thanks for this update.
I will send you the exact error message once I review these datasets again.
Thanks
Jackie.

Good day,

I’m very happy to use this module. It makes my job of creating do-files very easy.
I’d like to propose one useful feature. We usually run questionnaires in several languages (2-3 or even 4 label columns in the XLSForm).

Kobo2stata has the options surveylabel and choiceslabel, each of which accepts only one parameter, i.e. one language.

I know that variables and values in Stata can carry labels in several languages:

[D] label language – Labels for variables and values in multiple languages

It would be nice if Kobo2stata could compile several label columns into the dataset.

For example:

kobo2stata using "C:/mydata/kobosurveydata.xlsx", xlsform("C:/mydata/aDyQEvcRVs9re5L.xls") surveylabel("Label::English" "Label::Uzbek" "Label::Russian") choiceslabel("Label::English" "Label::Uzbek" "Label::Russian")

I would appreciate it if you could implement this option, or advise how to achieve this with Kobo2stata.

Best wishes and good luck with everything.
Please excuse me for my poor English.

Hi Sergey,
Many thanks for using kobo2stata, and for the interesting suggestion. This would indeed be a nice added feature, although it would require a major rewrite of the command and I’m afraid I can’t find the time right now.
As a workaround, you might be able to run kobo2stata three times on the same dataset, once for each label language. Then specify the label language in Stata for each of the three resulting Stata datasets, using the above-mentioned “label language” command, and save. Finally, run a “merge 1:1 _all” to bring the three back together into a single dataset.
Please note that preserving labels when merging multilingual datasets in Stata is a bit tricky, and I haven’t verified that the above can indeed work. Here’s a useful article on the topic: https://journals.sagepub.com/doi/pdf/10.1177/1536867X1001000113
Best,
Felix
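
For readers unfamiliar with “label language”, the toy sketch below shows the mechanism on Stata’s bundled auto dataset rather than on kobo2stata output. The language codes “en” and “ru” and the value label name “origin_ru” are arbitrary choices for illustration, and the merge step of the workaround is not covered here. Non-ASCII label text requires Stata 14 or later.

```stata
sysuse auto, clear

* Rename the current (default) label set to "en"
label language en, rename

* Create a second language and switch to it; without the copy option
* its variable labels start out empty
label language ru, new
label variable price "Цена"
label define origin_ru 0 "Отечественный" 1 "Импортный"
label values foreign origin_ru

* Switch back and forth; each language keeps its own labels
label language en
describe price foreign
label language ru
describe price foreign
```

Note that each language needs its own value label name (here “origin_ru” alongside the existing “origin”), since the attachment of value labels to variables is stored per language.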

I have data in KoboCollect and want to use Stata to analyze my data. I have downloaded an Excel sheet of the data and converted it to .csv to import into Stata. The select_multiple questions have helpfully been dealt with through the generation of additional numeric variables for each possible choice.

I am struggling with how to deal with select_one type questions. The variable in Stata is stored as a string based on the ‘name’ I entered on the ‘choices’ sheet. I used kobo2stata to generate my dataset, but this only helps where the ‘name’ I gave a choice was numeric. Many of the ‘names’ for my ‘choices’ are text, as I wasn’t aware a numeric value was needed for kobo2stata. Please advise on the best way to convert these variables to numeric as a group, rather than re-formatting each variable one by one.

I have briefly looked at ODKmeta but I am not quite sure how to use this on data files from KOBO, or if it is even possible.

Thank you

Welcome back to the community, @richards_8. I have moved your post here, as the discussion is more relevant to this thread.

Stata’s “encode” command should do the trick.
See e.g. How can I convert string variables to numeric variables in Stata? | Stata FAQ

Yes, I am aware of encode. I was hoping there was a way to apply it to a group of variables instead of to each individual variable.

Does ODKmeta offer this?
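
On applying encode to a group of variables: one common Stata pattern is to wrap it in a foreach loop. The sketch below is only an illustration; the variable names are hypothetical and should be replaced with your own select_one string variables, and it assumes each name is short enough to take the temporary “_num” suffix (28 characters or fewer).

```stata
* Hypothetical variable names: list your own select_one string variables here
foreach v of varlist region hh_type water_source {
    * Create a labelled numeric copy; the value label is named after the variable
    encode `v', generate(`v'_num) label(`v')
    * Keep the new variable in the original position, then swap it in
    order `v'_num, after(`v')
    drop `v'
    rename `v'_num `v'
}
```

Keep in mind that encode numbers the categories 1, 2, 3, ... in alphabetical order of the string values, and the resulting value labels are the choice ‘names’ as stored in the data, not the label text from the XLSForm.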

Hello,
I am getting the error “label ambiguous abbreviation”, but I have checked, and the label in my command matches the label column header in the form’s Excel file.

Also, I understand that the program will only understand numeric values in the “name” column of the XLSForm, but my original data was not coded that way. I manually replaced the names in the Excel file with numbers - is that okay?

Hi Rachael,

Thanks for getting in touch.

On non-numeric values, you would need to systematically replace any string values with numbers in both the xlsform and the dataset, in order for kobo2stata to match them across these two input files. This may or may not be feasible in your case, but given the amount of manual work usually involved in this (and how prone to errors it will be) I cannot recommend it. As a fallback, you can still run your files through kobo2stata to apply at least the variable labels (but not value labels), and then run your categorical string variables through a crude “encode” in Stata for some rudimentary value labels.

On the label-related error message, first please make sure you are using the latest version of kobo2stata (as of today, this is v1.06 - see the version number at the very bottom of the kobo2stata help file). If you have an outdated version, please run Stata’s “adoupdate” command. If you do have the latest version of kobo2stata and the problem persists, you might be using non-standard notation in your label column headers, as a result of which kobo2stata can have trouble identifying the correct label language in a multilingual setup. Try removing any label columns for secondary languages (in both the survey and the choices tabs of your XLSForm), which might help as a workaround.
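
The version check and update mentioned above can be run from the Stata command line roughly as follows. This is a sketch using standard Stata commands; whether “which” prints a version line depends on how the ado-file is written, so the help file remains the reference for the version number.

```stata
* Show where kobo2stata is installed (and its version line, if the ado-file has one)
which kobo2stata

* Check for a newer version of kobo2stata and install it if one is available
adoupdate kobo2stata, update
```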
