Introducing kobo2stata (new on SSC)

Following the useful feedback from Stephen above, I have just submitted an updated version (v1.04) of kobo2stata to the SSC. It should go online in a few days; please run “adoupdate” in Stata to get the latest version.

Updates (v1.04):

  • Moved the “multi” option to the core functionality of kobo2stata. It is no longer necessary to specify this option in order for select_multiple items to get labelled - this is now default behaviour.
  • Fixed a bug that caused kobo2stata to misbehave where the Kobo data contained variable names or value label names exceeding 32 characters (NB: this was also the cause of Stephen’s issues 2&4). Also made the consequences of overly long variable and value label names explicit in the help file.

Finally, please note that further testing (thanks Stephen!) has found that kobo2stata is not compatible with older Stata versions. The minimum requirement will therefore remain Stata 14.
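
For readers who want to check their setup, here is a minimal sketch using standard Stata commands only. It assumes kobo2stata was installed from SSC and that the machine has internet access; nothing here is specific to kobo2stata itself.

```stata
* Show the running Stata version (the post above gives Stata 14 as the minimum)
display c(stata_version)

* Show which copy of kobo2stata Stata finds, if any (no hard error if it is absent)
capture noisily which kobo2stata

* Update an SSC-installed copy; requires internet access
adoupdate kobo2stata, update

* Alternative: reinstall from SSC, overwriting the local copy
* ssc install kobo2stata, replace
```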

I’m not sure I fully understand the requirement. The PowerBI instructions request a CSV file that is generated on each request. An XLS file can be generated the same way simply by changing the last three letters of the URL from csv to xls. This uses the kobocat API, which doesn’t give access to versioned data or to data labeled in the languages used in the form design. The newer kpi API doesn’t use such live links, only asynchronous downloads that are requested through the API (see this small sample script).

With only a few clicks I was done redoing my coding for Stata. This is truly helpful; thank you for this wonderful add-on.

Improve kobo2stata command

Hi @msodjinou
Could you be more specific about the improvement you would like to see introduced?
Stephane

Hello Stephane.
Thank you for the contribution.
I am having a challenge with kobo2stata.
It generates an error and insists that the worksheet is not found. I have two types of questionnaires in Kobo. The first one links well with its corresponding XLS form.
The second one fails completely and reports that no worksheet was found. This one had a number of versions deployed; does this cause a problem?
Kindly awaiting your help and guidance.
Thanks, Jackie

Hi @jackiemph19,

Welcome to the community! I have migrated your post here so that you can talk directly with @FSg, the developer of kobo2stata.

Have a great day!

Hi Jackie,
With apologies for the delay in responding, if you are still encountering these problems, feel free to reach out to me via the email stated at the bottom of the kobo2stata help file (type “help kobo2stata” in your Stata), providing me with the exact error message you are getting, and I’ll be happy to look into it.
Deployment of different versions should not affect kobo2stata’s ability to run.
Best,
Felix

Hello Felix,
Thank you so much for providing kobo2stata.
I have run the syntax according to the guidelines, but a warning appears: “label ambiguous abbreviation”. What should I do?

Hi Dyana,
Thanks for your interest in kobo2stata.
Sounds like your specification of the label column headers in your kobo2stata command (i.e. the “surveylabel()” and “choiceslabel()” options) might not match the label column headers in your XLSForm.
If after checking this you are still encountering the problem, feel free to reach out to me via the email stated at the bottom of the kobo2stata help file (type “help kobo2stata” in your Stata), providing me with the exact specification of your Stata code line and the error message you are getting, as well as your XLSForm - and I’ll be happy to look into it.
Finally, please always make sure you are running the latest version of kobo2stata. Type “adoupdate” in Stata to check this.
Best,
Felix

Hi Felix,
Thanks so much for this add-on!
One question - my value labels are only importing for some of my variables. Do you have any idea why? For example, I get labels for my enumerator names, village, and some apparently random choice variables throughout the set, but for most variables I just get the choice numbers from the “name” column on the choices tab. As far as I can tell, all my choices are coded in exactly the same way in the original Excel form, so I’m not sure why it’s working for some and not others. This is not a big deal as I can manually attach the labels, but I was just wondering if you have any idea why it’s happening!
Thanks,
Violet

Hi Violet,
Thanks for using kobo2stata.
There could be a range of different explanations for this, most of which are documented in the help file. For example, this may happen if a certain value label name exceeds 32 characters, if your “name” column contains non-numeric values for a given value label, or if your value description is surrounded by chevrons / angle brackets.
Feel free to reach out to me via the email stated at the bottom of the kobo2stata help file (type “help kobo2stata” in your Stata), sharing your XLSForm - and I’ll be happy to look into it.
Finally, please always make sure you are running the latest version of kobo2stata. Type “adoupdate” in Stata to check this.
Best,
Felix
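
The three causes mentioned in the reply above can be screened for directly in Stata. The following is a rough sketch, not part of kobo2stata: it uses a small toy dataset standing in for the “choices” sheet, and the column names list_name, name and label are assumed to follow the usual XLSForm headers. With a real form, the sheet could be loaded with something like `import excel using "myform.xlsx", sheet("choices") firstrow allstring clear` (file name hypothetical).

```stata
* Toy stand-in for the "choices" sheet of an XLSForm (all values hypothetical)
clear
input str40 list_name str10 name str20 label
"yesno" "1" "Yes"
"yesno" "0" "No"
"main_source_of_drinking_water_dry_season" "1" "Borehole"
"village" "north" "North village"
"village" "south" "<South village>"
end

* Flag value label names longer than 32 characters
generate byte name_too_long = strlen(list_name) > 32

* Flag choice names that cannot be read as numbers
generate byte name_not_numeric = missing(real(name))

* Flag labels wrapped in chevrons / angle brackets
generate byte label_in_chevrons = regexm(label, "^<.*>$")

* Show only the rows that trip at least one check
list list_name name label if name_too_long | name_not_numeric | label_in_chevrons
```

Rows that appear in the listing are candidates for the missing labels; the help file remains the reference for what kobo2stata actually does with them.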

Thank you! I looked into each of those possible errors but none of them seem to apply. I’ll play with it a bit more.

Sorry to hear. Does kobo2stata issue any error message when you try to run it?

Nope! it all works perfectly, just missing most value labels as well as variable labels for the variables generated by multiple select option questions (although again, only in some - seemingly arbitrary - cases). Anyway I’ve gone through and patched in the labels manually, so not a big deal!

Great. FYI and as mentioned in the help file, the 32 character limit on variable names I mentioned above is slightly stricter (29-30 characters) on select_multiple variables, due to the need to add a prefix/suffix on the derived variables - perhaps this had something to do with it. In any case, glad to hear you managed to label your dataset, and sorry to hear a bit of manual work was required.

Hi Felix,
Thanks for this update.
I will send you the exact error message once I review these data sets again.
Thanks
Jackie.

Good day,

I’m very happy to use this module. It makes my job of creating do-files very easy.
I’d like to propose one useful feature. We usually run questionnaires in several languages (2-3 or even 4 label columns in the XLSForm).

The surveylabel and choiceslabel options of kobo2stata each accept only one parameter, i.e. one language.

I know that variables and values in Stata can carry labels in several languages.

[D] label language – Labels for variables and values in multiple languages

It would be nice if kobo2stata could compile several label columns into the dataset.

For example:

kobo2stata using "C:/mydata/kobosurveydata.xlsx", xlsform("C:/mydata/aDyQEvcRVs9re5L.xls") surveylabel("Label::English" "Label::Uzbek" "Label::Russian") choiceslabel("Label::English" "Label::Uzbek" "Label::Russian")

I would appreciate it if you could implement this option, or advise how to achieve it with kobo2stata.

Best wishes and good luck with everything.
Please excuse me for my poor English.

Hi Sergey,
Many thanks for using kobo2stata, and for the interesting suggestion. This would indeed be a nice added feature, although it would require a major rewrite of the command and I’m afraid I can’t find the time right now.
As a workaround, you might be able to run kobo2stata three times on the same dataset, once for each label language. Then specify the label language in Stata for each of the three resulting Stata datasets, using the above-mentioned “label language” command, and save. Finally, run a “merge 1:1 _all” to bring the three back together into a single dataset.
Please note that preserving labels when merging multilingual datasets in Stata is a bit tricky, and I haven’t verified that the above can indeed work. Here’s a useful article on the topic: https://journals.sagepub.com/doi/pdf/10.1177/1536867X1001000113
Best,
Felix
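
To illustrate the “label language” mechanics referred to above, here is a minimal sketch on a toy variable. It only shows how one dataset can hold two sets of variable and value labels; it does not run kobo2stata and does not cover the merge step, which remains unverified as noted above. The variable, language names and label names are hypothetical.

```stata
* Toy data: one yes/no question (hypothetical)
clear
input byte consent
1
0
end

* Name the existing label set "english" and fill it in
label language english, rename
label variable consent "Do you agree to take part?"
label define yesno_en 0 "No" 1 "Yes"
label values consent yesno_en

* Create a second, empty label set and switch to it
label language russian, new
label variable consent "Вы согласны участвовать?"
label define yesno_ru 0 "Нет" 1 "Да"
label values consent yesno_ru

* Switch between the two sets; the data themselves do not change
label language russian
tabulate consent
label language english
tabulate consent
```

Each language needs its own value label name (here yesno_en and yesno_ru), because value label definitions are shared across the dataset while their attachment to a variable is stored per language.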

I have data in KoboCollect and want to use Stata to analyze it. I downloaded an Excel sheet of the data and converted it to .csv to import into Stata. The select_multiple questions have helpfully been handled through the generation of an additional numeric variable for each possible choice.

I am struggling with how to deal with select_one questions. In Stata the variable is coded as a string based on the ‘name’ I entered on the ‘choices’ sheet. I used kobo2stata to generate my dataset, but this only helps where the ‘name’ I gave a choice is numeric. Many of my choice ‘names’ are text, as I wasn’t aware that kobo2stata needed numeric values. Please advise on the best way to convert these variables to numeric as a group, rather than reformatting each variable one by one.

I have briefly looked at ODKmeta, but I am not sure how to use it on data files from Kobo, or whether that is even possible.

Thank you
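
One possible way to convert string select_one variables in bulk is Stata's built-in encode command, run in a loop. The sketch below is not part of kobo2stata and has not been tested on a real Kobo export; the toy data, the variable names and the local macro selectone are hypothetical.

```stata
* Toy data standing in for a Kobo export with text choice names (hypothetical)
clear
input str10 sex str12 water_source
"female" "borehole"
"male"   "river"
"female" "river"
end

* Hypothetical list of the select_one variables that are stored as strings
local selectone sex water_source

foreach v of local selectone {
    * encode numbers the distinct strings 1, 2, ... in alphabetical order
    * and keeps the original text as the value label
    encode `v', generate(`v'_num) label(`v'_lbl)
    * Place the new numeric variable next to its string source
    order `v'_num, after(`v')
}

describe
label list
```

Note that the resulting value labels are the choice names rather than the labels from the XLSForm, and that the numeric codes follow alphabetical order rather than the order of the choices sheet. The `_num` suffix also adds four characters, so long variable names may hit Stata's 32-character limit.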