Introducing kobo2stata (new on SSC)

Dear Kobo users (and especially the Stata users among you),

I am pleased to announce that a new Stata module - kobo2stata - is now available from the SSC server. Many thanks to Kit Baum who maintains the SSC.

kobo2stata creates labelled Stata datasets from KoboToolbox. It combines the information contained in KoboToolbox’s raw data file and the XLSForm. The main focus is on applying the variable labels and value labels from the XLSForm to the data. There are also some secondary functions, e.g. the removal of HTML tags from labels, or the (optional) removal of note-type variables.
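For readers who want a feel for the idea, the label-mapping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how kobo2stata is implemented: the function names, the `drop_notes` flag, and the representation of the XLSForm survey sheet as a list of dictionaries are all hypothetical. The column names `type`, `name` and `label` are the standard XLSForm survey columns.

```python
import re


def strip_html(text):
    """Remove HTML tags from a label and collapse leftover whitespace."""
    no_tags = re.sub(r"<[^>]+>", "", text)
    return " ".join(no_tags.split())


def variable_labels(survey_rows, label_column="label", drop_notes=False):
    """Map each question name to its cleaned label.

    survey_rows: rows of the XLSForm survey sheet as dictionaries
    (hypothetical representation, for illustration only).
    """
    labels = {}
    for row in survey_rows:
        # Optionally skip note-type rows, which carry no data
        if drop_notes and row.get("type") == "note":
            continue
        name = row.get("name")
        if name:
            labels[name] = strip_html(row.get(label_column, ""))
    return labels
```

With two sample rows, an integer question `age` labelled `<b>Age</b> in years` and a note `intro`, calling `variable_labels(rows, drop_notes=True)` would keep only `age`, with the tags stripped from its label.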

To try it out, just type “ssc install kobo2stata” in Stata. Once installed, you can type “help kobo2stata” for detailed explanations.

This is a new module, so if you experience any problems or have suggestions for improvement (or just want to say that it works great), I would love to hear from you via email at kobo2stata@gmail.com

All the best,
Felix

Thanks for this. This package is really helpful for automating analysis in Stata. However, I have noted the following limitations with the package:

  1. The package does not run on Stata 13 and earlier versions.
  2. Choice lists with more than 9 options are not fully labelled; only options 1 to 9 get labels.
  3. Variable names longer than 32 characters are truncated (no problem with this), but their variable and value labels are not applied (which means additional work recoding them manually).
  4. Note variables are not removed from the variable list despite explicitly adding the option "dropnotes" to the command.
  5. Finally, this is not a limitation but a suggested feature: is it possible to supply the Kobo account login and file credentials on the fly, so that Stata can connect to the Kobo account and fetch both the Excel data file and the XLSForm? This would really save time and allow full automation of real-time analysis of collected data.

Otherwise, thanks for the effort you are putting in to ease the lives of people doing data analysis work!

Dear Stephen,

Thank you so much for trying out kobo2stata and providing these very thoughtful comments and suggestions.

A few first thoughts on the important points you raise:

  1. It is true that I have limited kobo2stata to Stata 14 and above. However, this is not because of a confirmed incompatibility with earlier versions but only because I cannot validate compatibility with them, since I don’t have access to earlier versions of Stata. In fact, I believe kobo2stata might work with all versions down to Stata 12 (but not below, due to Stata handling Excel imports differently up to version 11). There are two possible ways of overcoming this limitation: (a) I can open kobo2stata for use with Stata 12+, but add a cautionary message that compatibility with older versions of Stata has not been confirmed, or (b) if someone could test run kobo2stata on Stata 12 and 13, I could open it up without warning messages.

  2. This is odd. The behaviour you describe was neither observed in my own test runs nor by any of the beta testers. I would be happy to look into your specific case and identify the source of this problem. If interested, please send your XLSForm and raw data file to the email address stated above (NB: if you are concerned about data confidentiality, please note I don’t need your full dataset. Just the first two lines - one with the column headers and one with a single sample observation - is usually sufficient to identify problems).

  3. This is true, and a result of Stata’s limitation of variable names to 32 characters. I have noted this down for fixing in future versions of kobo2stata.

  4. This is odd. The behaviour you describe was neither observed in my own test runs nor by any of the beta testers. As mentioned above, happy to look into your specific case if you can send me your input files.

  5. This is an excellent idea, but I am not sure whether it is possible since I am unfamiliar with the backend of KoboToolbox for exporting data files. Essentially, if there is a permanent URL (of the form http://… or https://…) from which kobo2stata can pull the input files, then I can program kobo2stata to access them directly from the web. However, my own impression as a Kobo user is that the export files are not generated “live” but rather are generated only “upon user request” - in which case I don’t think kobo2stata can trigger the export of the required files. Perhaps one of the Kobo developers active in this forum can advise?

I’ve linked a Power BI dashboard to KoBoToolbox as shown here: http://support.kobotoolbox.org/managing-your-project-s-data/pulling-your-data-into-powerbi
I believe that the API essentially queries the KoBo server to create a new CSV each time I refresh my dashboard, and indeed the URL is stable. It might be possible to use it for this purpose as well!

Many thanks Jonathan for the very interesting pointer! Perhaps @tinok - who authored that page - can shed some light on the question whether there is a way to get URL-based access to the raw data Excel file without triggering the “refresh/update to latest version” from within PowerBI?

Following the useful feedback from Stephen above, I have just submitted an updated version (v1.04) of kobo2stata to the SSC. This should go online in a few days; please make sure to run “adoupdate” in Stata to get the latest version.

Updates (v1.04):

  • Moved the “multi” option to the core functionality of kobo2stata. It is no longer necessary to specify this option in order for select_multiple items to get labelled - this is now default behaviour.
  • Fixed a bug that caused kobo2stata to misbehave where the Kobo data contained variable names or value label names exceeding 32 characters (NB: this was also the cause of Stephen’s issues 2 and 4). Also made the consequences of overly long variable and value label names explicit in the help file.

Finally, please note that further testing (thanks Stephen!) has found that kobo2stata is not compatible with older Stata versions. The minimum requirement will therefore remain Stata 14.

I’m not sure I understand the requirement completely. The PowerBI instructions request a CSV file that is generated on each request. An XLS file can be generated the same way simply by changing the last three letters of the url from csv to xls. This uses the kobocat API, which doesn’t give access to versioned data or data labeled with whatever languages were used in the form design. The newer kpi API doesn’t use such live links, only asynchronous downloads that are requested through the API (see this small sample script).
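The csv-to-xls swap described above can be sketched as a small Python helper. This is a minimal sketch and makes no claim about the actual kobocat URL layout: the function name `csv_link_to_xls` is hypothetical, and the only assumption is the one stated in the post, namely that the link ends in the three letters `csv`.

```python
def csv_link_to_xls(url):
    """Swap the trailing 'csv' of an export link for 'xls'.

    Assumes (as in the post above) that the link ends in 'csv';
    anything else is rejected rather than guessed at.
    """
    if not url.lower().endswith("csv"):
        raise ValueError("expected a link ending in 'csv'")
    # Replace only the last three characters, leaving the rest untouched
    return url[:-3] + "xls"
```

A link obtained from the PowerBI instructions could be passed through this helper before handing it to whatever downloads the file; links that do not end in `csv` raise an error instead of being silently altered.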

Only a few clicks and I was done redoing my coding for Stata. This is truly helpful; thank you for this wonderful add-on.

Improve kobo2stata command

Hi @msodjinou
Could you be more specific about the improvement you would like to see introduced?
Stephane

Hello Stephane.
Thank you for the contribution.
I am having a challenge with kobo2stata.
It generates an error and insists that the worksheet is not found. I have two types of questionnaires in Kobo. The first links well with its corresponding XLSForm.
The second one fails completely and reports that no worksheet was found. This one had a number of versions deployed; does this cause a problem?
Kindly awaiting your help and guidance.
Thanks,
Jackie

Hi @jackiemph19,

Welcome to the community! I have migrated your post here so that you can have a direct conversation with @FSg, the developer of kobo2stata.

Have a great day!

Hi Jackie,
With apologies for the delay in responding, if you are still encountering these problems, feel free to reach out to me via the email stated at the bottom of the kobo2stata help file (type “help kobo2stata” in your Stata), providing me with the exact error message you are getting, and I’ll be happy to look into it.
Deployment of different versions should not affect kobo2stata’s ability to run.
Best,
Felix

Hello Felix,
Thank you so much for providing kobo2stata.
I have run the syntax according to the guidelines, but a warning appears saying “label ambiguous abbreviation” … what should I do?

Hi Dyana,
Thanks for your interest in kobo2stata.
Sounds like your specification of the label column headers in your kobo2stata command (i.e. the “surveylabel()” and “choiceslabel()” options) might not match the label column headers in your XLSForm.
If after checking this you are still encountering the problem, feel free to reach out to me via the email stated at the bottom of the kobo2stata help file (type “help kobo2stata” in your Stata), providing me with the exact specification of your Stata code line and the error message you are getting, as well as your XLSForm - and I’ll be happy to look into it.
Finally, please always make sure you are running the latest version of kobo2stata. Type “adoupdate” in Stata to check this.
Best,
Felix
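One quick way to run the check suggested above is to list the label headers actually present in the XLSForm and compare them with what was passed to “surveylabel()” or “choiceslabel()”. The Python sketch below is only an illustration: the function name `match_label_column` is hypothetical, the headers are assumed to have been read from the sheet's first row already, and it says nothing about how kobo2stata itself resolves these options.

```python
def match_label_column(headers, requested):
    """Compare a requested label header with the headers in a sheet.

    Returns (exact, candidates): whether `requested` matches a header
    exactly, and every header that starts with 'label'.
    """
    cleaned = [h.strip() for h in headers]
    exact = requested in cleaned
    # Multi-language forms typically carry several 'label::...' columns
    candidates = [h for h in cleaned if h.lower().startswith("label")]
    return exact, candidates
```

If `exact` comes back false and `candidates` holds more than one entry (for example one label column per language), the header given in the command probably needs to be spelled out in full.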

Hi Felix,
Thanks so much for this add-on!
One question - my value labels are only importing for some of my variables. Do you have any idea why? For example, I get labels for my enumerator names, village, and some apparently random choice variables throughout the set, but for most variables I just get the choice numbers from the “name” column on the choices tab. As far as I can tell, all my choices are coded in exactly the same way in the original Excel form, so I’m not sure why it’s working for some and not others. This is not a big deal as I can manually attach the labels, but I was just wondering if you have any idea why it’s happening!
Thanks,
Violet

Hi Violet,
Thanks for using kobo2stata.
There could be a range of different explanations for this, most of which are documented in the help file. For example, this may happen if a certain value label name exceeds 32 characters, if your “name” column contains non-numeric values for a given value label, or if your value description is surrounded by chevrons / angle brackets.
Feel free to reach out to me via the email stated at the bottom of the kobo2stata help file (type “help kobo2stata” in your Stata), sharing your XLSForm - and I’ll be happy to look into it.
Finally, please always make sure you are running the latest version of kobo2stata. Type “adoupdate” in Stata to check this.
Best,
Felix
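The three conditions listed above can be screened for before running kobo2stata. The following Python sketch is a rough pre-check under stated assumptions, not a reproduction of kobo2stata's logic: the function name `check_choices` is hypothetical, the choices sheet is assumed to be available as a list of dictionaries with the standard XLSForm columns `list_name`, `name` and `label`, the value label name is assumed to derive from `list_name`, and "numeric" is read as "integer" because Stata value labels attach to integer values.

```python
import re

MAX_NAME_LEN = 32  # name length limit quoted in the post above


def check_choices(rows):
    """Return (list_name, name, problem) for each suspicious choices row."""
    problems = []
    for row in rows:
        list_name = str(row.get("list_name", "")).strip()
        name = str(row.get("name", "")).strip()
        label = str(row.get("label", "")).strip()
        # Condition 1: value label name longer than 32 characters
        if len(list_name) > MAX_NAME_LEN:
            problems.append((list_name, name, "list name longer than 32 characters"))
        # Condition 2: choice name is not an integer
        if not re.fullmatch(r"-?\d+", name):
            problems.append((list_name, name, "non-numeric choice name"))
        # Condition 3: label text wrapped in angle brackets
        if label.startswith("<") and label.endswith(">"):
            problems.append((list_name, name, "label wrapped in angle brackets"))
    return problems
```

An empty result does not guarantee that every label will be applied; it only rules out the three causes named in this post.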

Thank you! I looked into each of those possible errors but none of them seem to apply. I’ll play with it a bit more.

Sorry to hear. Does kobo2stata issue any error message when you try to run it?

Nope! it all works perfectly, just missing most value labels as well as variable labels for the variables generated by multiple select option questions (although again, only in some - seemingly arbitrary - cases). Anyway I’ve gone through and patched in the labels manually, so not a big deal!
