What's your approach to variable names?

Hi,

I’m working on a new project with KoboToolbox and I’m using XLSForm.

I think most of you have the objective of exporting the data to Excel, Google Sheets, or data visualization platforms like Power BI or Looker Studio — and I’m no exception.

With that in mind, I have a few questions for the community, in the hope of generating a healthy discussion and seeing real-life examples, if possible.

Keep in mind, in my survey sheet, all my questions are encapsulated in groups and in the settings sheet I have the style set to pages.


1. In the survey sheet, I understand underscores are technically valid for variable names (question names), but I was wondering if anyone has experienced a situation where the underscore was a source of trouble? (particularly when importing the data to Google Sheets or Looker Studio)

Example:

text	strategy_name	Enter the name of your strategy

2. Can we have duplicate variable names (question names in survey sheet) when they exist inside separate groups?

Example:

begin_group	off-road	Off-Road
text	plate	Vehicle plate number
end_group

begin_group	maritime	Maritime
text	plate	Boat plate number
end_group

3. From my recent experience, KoboToolbox does allow duplicate variable names (question names). In that case, what happens when you use the relevant column with a condition pointing to another question? How does it know which question I’m referring to? Does it prioritize what’s inside the same group first, and then look outside the group if it doesn’t find anything?


4. In the survey sheet, how do you name your groups? And how do you name your questions that exist inside groups? Do you follow any particular scheme? Why?

I’ve used my own method in a previous project which was to be explicit with meaning, but keeping it concise. Some examples below:

input_photo1 (for photo upload)

input_photo2

instruction_photo1 (for notes)

But I’ve also seen an example where it was numbered:

begin_group ex1
text ex1_1 Enter some text
integer ex1_2 Enter a number
decimal ex1_3 Enter a decimal number
date ex1_4 Enter a date
end_group

Also, I’ve seen that KoboToolbox has an option to automatically add the group name to the header row when exporting data. I’ve tested it and it works well, but what’s the purpose of that?
Since it uses the slash character “/”, I’m not sure whether it causes issues with Power BI or Looker Studio.


5. Can we use conditional formatting when working on the Excel file (before uploading to KoboToolbox)?
In Excel, I would like to add a rule to highlight duplicates in the name column — I know how to do it, my doubt is about conditional formatting introducing any conflict with KoboToolbox.

Because I am dealing with statistical packages, specifically Stata, here is what I am doing:

  1. Variable names are very short, like v1, v2, v2_1, etc. These are then defined in Stata, as I download the data with XML values.

2-3. You cannot have duplicate variable names anywhere in the same form, regardless of groups or repeat groups. If you have two variables named PLATE in the form and you want to refer to one of them from another question, which PLATE would be called?

  4. This depends on how you like to create your forms so that data processing and form amendments are easy.
    As for me, I like to split my form into sections, say INTRODUCTION, DEMOGRAPHICS, etc.
    Inside each section there will be questions, groups and possibly repeats.
    Each section gets a unique prefix in its variable names.

Example: INTRODUCTION will be v0_1, v0_2, etc.
DEMOGRAPHICS will have v1_1, v1_2, etc.

This makes it easy for me to identify variables by section, and also to update questions in a section without affecting my numbering. There is no need to use the group name in the variable name, as it is of no use when processing the data.
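
For anyone who wants to see the section-prefix scheme above spelled out, here is a minimal Python sketch. The function name `section_names` and the question counts are made up for illustration; only the v0_1 / v1_1 pattern comes from the post.

```python
def section_names(sections):
    """Return (section, variable_name) pairs such as ('INTRODUCTION', 'v0_1').

    `sections` is a list of (section_title, number_of_questions) tuples.
    The section's position gives the prefix and the question's position
    within the section gives the suffix.
    """
    names = []
    for section_index, (section, question_count) in enumerate(sections):
        for question_index in range(1, question_count + 1):
            names.append((section, f"v{section_index}_{question_index}"))
    return names


# Hypothetical question counts, chosen only for the example.
names = section_names([("INTRODUCTION", 2), ("DEMOGRAPHICS", 3)])
for section, name in names:
    print(section, name)
```

Because the numbering restarts in every section, adding a question at the end of one section does not change any name in the other sections.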

Slash (/): you can remove the slashes by choosing to remove group names when downloading the data.
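
If the data has already been exported with group names in the headers, the prefix could also be stripped afterwards. This is only a sketch in Python: the header strings are assumed to look like `group/question`, built from the examples earlier in the thread, and `strip_group_prefix` is a made-up helper, not a KoboToolbox function.

```python
def strip_group_prefix(header):
    """Keep only the part after the last '/', i.e. the question name."""
    return header.rsplit("/", 1)[-1]


# Assumed header layout, based on the off-road / maritime example above.
headers = ["off-road/plate", "maritime/plate", "strategy_name"]
stripped = [strip_group_prefix(h) for h in headers]
print(stripped)  # ['plate', 'plate', 'strategy_name']
```

Note that the two `plate` columns collide once the group is removed, which is another reason to keep question names unique across the whole form.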

  5. Conditional formatting does not interfere with Kobo. There is no problem using it in the XLSX form to check for duplicates.

I am open to more discussion.


Thanks @pruzige

I’m always intrigued by other people’s methods and there’s always something to learn (big or small).

My form is quite extensive (long, repetitive and heavy in terminology) and that’s the main reason for my “troubles” with naming. But I’m slowly finding my rhythm. While I like the idea of using short question names like v1_1, v1_2, I’m afraid that won’t work so well for me, because it will be very hard to tell what is what in two scenarios:

  • Connecting to KoboToolbox via the API
  • Downloading the data in Excel format (with XML headers)

In the meantime I did some testing of my own:

  • Duplicate names: While deploying the form it gives no errors (which gives the impression that everything is fine), but as you said, the Relevant condition doesn’t work. So I will keep all names unique.

  • Conditional formatting: As you said, it works fine!
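
As an alternative to conditional formatting, the duplicate check could also be scripted before uploading. Below is a minimal Python sketch that works on rows already read from the survey sheet as (type, name, label) tuples. The rows are copied from the example earlier in the thread, and `duplicate_names` is a hypothetical helper, not part of any KoboToolbox tooling.

```python
from collections import Counter


def duplicate_names(rows):
    """Return the sorted names that appear more than once in the name column.

    Rows with an empty name (for example end_group rows) are ignored.
    """
    counts = Counter(name for _qtype, name, _label in rows if name)
    return sorted(name for name, count in counts.items() if count > 1)


# Survey rows as (type, name, label), taken from the example in this thread.
rows = [
    ("begin_group", "off-road", "Off-Road"),
    ("text", "plate", "Vehicle plate number"),
    ("end_group", "", ""),
    ("begin_group", "maritime", "Maritime"),
    ("text", "plate", "Boat plate number"),
    ("end_group", "", ""),
]
print(duplicate_names(rows))  # ['plate']
```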

I understand it’s optional, but I was wondering: when is it important to have the group names in the headers of the exported data?

Have you seen a use case?

To me this is not necessary and I always remove group names, note in post processing of the data.

For me and my line of work, short variable names are very much necessary, as I am always working with XML headers/values.

It causes no issues when connecting through the API.
Downloading the data in Excel with XML headers is very practical, though it can be tough depending on the post-processing plan. For me and the place I work this is our standard way, and we use Stata for post-processing. But if you are using Excel, this is hard.

How do you do the data processing, and which software do you use?

As for duplicate names: you can have duplicates; otherwise Kobo must have a system which auto-updates the variable name if there are duplicates (I hope so).
See screenshots below:


Hi.

I’m planning to use Google Looker Studio.

In the past, in a different project, I used Power BI.

What I meant was that I imagine it becomes hard to understand the data columns while building charts and such when all the headers are just v1, v2, etc. Do you need to keep a cheat sheet on the side?

Yes, you must have a data dictionary with you. By the way, I have learned something: I like the way you arranged your form in the other post about the grid theme. The way you arranged your variable numbering is good.
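
A data dictionary like the one mentioned above could be built from the survey sheet itself, since it already pairs each name with its label. Here is a minimal Python sketch under that assumption. The short names v0_1 and v1_1 and both helper functions are hypothetical; the labels are borrowed from examples earlier in the thread.

```python
def build_data_dictionary(rows):
    """Map each question name to its label, skipping rows without both."""
    return {name: label for _qtype, name, label in rows if name and label}


def describe_headers(headers, dictionary):
    """Replace short headers by their labels; unknown headers stay as-is."""
    return [dictionary.get(header, header) for header in headers]


# (type, name, label) rows with made-up short names.
rows = [
    ("text", "v0_1", "Enter the name of your strategy"),
    ("integer", "v1_1", "Enter a number"),
]
dictionary = build_data_dictionary(rows)
readable = describe_headers(["v0_1", "v1_1", "extra_column"], dictionary)
print(readable)
```

The same lookup could be used to relabel columns before loading the export into a charting tool, so the short names stay in the form while the charts show readable titles.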

Well, I am not knowledgeable about Google Looker; I will try to take a look. I am also not good with Power BI. Maybe after some time I can get back to comment on this.

FWIW, you could also potentially leverage something that is, otherwise, largely obscured/hidden from the user, and put such question-specific metadata in a guidance hint. That is, use it to perhaps populate a more descriptive question identifier than the somewhat more indeterminate “Q1”…