Text data type and csv data export

Hi All

I posted an issue on this forum some months ago but I hope someone can respond to my query this time around.

We have a deployed questionnaire on KoBoToolbox where enumerators complete this questionnaire with participants over the phone. It has been deployed since June. There are some text box responses but when text answers are provided they cause errors with the data download.

If a capital letter, a space or 999 is entered into the text box the data splits across 2 rows in the exported csv file. Why would these particular text characters cause this to happen?



The variable it splits on is meds_other_fu with the response entered as 999.

Maybe we are missing something in our form for text questions but it seems like a strange thing to happen,

Regards

Catherine

Hi,
This is not a normal behaviour, however it would be important to understand how CSV data is interpreted. As the name suggests, it means data within each column is separated by a comma i.e. comma separated values. When data i.e. text has some commas within them, the data is interpreted as separate columns unless there is a text delimiter defined. This is what defines continuous text and not column breaks e.g. this could be text appearing between two quotation marks “just like, this” would be treated as one column containing just like, this and not two columns i.e. just like & this.

With the above background, it is important to understand the genesis of the problem before we proceed i.e.

  1. Is the data appearing properly when you look at that tables directly on the KoBoToolbox platform? (Hint; system issue)
  2. Do you get specific steps when opening the CSV file or does it open automatically? (Hint: excel or system issue respectively)
  3. Can you open the CSV file using a plain text editor and see whether you can identify the commas and the delimiters within the text that is split differently.

Stephane

1 Like

Hello,
Adding to Stephane: You should also check if there are no newline/return (invisible characters) entered.
You might see this with cursor position in edit mode (or in Excel with adapted row height and wrap text).

Side note: It seems that you use 999 also as a variable name (see column title). This is not compliant to xlsform.org
“Names have to start with a letter or an underscore. …”
Might this be related with your problem?

2 Likes

Dear @wroos and @stephanealoo

Many thanks for both of your replies.

When I download the data from kobotool box i choose csv and XML values and headers. When I open in excel there are no delays or error messages however the data downloads as semi colon separated rather than comman separated. I import the data to SAS and have accounted for semi colon.

For this issue I opened the data in textpad see this image

The XLS form is named as highlighted green in this image

we have included a hint to state the interviewer should enter 999 if the participant doesn’t know. Now we have found out that 999 causes the row to split in two on download.

When looking on koboToolBox itsself nothing looks strange see this image:

To me it seems like an issue when converting the data to csv in koboToolbox itsself but I cannot test this assumption.

Thank you for your time

Catherine

2 Likes

Hi @cp622,

Would you try this in a different survey project with a small dummy xlsform that contains only the question where the issue is causing and see if you are able to get the issue there as well.

1 Like

Hi @Kal_Lam
Thank you for your advice, I created a new project on koboToolbox and uploaded a reduced version of our questionnaire containing just the questions of interest. I entered one submission via the kobo collect app, then downloaded this submission from kobo dashboard selection export type=csv and XML values and headers. i.e. I followed the exact same process as what usually happens. The download this time is fine see image

the row does not split in two.
Do you have any ideas why the screen shots above have split across two rows?

1 Like

Yes, this seems interesting! Would you now try with the xlsform that had initially had problem by uploading as a different survey project and see what happens (with the entire questions).

thank @Kal_Lam
I have done as you suggested and downloaded the data see it opened in textpad

There are 2 text box (Please specify variables) for the first one I entered blank space 999 blank space and the second one I just entered 999. However the data downloads as normal. Maybe it is because this form only has one entry while the real deployed form has 10000+
Have you any other suggestions to figure out why this would happen/re-create the scenario?

Catherine

Don’t have an idea on why you are seeing this at your end. Maybe we could flag the same with our developers and see if they are able to solve your issue. But for this we will be requiring your username, project name and the server you are using. Could you kindly share them with us through a private message so that we could share the same with our developers and see what the issue is.

Hi @Kal_Lam
When this was split across I needed to fix it on kobotoolbox in order to download the data without the error, otherwise i would need to edit the raw file daily. So if i do message the devloper team they will not be able to see the issue in the currently deployed form. Will this matter?

Don’t know exactly but they should be able to help figure out the issue and create a GitHub issue if it’s reproduceable.

thanks @Kal_Lam
I just did another few tests via kobo collect and the thing that seems to cause it is doing a carriage return in data entry, or that is what I have been able to reproduce.


Is there any way to prevent carriage returns for text boxes?

1 Like

Maybe a workaround that should work:

  • Identify the case _id that has a carriage return.
  • Open KoBoToolbox server and then edit the case (i.e. remove the carriage return) in Enketo.
  • Save and update the changes.
  • Download and see if it solved the issue.

My guess is that it should solve your issue. Kindly please confirm!

@Kal_Lam the work-around works, I can make edits to the data on kobo toolbox, however i would prefer not to have to do this. I will see if restricitng the number of characters in the deployed form maybe reduces carriage returns.
Thanks for you help
Catherine

1 Like

Glad to know that it works! Maybe as a learning, you could use some regex codes in the future so that enumerators don’t repeat the same issue.

hi @Kal_Lam
Yes maybe regex would solve our woes :slight_smile:
Thank you
Catherine

1 Like

@Kal_Lam and @cp622 I have been following the conversation since I am having the same issue with my data especially when I export using CSV. Export on XLS excel looks fine.

Have you guys figured out any regex that can limit enumerators from the carriage return?

Hi @chalasah

We used regex(.,’^[a-zA-Z0-9_, ]{1,100}$’) i hope this works for you

Best wishes

Catherine

1 Like

Thank you for sharing the solution with the entire community, @cp622! :clap: Expecting the same in the upcoming days too! :pray: