Regex failed to work in form/KoBo converting comma into space

Description

Confirmation emails sent through a Google script stopped working. I checked, and the CSV file contains an illegal response that should have been filtered out by the form.

I downloaded both the CSV and the XLS exports.

Steps to Reproduce

Regex used to check email:
^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+([.][A-Za-z]+)+(,[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+([.][A-Za-z]+)+)*$

Illegal text that I received when downloading the CSV and the XLS file:
name_01@email.com name@email.com

I checked the form, and the illegal text shouldn't have been accepted. So this might have happened after the response was submitted (data was omitted), or somehow the form didn't work (or didn't update) for that particular user.

The form link I used was Enketo Express for KoBo Toolbox.

Expected behavior

The regex, as checked on regex101.com, should not accept strings separated by a space.

Legal responses should be
email@domain.x.x
email1@domain1.x,email2@domain2.x.x.x
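
The expected behavior can be checked outside the form with a minimal Python sketch. It assumes the character class in the reported pattern is `[A-Za-z0-9._%+-]` (the same class used in the support-article pattern quoted later in this thread), and the helper name `is_valid_email_list` is made up for illustration. Python's `re` module is not the engine Enketo uses, so this only approximates what the form does.

```python
import re

# One address, assuming the character class [A-Za-z0-9._%+-] (an assumption,
# not confirmed by the original post).
ONE_EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+([.][A-Za-z]+)+"

# One address, optionally followed by more addresses separated by commas.
EMAIL_LIST = re.compile(rf"{ONE_EMAIL}(,{ONE_EMAIL})*")


def is_valid_email_list(text: str) -> bool:
    """Return True only if the whole string is a comma-separated email list."""
    return EMAIL_LIST.fullmatch(text) is not None


for candidate in (
    "email@domain.x.x",
    "email1@domain1.x,email2@domain2.x.x.x",
    "name_01@email.com name@email.com",
):
    print(candidate, "->", is_valid_email_list(candidate))
```

Under these assumptions the first two strings are accepted and the space-separated one is rejected, which matches what the form is supposed to enforce.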

@mkmortera, would you mind sharing your xlsform so that we could quickly check it at our end?

Sent you a message, attached was the xlsx file and the link to the form

@mkmortera, have you tried the regex code regex(., '^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+[.][A-Za-z]{2,}$'), as outlined in our support article Restricting Text Responses With Regular Expressions, to see if it solves your issue?

No, it doesn't cover my use cases. It's only good for common domains (gmail.com/yahoo.com) and doesn't work on domain.xxx.xx. It also doesn't allow multiple emails.
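
A quick sketch of why the suggested pattern falls short for these cases, using Python's `re` module as a stand-in for the form's regex engine (an assumption; behavior in Enketo may differ in details). The helper name `matches_single_email` and the sample addresses are made up for illustration.

```python
import re

# The single-address pattern suggested in the reply above.
SINGLE_EMAIL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+[.][A-Za-z]{2,}$")


def matches_single_email(text: str) -> bool:
    """Return True if the text matches the single-address pattern."""
    return SINGLE_EMAIL.match(text) is not None


# A plain two-part domain is accepted.
print(matches_single_email("user@gmail.com"))
# A domain with two dots is rejected: only one "[.]" segment is allowed.
print(matches_single_email("user@domain.xxx.xx"))
# A comma-separated list is rejected: "," is in none of the character classes.
print(matches_single_email("a@gmail.com,b@gmail.com"))
```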

In my form (and on regex101.com), the regex works as intended. It doesn't allow the illegal response ‘email_01@domain.xx email_02@domain.xx’ because multiple emails should be separated by ‘,’ per my regex.

Again, there is nothing wrong with the regex. The problem here is that I got an illegal response in the CSV even though it's not allowed in the form. :scream:

If the regex were the problem the illegal response wouldn’t have been flagged (as illegal) in the form and in the regex tester. I would have changed my regex instead of reporting it here.

I think this happened because the respondent had an outdated version of the form (the one prior to April 20, without the regex), so they were able to submit.


@mkmortera, would you also be able to share with us a sample of the issues you had while exporting your data to CSV? If you feel this should be treated confidentially, you could share it with us through a private message.

You can check this with the KoBo metadata fields start and submission_time. Additionally, you could try to enter and submit the same invalid value yourself. If you can't, the cause is the earlier condition (an outdated form version).
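
As a rough sketch of the metadata check, the snippet below lists rows whose start date falls before the form update. Everything in it is illustrative: the sample rows and the year are invented (the thread only mentions April 20), the function name `started_before` is hypothetical, and the column names and delimiter should be adjusted to match the actual export (for example, the submission time column may appear as `_submission_time`).

```python
import csv
import io
from datetime import date

# Hypothetical export rows; the year is made up for illustration.
SAMPLE_CSV = """start,submission_time,email
2020-04-18T09:12:33.000+08:00,2020-04-21T02:00:00,name_01@email.com name@email.com
2020-04-22T10:00:00.000+08:00,2020-04-22T10:05:00,email@domain.x.x
"""


def started_before(csv_text, cutoff):
    """Return the rows whose `start` date is earlier than `cutoff`."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Compare only the YYYY-MM-DD prefix of the timestamp, ignoring time zone.
    return [r for r in rows if date.fromisoformat(r["start"][:10]) < cutoff]


stale = started_before(SAMPLE_CSV, date(2020, 4, 20))
for row in stale:
    print(row["start"], row["submission_time"], row["email"])
```

A row that was started before the update but submitted after it would be consistent with a respondent filling in an older copy of the form.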

@mkmortera, so your requirement is that your code should allow:

  • Multiple email entries
  • Emails with a domain of the following form only, e.g. example@domain.xxx.xx

Did I get you correctly?

Yes, but I think I have a solution, at least until someone manages to break it again. I will share the regex later.

@mkmortera, maybe you could do it as outlined in the image shared below using this regex code regex(., '^([\W\d\D]+[@][\D]+[.][\D]{3}[.][\D]{2})+$'):

In the survey tab of your xlsform:

Data as seen in the server:

Data as seen when downloading in XLS format:

Reference xlsform:

Email Regex.xlsx (8.6 KB)

It works, but I changed the quantifier {3} to + (there might be more than 3 characters) and {2} to * (that part might not exist, e.g. @gmail.com).
