Subject: Parsing Date of Birth from Argentinian Sex Document (Follow-up to Previous Question)

Hi everyone,

Following up on my previous question from two months ago regarding using Argentinian sex documents to identify family unit gender composition (link: Take a certain part of a QR response with regex), I’m now seeking a way to parse the date of birth within the document into a standard date format.

Is it possible to achieve this using regular expressions (regex) or another method within KoboToolbox?

Thanks in advance for your assistance!

Hi @segosal279, as I remember from your other post, the date format was DD/MM/YYYY (as seen in 01/11/1969) right?

Which standard format do you want to change it to?

Because if you are using @Xiphware’s method, you’ll be getting a string. You can use the translate(*string*, *fromchars*, *tochars*) approach described here: Form Operators and Functions - ODK Docs

Hi @hakan_cetinkaya,

Yes, the original date format was indeed DD/MM/YYYY. I aim to retain this format while converting the data type from string to date.

I’ll be sure to explore the link you provided. Thank you for your assistance. :pray:

It's probably worth a brief background explanation…

First, regex() basically tells you whether (true or false) a string matches a particular pattern. But it won't tell you explicitly which parts of your original string matched, nor extract portions of the string for you.

Second, translate() just does character substitution for you. It will replace one character in your string in situ with another; eg replace all lowercase letters with their UPPERCASE equivalent. But it likewise will not extract portions of the string for you.
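
To make the difference concrete, here is a small sketch, assuming the raw value sits in a hypothetical text question called dob_str (that name is mine, not from the original form):

```
regex(${dob_str}, '^[0-9]{2}/[0-9]{2}/[0-9]{4}$')
translate(${dob_str}, '/', '-')
```

For '01/11/1969' the first should simply evaluate to true, and the second should give '01-11-1969': the separators are swapped, but the day, month and year are still in their original order.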

What you probably want to do is parse your custom date string format - ‘DD/MM/YYYY’ - to extract the day, month and year substrings, and then rearrange these into the correct prescribed format for an XForm date; ie ‘YYYY-MM-DD’.

If you are assured your original DD, MM, and YYYY substrings are always going to be fixed length (ie day and month numbers are zero-padded if they are a single digit) then you know exactly which characters in your original string to pull out, so you can use the substr() function to do this. However, if your day or month number isn't zero-padded - eg you could have “1/6/2021” - then you’ll instead have to use the ‘/’ character delimiter to break your string into its three components, using the substring-before() and substring-after() functions.
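
As a rough sketch of both approaches (not tested against your actual form), the calculations could look something like the following. All the field names here (dob_str, dob_iso, d, rest, m, y) are hypothetical placeholders. Note that substr() in ODK XForms is zero-indexed and the end index is exclusive.

```
type       name     calculation

# Fixed-length case, eg '01/11/1969' -> '1969-11-01'
calculate  dob_iso  concat(substr(${dob_str}, 6, 10), '-', substr(${dob_str}, 3, 5), '-', substr(${dob_str}, 0, 2))

# Delimiter case, eg '1/6/2021' -> '2021-06-01'
calculate  d        substring-before(${dob_str}, '/')
calculate  rest     substring-after(${dob_str}, '/')
calculate  m        substring-before(${rest}, '/')
calculate  y        substring-after(${rest}, '/')
calculate  dob_iso  concat(${y}, '-', if(string-length(${m}) = 1, concat('0', ${m}), ${m}), '-', if(string-length(${d}) = 1, concat('0', ${d}), ${d}))
```

The lines starting with # are only annotations for this post and would not go in the XLSForm; you would use one of the two dob_iso rows, not both. The if(string-length(...) = 1, ...) parts are there because the target ‘YYYY-MM-DD’ format needs two-digit day and month values. Whether you then need to wrap the result in date() depends on how you use it downstream, so treat that as something to verify in your own form.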

There’s a good explanation of both these approaches here: Take a certain part of a QR response with regex - #5 by Xiphware
