Word limit issue

Hello! I wanted to post this on the forum, but I thought the information was too sensitive to share publicly. I’m trying to set a word limit using the formula provided by KoboToolbox: regex(., '^\W*(\w+\b\W*){1,250}$') (a maximum of 250 words, as you can see). It works fine with normal text. However, with the following text the limit is not applied correctly, and validation passes only after I remove around 21 words:

xxxxx

I suspect that it has something to do with special characters. Could you please help me out with this? Thank you!

Welcome back to the community, @mushegianed! This previously discussed post should help in understanding the scenario:

There is also an alternative workaround, as outlined here:

Dear @Kal_Lam, thank you for your prompt answer and suggestions!

I’ve read these articles; however, they are about limiting the number of characters, whereas I need to limit the number of words.
In the article Restricting Text Responses With Regular Expressions, I found the following expression, which I modified for my case:
"regex(., '^\W*(\w+\b\W*){3,5}$'): Restrict an input of number of words (e.g. to restrict a range of words say 3 to 5)"

There’s no issue with ordinary text. However, the text that I provided earlier doesn’t pass validation (250 words). I suspect it’s because of special characters. Do you know how to amend the regex so that it counts the words correctly?
Many thanks!

Could you give us some examples of your “special character” issue, please?

Dear @wroos, thank you for your reply!

Here’s the sample text (the content is random here):

The astonishing surge of avian species in urban environments necessitates immediate implementation of effective conservation strategies. Within the municipal initiatives of the Urban Wildlife Preservation Act (Act 2019/73/UWP), we investigated the efficacy of introducing fledglings of urban avian species after hatching (i.e., by establishing a minimum preservation threshold, MPT) as a potential urban wildlife management approach. We monitored 142 urban bird monitoring sites across parks, rooftops, and gardens in metropolitan areas (2018-2021) and evaluated At-Nest Mortality (ANM; after hatching), Short-term post-release mortality (St-PRM; two hours post-release), and Long-term PRM (Lt-PRM; 72 hours post-release). Overall, ANM was 18.7%, but varied significantly among species, habitat types, seasons, and regions (N=5200; 30 species). St-PRM (N=1250; 20) and Lt-PRM (N=148; 5) were generally below 14% and 17%, respectively. Population dynamics models for the most common species were constructed to assess the projected population recoveries, accounting for different MPTs and survival rates.
Furthermore, within the UrbanAviary Project, we deployed tracking devices (n=2200; 18 species) and conducted behavioral studies to investigate post-release movements and habitat utilization of urban avian species. Preliminary findings for three highly urban-adapted species, Columba livia, Passer domesticus, and Sturnus vulgaris, revealed diverse patterns of space utilization, roosting preferences, and foraging strategies.
Our findings will be pivotal in guiding future urban wildlife management efforts and demonstrate that an integrated approach, considering key ecological and behavioral traits, is essential for developing sustainable and feasible management measures for urban avian populations.

Word counts this as 236 words. However, when I apply the 250-word validation, regex(., '^\W*(\w+\b\W*){1,250}$'), it fails; only after I remove 7 more words does it pass.

UPD: I tried the formula regex(., '^\W*(\w+\s*\W*){1,250}$') to treat whitespace as the separator, but the online form freezes once it hits the validation rule…
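
A possible explanation for the freeze, offered as a hedged sketch rather than a confirmed diagnosis: once \b is dropped, nothing forces \w+ to consume a whole word, so the group (\w+\s*\W*) can split a single word across several repetitions. A backtracking regex engine may then try a very large number of such splits before it rejects a text that is too long. The Python snippet below only illustrates that ambiguity on a three-letter input. Python's re module stands in for the form's regex engine (an assumption), and the repetition count of 3 is chosen purely for illustration.

```python
import re

# Variant from the update above, without \b: one word may be split into pieces.
AMBIGUOUS = r"\W*(\w+\s*\W*){3}"
# Original variant with \b: each repetition must consume a whole run of \w characters.
BOUNDED = r"\W*(\w+\b\W*){3}"

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# "abc" is a single word, yet the ambiguous pattern accepts it as three "words"
# by matching "a", "b" and "c" in separate repetitions.
print("without \\b:", matches(AMBIGUOUS, "abc"))
print("with \\b:   ", matches(BOUNDED, "abc"))
```

With \b in place, the single word is not accepted as three words, because each repetition has to stop at a word boundary.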

Would you mind checking how many words you count yourself for this text?
Which 7 words did you remove?

I counted 238 words, but the problem is that people will rely on the word count in word processors (such as Word) or online tools. Word and most online services show 236 words.
I removed the last 7 words: “feasible management measures for urban avian populations.”

Maybe you can test (with a reduced example) how non-standard elements are treated by \w (and \W), e.g.
Act (Act 2019/73/UWP), hatching (i.e., by threshold, MPT) as approach. We 142 sites (2018-2021) and evaluated At-Nest Mortality (ANM; after hatching), Short-term post-release mortality (St-PRM; two hours post-release), and Long-term PRM (Lt-PRM; 72 hours post-release). Overall, ANM was 18.7%, regions (N=5200; 30 species). St-PRM (N=1250; 20) and Lt-PRM (N=148; 5) below 14% and 17%, respectively.
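
One way to run such a test outside the form is a short Python sketch. It relies on the fact that each repetition of (\w+\b\W*) consumes exactly one maximal run of \w characters, and compares that count with a simple whitespace-based count. The function names are hypothetical. re.ASCII is used to approximate engines in which \w covers only A-Z, a-z, 0-9 and _ (as in JavaScript); whether a given form client behaves this way is an assumption, as is the idea that word processors count roughly by whitespace.

```python
import re

def count_regex_words(text: str) -> int:
    """Count maximal runs of \\w characters, i.e. the 'words' seen by (\\w+\\b\\W*)."""
    return len(re.findall(r"\w+", text, flags=re.ASCII))

def count_space_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())

# Tokens taken from the sample text in this thread.
samples = ["2019/73/UWP", "18.7%", "At-Nest", "(i.e.,", "N=5200;", "(2018-2021)"]
for token in samples:
    print(f"{token!r}: regex words = {count_regex_words(token)}, "
          f"space-separated words = {count_space_words(token)}")
```

Each of these tokens is one space-separated word but two or three runs of \w characters, which would be consistent with the constraint rejecting a text that a word processor reports as under the limit.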

Thank you.

I used this to solve the issue: regex(., '^(?:\S+\s+){0,249}\S+$')
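
For anyone who wants to check this pattern before deploying a form, here is a minimal Python sketch. It rebuilds the same expression for an arbitrary limit and uses re.fullmatch in place of the ^ and $ anchors. The helper name within_word_limit is hypothetical, and Python's re module is only a stand-in for the form's regex engine.

```python
import re

def within_word_limit(text: str, max_words: int) -> bool:
    """Return True if text has between 1 and max_words whitespace-separated tokens."""
    # Up to (max_words - 1) tokens, each followed by whitespace, then one final token.
    pattern = r"(?:\S+\s+){0,%d}\S+" % (max_words - 1)
    return re.fullmatch(pattern, text) is not None

print(within_word_limit("Act 2019/73/UWP was 18.7%", 4))   # four tokens, limit 4
print(within_word_limit("Act 2019/73/UWP was 18.7%", 3))   # four tokens, limit 3
print(within_word_limit("trailing space ", 5))             # ends with whitespace
```

Because the pattern must start and end with a non-whitespace character, a response with leading or trailing whitespace (for example a final line break) does not match in this sketch, so it may be worth checking how your form client handles such input.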

@mushegianed :clap: :heart: :partying_face: