Using Pull-data from csv and using the result to ask more Questions

od23 · December 23, 2024, 11:23am

Good day,
I have two csv files: one for households and the other for individuals. I want to use pulldata to select a single household and then pull data from the individuals list (the members of the household previously selected). I then want to ask which of them are currently members of the household using select_multiple, and then ask further questions only of those members. I want the data collectors to be able to edit the information in case there was a mistake when it was entered earlier. I hope I explained myself well. Thanks a lot.

Kal_Lam · December 24, 2024, 9:53am

Welcome to the community, @od23! Did you mean you wish to create a form that is able to pull data from a CSV file into a repeat group?

od23 · December 25, 2024, 3:41pm

Hello, thanks for your response. I have two csv files, for households and individuals. I want to select_one a household from the household csv, then pull all members of that household from the individuals csv and ask if they’re still members of the household before proceeding to update their information (“Age”, “Relationship”, “Name”). After all selected household members have updated their information, I want to ask if there is an additional household member and enter their details (no matter how many there are). Merry Christmas, and sorry if I am not being clear.

DatamaniacSteve · December 27, 2024, 9:31am

My approach would be to merge the household csv with the individual members csv. I presume the individual member id is the same as the household id with the line number as a suffix, to differentiate between individuals belonging to the same household.
The biodata of individuals can be collapsed into one variable per household. For example, the age column would contain the ages of all the members separated by spaces, i.e. 50 40 27 16 8 2. I am not sure whether the names are in one column or split across several, but they also need to be separated by spaces. Multiple names belonging to one person should be joined by underscores before this happens. This is so that selected-at(${names}, index-1) can be used to pull each member's value.
In the actual questionnaire, one needs to enter the household id, which in turn pulls all the names of the members and the size of the household. The next question asks the respondent to verify that no member has been left out; if any have, the number left out is asked.
The enumerator then enters a repeat group whose repeat count is equal to the household size plus the number left out. The repeat then iterates over the members with their bio details pulled. For example, in the age question the calculation is selected-at(${pulled_ages}, iteration_no-1). For a newly added member, nothing will be pulled, so the enumerator can freely enter the details.
I hope this approach and workflow will enable you to achieve your goal.
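
To make the indexing concrete, here is a minimal Python sketch of the lookup described above. It is not XLSForm: `selected_at` and `member_values` are hypothetical helper names that only imitate a zero-based lookup in a space-separated string, the names are made up, and returning an empty value for an out-of-range index is an assumption of the sketch.

```python
def selected_at(space_separated: str, index: int) -> str:
    """Imitate a zero-based lookup in a space-separated list.

    Assumption of this sketch: an out-of-range index yields an empty
    string, which is what leaves newly added members blank.
    """
    items = space_separated.split()
    return items[index] if 0 <= index < len(items) else ""


def member_values(pulled: str, repeat_count: int) -> list[str]:
    """Value seen on each repeat iteration (iterations are numbered from 1)."""
    return [selected_at(pulled, i - 1) for i in range(1, repeat_count + 1)]


pulled_ages = "50 40 27 16 8 2"          # ages collapsed into one csv cell
pulled_names = "Mary_Ann_Doe John_Doe"   # made-up names, one token per person

# Household size 6 plus 1 member who was left out gives 7 iterations.
ages_per_iteration = member_values(pulled_ages, 7)

# Underscores are turned back into spaces only for display.
first_name = selected_at(pulled_names, 0).replace("_", " ")
```

The seventh iteration gets an empty age, which corresponds to the new member whose details the enumerator types in by hand.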

od23 · December 27, 2024, 10:18am

Thank you very much, DatamaniacSteve. Please, I am going to trouble you. I am new to the pulldata functionality of KoboCollect. I am currently not allowed to upload files because I am new; otherwise I would have. About collapsing the ages into one cell separated by spaces, I don’t think that would work, as the data I intend to work on is somewhat large and that would be tedious and fraught with mistakes. I like the idea of putting it all in one csv. Please, can you show me how this would work? I want to:

1. Pull a household with select_one, with all the households listed as radio buttons.
2. When one is clicked, pull the data of all members of that household and ask if they are currently members. I then want to ask only those members questions that pull their name, age and relationship (it’s an update, so this is data we collected previously).
3. Ask if there are any additional members in the household and, if yes, add their details to the household (all the details already collected).
4. If possible, when a household selected in (1) has already been filled and the user picks the same household again, the form should say: this household has already been selected.

Thank you for replying. It’s been really helpful. I wish I could upload my form.

DatamaniacSteve · December 29, 2024, 9:53am

Hi od23,
You can achieve 1, 2 and 3 as I have already explained. The first thing is to clean up the csv data (here maybe you can replace missing ages with -99); this will ensure that the joined ages are correctly aligned. Then use this function in Excel to join the members' data into the household csv: =TEXTJOIN(" ", TRUE, FILTER(Sheet2!C:C, Sheet2!A:A=A2))
Here are the starting screenshots.

The final output is

Pull the other particulars in the same manner, i.e. names, relationship and other details. You can then proceed with designing your form to use only the resulting household csv.
For number 4, on handling duplicates or already interviewed households, you can use the dynamic data attachments functionality to check for existence in the submitted data tables.
I hope this will assist.
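
For anyone who prefers a script to the spreadsheet formula, the same collapse can be sketched in Python with only the standard library. This is an illustration under assumptions, not the exact workflow above: the sample rows are invented, the `collapse` helper is hypothetical, and the column names (`nidhh`, `hhnames`, `age`, `relationship`, `hhsize`) are borrowed from the csv shared later in this thread.

```python
import csv
import io
from collections import defaultdict

# Invented individuals file: one row per person, keyed by household id.
INDIVIDUALS_CSV = """nidhh,name,age,relationship
1,Ada Obi,50,1
1,Bola Obi,40,2
1,Chidi Obi,,3
5,Dayo Musa,27,1
"""

MISSING = "-99"  # placeholder so every list keeps one token per member


def collapse(individuals_csv: str) -> list[dict[str, str]]:
    """Return one row per household with space-separated member columns."""
    grouped = defaultdict(lambda: {"hhnames": [], "age": [], "relationship": []})
    for row in csv.DictReader(io.StringIO(individuals_csv)):
        household = grouped[row["nidhh"]]
        # Join multi-word names with underscores so each name is one token.
        household["hhnames"].append("_".join(row["name"].split()) or MISSING)
        household["age"].append(row["age"].strip() or MISSING)
        household["relationship"].append(row["relationship"].strip() or MISSING)
    return [
        {
            "nidhh": hh_id,
            "hhsize": str(len(cols["hhnames"])),
            **{key: " ".join(values) for key, values in cols.items()},
        }
        for hh_id, cols in grouped.items()
    ]


households = collapse(INDIVIDUALS_CSV)
for household in households:
    print(household)
```

Filling blanks with -99 before joining is what keeps the third age aligned with the third name; dropping the blank instead would shift every later member by one position.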

od23 · December 29, 2024, 1:27pm

Thank you very much. I will work on implementing what you suggested. I am, however, quite new at doing it this way, so if there is a sample of something you’ve done previously that could help me out, I’d really appreciate it. Thanks for your time.

od23 · December 29, 2024, 7:56pm

Thank you. I use Stata and I can easily get the data out to Excel in the format you suggested. My problem is with the XLSForm: I am not sure how I would call out members who are separated by spaces, either as a single select or a multi select. I think your method is really brilliant and I intend to work on it till I get it. Thanks.

od23 · December 30, 2024, 10:11am

Hello,
So this is part of my form:

type	name	label	required	appearance	calculation	relevant
select_one hhid	nidhh	Select a household	yes	autocomplete quick search(bh.csv)
calculate	names				pulldata('bh', 'hhnames', 'nidhh', ${nidhh})
calculate	ages				pulldata('bh', 'age', 'nidhh', ${nidhh})
calculate	relationships				pulldata('household_data', 'relationship', 'nidhh', ${nidhh})
calculate	cleaned_names				join('\n', replace(split(${names}, ' '), '_', ' '))
note	current_members_note	These are all the current members of the household:\n${cleaned_names}
select_multiple hh_names	current_members	Please pick the current members of the household	yes		join(' ', split(${names}, ' '))
begin_repeat	update_info	Update information for selected members				selected(${current_members}, name)
calculate	member_name				replace(selected-at(split(${names}, ' '), position()-1), '_', ' ')
text	updated_name	Member name	yes		${member_name}
integer	updated_age	Member age	yes		selected-at(split(${ages}, ' '), position()-1)
select_one relation	updated_relation	Relationship	yes	search('relationship')	selected-at(split(${relationships}, ' '), position()-1)
end_repeat
select_one yes_no	add_new	Are there any new members in the household?	yes
begin_repeat	add_member_info	Add new members				${add_new} = 'yes'
text	new_name	Name	yes
integer	new_age	Age	yes
select_one relation	new_relation	Relationship	yes	search('relationship')
end_repeat
calculate	final_members_display				join('\n', concat(replace(split(${names}, ' '), '_', ' '), ${new_members}))
note	final_note	Household Members:\n${final_members_display}				${add_new} = 'no'

This is my csv:

nidhh	nid	hhname	head	hhnames	relationship	age	hhlabel	hhsize
1	1347	Aminu Garba	1	Aminu_Garba Fatimah_Garba Ibrahim_Garba Amina_Garba Usman_Garba	1 2 3 3 4	50 40 20 16 10	1: Aminu Garba, in 1	5
2	2431	Musa Dasbang	1	Musa_Dasbang Sarah_Dasbang Aliyu_Dasbang Zainab_Dasbang Ibrahim_Dasbang	1 2 3 3 4 3	40 20 15 9 8 4	2: Musa Dasbang, in 1	6
3	5402	Hassan Audu	1	Hassan_Audu Maryam_Audu Suleiman_Audu Fatimah_Audu Kabir_Audu	1 2 3 3 4	45 27 11 5 3	3: Hassan Audu, in 1	5
4	7621	Ahmad Malik	1	Ahmad_Malik Zain_Malik Fatima_ Malik Omar_Malik Baga_Malik	1 2 3 3 3	39 25 16 8 2	4: Ahmad Malik, in 1	5
5	9879	Jibril Aminu	1	Jibril_Aminu	1	27	5: Jibril Aminu, in 1	1

The problem I am having is how to populate the select_multiple, i.e. what to put in the choices list. If I leave it blank, the form complains that I should have something in the choices list.
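
Separately from the choices-list question, position-based lookups only work if every collapsed column has exactly one token per member. A quick check along the following lines may help; it is only a sketch, the `mismatched_households` helper is hypothetical, and the values are copied from the csv rows above.

```python
# Collapsed columns copied from the csv above:
# nidhh -> (hhnames, relationship, age, hhsize)
HOUSEHOLDS = {
    "1": ("Aminu_Garba Fatimah_Garba Ibrahim_Garba Amina_Garba Usman_Garba",
          "1 2 3 3 4", "50 40 20 16 10", 5),
    "2": ("Musa_Dasbang Sarah_Dasbang Aliyu_Dasbang Zainab_Dasbang Ibrahim_Dasbang",
          "1 2 3 3 4 3", "40 20 15 9 8 4", 6),
    "3": ("Hassan_Audu Maryam_Audu Suleiman_Audu Fatimah_Audu Kabir_Audu",
          "1 2 3 3 4", "45 27 11 5 3", 5),
    "4": ("Ahmad_Malik Zain_Malik Fatima_ Malik Omar_Malik Baga_Malik",
          "1 2 3 3 3", "39 25 16 8 2", 5),
    "5": ("Jibril_Aminu", "1", "27", 1),
}


def mismatched_households(households):
    """Return token counts for households whose columns disagree with hhsize."""
    problems = {}
    for hh_id, (names, relationships, ages, hhsize) in households.items():
        counts = {
            "hhnames": len(names.split()),
            "relationship": len(relationships.split()),
            "age": len(ages.split()),
        }
        if any(count != hhsize for count in counts.values()):
            problems[hh_id] = counts
    return problems


problems = mismatched_households(HOUSEHOLDS)
for hh_id, counts in problems.items():
    print(hh_id, counts)
```

On these rows the check flags household 2 (five names against six ages and relationships) and household 4 (the space inside "Fatima_ Malik" splits one name into two tokens).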

DatamaniacSteve · January 5, 2025, 7:21pm

Good approach there. The only thing you have left out is member_id. You can also create it in the csv file the same way you did for names, ages and relationships. This will be a unique id to identify each member of the household.
For the select_multiple of names, I think you can hide it by making it a calculate field only. Then, on entering the repeat group, the first question should be “Is ${xyz} still a member of the household?” If no, then skip the rest of the questions and go on to adding a new member.
I have also realized that edits to the names, ages etc. of the members do not take effect when forms are finalized. A suggested workaround, in order to commit changes made within the repeat group, is to display the pulled responses in the question label or hint. The data entry person then has a free field in which to enter the updates for names, relationship, ages, sex etc.
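
As a rough illustration of the member_id column, the ids can be generated from the household id and each member's line number and then collapsed like the other columns. This is a sketch under assumptions: the `member_ids` helper and the underscore-separated id format are hypothetical choices, not something the form requires.

```python
def member_ids(household_id: str, household_size: int) -> str:
    """Space-separated ids built as <household id>_<line number>.

    The underscore separator is an assumption of this sketch; it keeps
    household 1, line 11 ("1_11") distinct from household 11, line 1 ("11_1").
    """
    return " ".join(
        f"{household_id}_{line}" for line in range(1, household_size + 1)
    )


ids_for_household_1 = member_ids("1", 5)
ids_for_household_5 = member_ids("5", 1)
```

For a household with id 1 and five members this gives "1_1 1_2 1_3 1_4 1_5", one token per member in the same order as the names and ages.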