Pulling data into Power Bi via API: Skipped questions at set up not added when answered post set up

Hi

For capturing baseline data and recurrent updates, we have created a questionnaire (xlsform) which has become quite complex by now. There is a lot of skip logic and repeats – for instance, capturing either data on a school or data on a ECD centre, as well as asking for the number of preprimary classes and then for enrolment numbers of each of them.

Janna’s wonderful youtube tutorial (https://www.youtube.com/watch?v=mDokTnjM6Fo&t=518s) worked perfectly to get the data from kf.kobotoolbox.org via API into Power BI – thanks so much!

However, as a bloody beginner in Power BI, I wonder whether the following problem is just due to lack of Power BI skills or whether there is a problem in the way Power BI is communicating with kobotoolbox when the power query is done.

Two symptoms bother me most:

  1. Missing fields: Due to the many skip logic patterns (“tree” with many “branches”), I would have to fill out many, many forms to ensure that each and every question is answered at least once (and doesn’t remain skipped/empty in ALL questionnaires). For testing purposes, I filled only a few and it seems that only those fields/columns appear in Power BI which were not empty in the moment when I did the first power query (=when I connected via API as per youtube video). If I fill out another form and answer so far unused questions, the counter of questionnaires increases, but the new questions don’t appear in the Power BI table. Do I overlook some refresh function or something like that?

  2. Just one table: If I understand the Power BI tutorials correctly, repeated questions from kobotoolbox are shown in separate tables on the modelling page. But this is not the case for our questionnaire: The API connection brings all the columns into just ONE table. While I believe that the reference feature allows to extract columns into new tables, it looks like an awful lot of work – which by the way is meaningless if problem #1 (missing columns) is not solved beforehand. Any idea how to do the “import” differently?

Kind regards
York

…looks like my theory is wrong (which was that those fields which were not filled with data in the moment of setting up the connection are not included even if filled with data later). It is rather that this applies on group level: If I create a new Power BI file in a moment that a certain group has been skipped in all completed questionnaires so far, all the questions in that group will not be added once I complete a questionnaire which answers these questions…

So the problem IS indeed the not fully functioning API connection.

Any hints how to further analyse and eventually solve this problem?

It seems that when connecting to the data on the kobotoolbox server via API (for instance from Google Data Studio, MS Power BI, etc.), only those fields/columns will be “synchronized” which contain data. If a skipped question will be answered only in a later questionnaire, is there any chance to include it without modelling the whole data again and again?

Do you mean something like the case outlined here:

  • Suppose you have 3 question (Q1, Q2 and Q3). The response of Q2 is dependent with Q1 (if and only if the response is say Yes in Q1).
  • You entered a dummy response say Q1 (No), Q2 (NA so skips) and Q3 (Yes).
  • Now you connect the KoBoToolbox server with the API.
  • You get the response for Q1 and Q3.
  • You again fill up a dummy response say Q1 (Yes), Q2 (Yes, now fills) and Q3 (Yes).
  • You refresh say Power BI or Excel Power Query. The problem you outline is that the data does not sync i.e. you do not see the current data in Q2.

Kindly please confirm if i understood your issue well.

Exactly, perfectly described!

1 Like

@york_rff
To better understand if this issue is an API issue or a PowerBi issue, we need you to do the following checks:

  1. Using the API link, could you please get your data i.e. post the link on a browser and see which data comes out. Inspect the data to see if the skipped question actually has some data.
  2. If the skipped question has some data, then the issue is with PowerBI which would mean you have to work on a workaround i.e. ensuring that you only set up with a complete form in all its response-filled scenario. You can always do a dummy with no question skipped ensuring that all the questions are included in the very first setup.
  3. If the inspection in 1 above shows that the data is missing, then this is a definite bug which you should notify us then we look at it in detail.

Thank you so much for your patience.

Stephane

1 Like

Very interesting approach to find out more, thanks a lot!

I copied my xlsform, changed its form_title, deployed it, filled in one questionnaire, avoiding the “enrolment” section, and sent if off. Then I created a PowerBI file and in parallel entered the same URL directly into a browser. Then I filled in a second questionnaire, this time entering enrolment data, and sent it off again. As observed earlier, nothing happened in PowerBI after refresh. So I entered the same URL in a second browser window and – same result (= no enrolment fields)!

This feels like a discovery, although I can’t really imagine what this means. If I enter the URL in a browser, how can something else be shown as it is actually on the server in that moment?

Hi @york_rff
Thanks for that further check, in essence, the behavior you are describing indicates that this is a potential issue with the API. Let us have a look at this and we will get back to you. If we confirm that this is a bug, we will send you a link to the bug report and you can monitor the progress of the resolution.

Stephane

Sorry, something went wrong yesterday: I just discovered that the second questionnaire (the one with the additionally used fields) didn’t get through. After I now succeeded to sent such a second questionnaire, typing the URL directly into a browser resulted in a much longer list of fields/data. So, the issue might not be as clear as it seemed for a moment…!?

1 Like

Hi @york_rff,

In the meantime would you mind trying out the following:

Step 1: login to your account. Either kc.humanitarianresponse.info or kc.kobotoolbox.org
Step 2: Share Data Publicly: Yes
Step 3: View data in table
Ensure, value and header format: Labels

Step 4: Copy URL and use it in excel
or, copy and edit the following link:

https://kc.humanitarianresponse.info/”USERNAME”/reports/”FORMID”/export.csv

or if you are using the HHI server,

https://kc.kobotoolbox.org/”USERNAME”/reports/”FORMID”/export.csv

Where,
“USERNAME” is your login username
“FORMID” is the 22 character (it should see something like: aL7auvitUbKpdznfJuQi5U) that you should see in your survey link when you open the survey project.

Load your Power BI with the above API and then you should be able to get the data updated to your Power BI every time you refresh your data.

Kindly please let us know, this should solve your issue.

Thanks for this suggestion, I tried to follow it diligently.

Still from that test with the two filled forms (the first avoiding the enrolment section, the second using it), I created a csv file using this URL: https://kc.kobotoolbox.org/york_rff/reports/aBGSuQsyd2zdnv8mUQSxmx/export.csv – and after opening it in Excel, I found lots and lots of unused fields, BUT NOT any of the enrolment data…!?

Is it maybe that I have too many sub-sub-groups (third: level: group in a group in a group)? Puzzled…

Do you not find the variables after you pull it in the Excel Power Query?

Sorry, what exactly is an “Excel power query”?

By the way, maybe it is not the high number of sub-groups, but the repeats which create the problems?

OK, so do you mean you are not able to see only the data from repeat groups?

I tried to confirm. And indeed, it exactly looks like this: whatever questions are between repeat loops, are not shown.

Here are the lines from what I see after the csv export (as you described it above), followed by the respective lines from the xlsform (the term “Grade” comes through a variable for country-specific terminology):

Number of ${term_grade} 3 classes:
Number of ${term_grade} 2 classes:
Number of ${term_grade} 1 classes:

integer school_classes_g3 Number of ${term_grade} 3 classes:
begin repeat school_class_names_g3 ${term_grade} 3 classes names
calculate counter_names_g3
text school_class_name_g3 Name of ${term_grade} 3 class #${counter_names_g3}
end repeat
integer school_classes_g2 Number of ${term_grade} 2 classes:
begin repeat school_class_names_g2 ${term_grade} 2 classes names
calculate counter_names_g2
text school_class_name_g2 Name of ${term_grade} 2 class #${counter_names_g2}
end repeat
integer school_classes_g1 Number of ${term_grade} 1 classes:
begin repeat school_class_names_g1 ${term_grade} 1 classes names
calculate counter_names_g1
text school_class_name_g1 Name of ${term_grade} 1 class #${counter_names_g1}
end repeat

So your concern is that you are able to update/refresh the data if it does not have a repeat group but in cases where you have a repeat group data it is not showing in Power BI?

Yes, the problem is that if questions inside a repeat group get not filled before using the API for the first time, later use of these questions will be ignored in the power query (or even in a csv export).

I tried with the following url (with one of my dummy project) and it worked for me:

https://kc.kobotoolbox.org/api/v1/data/545827.xlsx

Maybe if you wish, you could make some dummy entries as well:

https://ee.kobotoolbox.org/x/w2cfGvN8

I could update the records (for both the household level as well as the roster information) once i press the refresh button.

Maybe this video tutorial prepared by @janna would help you solve your issue. Though the video pulls the data from an Excel file, you could still connect your data with the KoBoToolbox server:

Thanks. Are you sure that you tested it with AT FIRST no entries in the repeated questions, then established the API and THEN enter data in the repeated questions?

By the way, if I follow the recommended video (which I consulted several times before), during the get data procedure no “Navigator” windows pops up with SEPARATE tables.
image

I am not sure what this exercise/test was about, but I found out that the csv-file is only containing the “surface level” (but not the repeats).

Anyway, I created a completely new questionnaire, asking for numbers of four types of household members and then asking one or two repeated questions on each of them (four times “begin repeat” to “end repeat”.)

In the first two questionnaires, I left the number of two types blank (so no additional questions on them). Then I did the Power BI query and saw only data on those two used groups.

Then I filled out another questionnaire, this time saying 2 for one of the previously unused types and answered the repeated questions.

Again:

the additionally used types (and the repeated follow-up data) don’t appear, evon not when starting a completely new Power BI file and establishing the API again

the data of the repeated questions are hidden in “Lists” and there is no automatic split into main table and sub-tables

By the way: If I export the same data to excel and get data from excel in Power BI, everything is there (even in separate tables).