Not all columns appear when connecting to Power BI

Hi all,

I have a survey that I would like to connect to Power BI. I was able to make the connection to the form by providing the form link as well as the authentication token; however, it only displays 87 of the 298 columns. This survey is quite long with nested skip logic. I currently have one submission, and the fields that appear in Power BI are only the ones completed in that submission. When I download the Excel output from Kobo, all 298 columns are there without issue. Would someone be able to advise?

Best,
Tom

Welcome back to the community, @tom.joseph! Would you try making a fresh connection to see if you are now able to see all the columns? We have discussed in the past that columns without any entries are not visible when connected to Power BI (i.e., only columns with entries are visible in Power BI). Thus, if you are trying to refresh updates from a live data-collection project, some columns may not get updated if they had no records at the time you connected to Power BI.

Thank you for the quick reply @Kal_Lam. I tried to refresh the connection just now but still only see 87 of the 298 columns. Are there any settings I can change to ensure that all 298 columns appear? I want to make sure all edits within Power Query are handled without having to make additional changes as submissions come in and new columns are generated.

What I meant was that you will need to make a new connection to see the total number of active columns. You may also need to download your data in xls format and check the total number of columns that have been filled.

Thank you for the clarification. The xls has 139/298 columns with a value in them. Power BI continues to show only 87.

Also note that the column names in Power BI are prefixed with the group name, whereas none of the xls columns have the group name in them.

Hello
The group prefix can be deactivated through your download setting.

@wroos Thank you for the advice. I took a look at the download settings found here:

@Kal_Lam Would it be possible to request functionality that would allow the user to specify whether they want all columns to populate or not when querying Kobo? Having a dynamically changing set of columns in an active survey creates maintenance issues of its own for dashboards tied to the Kobo data.

@tom.joseph, do you mean in the custom reports?

@Kal_Lam To clarify: I am asking whether, when querying Kobo from Power BI, there are optional parameters that could be passed to ensure all columns are generated within Power BI.

The challenge is that unless I send in several test submissions that complete all of the possible fields within Kobo, less than 50% of the fields will be generated within Power BI. As additional submissions are created, that would then require ongoing maintenance of the Power Query so that it accounts for the new submissions/columns.

To address the above challenge, would it be feasible to have an option where all the columns are displayed, or where the user can specify whether only completed columns are to be displayed?

Hi @tom.joseph
I have gone through the communication trail above and I will narrow it down to the last issue you raised regarding Power BI.

This is an issue with Power BI and how it pulls the data. As you rightly pointed out, you need to have data in all the columns to fully define how you want those columns displayed. However, you could try adapting the fix done in the Advanced Editor.

I would also suggest you skim through the following topics:
Stephane

The workaround is:

  1. Create the form the way you want it.
  2. Download the xls of the form and remove all the data in the ‘relevant’ column. Let’s call this the test form.
  3. Replace the xls file in the same project with the test form.
  4. Enter some test data in all the fields and submit. Without the skip logic, you will see all the possible fields in the form.
  5. Connect the project to Power BI and pull the data (a quick column-count check is sketched just after this list).
  6. Replace the form in the project with the original xls with all the skip logic intact.
  7. Delete your test submission.
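
If it helps, a quick way to sanity-check step 5 is to count the columns Power Query ends up with once the fully answered test submission has been pulled; it should match the total number of questions in the form (298 in this thread). A minimal sketch, where Submissions is just a hypothetical stand-in for whatever table your own KoBo query produces:

    let
        // hypothetical stand-in for the table your KoBo query produces after expanding the records
        Submissions = Table.FromRecords({[breakfast = "Weetbix", rain = "no", new_question = "test"]}),
        // with the skip logic removed and every field answered, this should equal the full question count
        ColumnCount = Table.ColumnCount(Submissions)
    in
        ColumnCount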

@ks_1 Thank you for the detailed and step-by-step workaround. I will test this out but think it should resolve the issue. Appreciate everyone’s time identifying a solution!

Hi all, I have been battling with this also and I think there are 3 parts to this business problem of setting up the Power BI integration before all the possible variations have been submitted - partly to do with the way KoBo behaves and partly to do with the way Power BI behaves. FYI I use the JSON format when connecting to the KoBo API.

  1. The original issue identified by @tom.joseph in the very first post above, whereby the KoBo API only returns data for fields that have been populated in the submissions. E.g. if I have a form with 3 (non-mandatory) questions and my first submission only has answers to 2 of those questions, then only those two questions will be returned as fields in the JSON. In the example below, my form had 3 questions (breakfast, rain and new_question) which all displayed onscreen, but as I did not answer new_question there is no data for it in the JSON from API v2 (same behaviour in API v1), so when I first connect to PBI it is as if that question did not exist and PBI does not create a column for it:
    {
        "_id": 32356,
        "_notes": [],
        "_validation_status": {},
        "_uuid": "b1ea757b-a634-487d-a5f0-a5227661788a",
        "_tags": [],
        "rain": "no",
        "_submitted_by": null,
        "_xform_id_string": "af8Zwtsc6h9AW2UhH8qpnA",
        "breakfast": "Weetbix",
        "meta/instanceID": "uuid:b1ea757b-a634-487d-a5f0-a5227661788a",
        "formhub/uuid": "a56ee09be23b488696ab37b3fc7b4f5a",
        "end": "2021-03-25T23:48:09.211+13:00",
        "_submission_time": "2021-03-25T10:48:10",
        "_attachments": [],
        "start": "2021-03-25T23:48:02.617+13:00",
        "_geolocation": [null, null],
        "_status": "submitted_via_web",
        "__version__": "vzmZ97h7GNEVEgko7yj2Qj"
    }
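
For reference, our connection is just the usual Web.Contents / Json.Document pattern. A minimal sketch is below; the server URL, asset UID and token are placeholders to replace with your own, and I am assuming the v2 data.json endpoint that produced the record above:

    let
        // placeholder server / asset UID / token - replace with your own
        url = "https://kf.kobotoolbox.org/api/v2/assets/aXXXXXXXXXXXXXXXXXXXXX/data.json",
        Source = Json.Document(Web.Contents(url, [Headers = [Authorization = "Token YOUR_API_TOKEN"]])),
        // the v2 endpoint wraps the submissions in a "results" list
        Results = Source[results],
        // unanswered questions are simply absent from each record, so nothing in
        // this step (or in the UI expand step) can ever turn them into columns
        Submissions = Table.FromRecords(Results, null, MissingField.UseNull)
    in
        Submissions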

This is something that can only be solved on the KoBo side - there is no way for Power BI to know about any data that isn’t exposed via the API! Possible ways to solve this business problem:

  • figure out a way to get a list of all possible fields and then manually code them as columns in Power BI (which can get complicated with nested groups/repeats etc.) - particularly because, as you say above, the field names in the API are returned with group prefixes, and I can’t find a way to turn this off in the data API (only in the Excel export/download); a rough sketch of this option follows the list below
  • KoBo enhancement to add a toggle in the API to request the JSON to always return fields even if empty for a given record (which would massively increase the size of the data file if there are lots of empty fields) - this is what @tom.joseph is requesting here.
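
To illustrate the first option, here is a rough sketch of padding the table out to a hand-maintained field list. The field names and the little stand-in Submissions table are hypothetical; in practice the list would come from the XLSForm survey sheet with the group prefixes added:

    let
        // hypothetical, hand-maintained list of every expected field, with group prefixes, taken from the XLSForm
        AllFields = {"group_intro/breakfast", "group_intro/rain", "group_intro/new_question"},
        // hypothetical stand-in for the table the API connection steps produce
        Submissions = Table.FromRecords({[#"group_intro/breakfast" = "Weetbix", #"group_intro/rain" = "no"]}),
        // fields the API never returned because no submission has answered them yet
        MissingFields = List.Difference(AllFields, Table.ColumnNames(Submissions)),
        // add an empty text column for each missing field so the schema stays stable as data comes in
        WithAllColumns = List.Accumulate(MissingFields, Submissions,
            (state, fieldName) => Table.AddColumn(state, fieldName, each null, type text))
    in
        WithAllColumns
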
  2. The situation where the Power BI integration has been set up based on the first few submissions, then more submissions come in which have fields that hadn’t been populated in the initial load - i.e. PBI doesn’t already have columns for those fields. I think this is what is being referred to by @stephanealoo here.
    This is something that can be updated on the Power BI side but can be a bit of a faff.

The thing is, when using the UI to expand the results, you select the list of columns to expand - but not all columns are automatically loaded dynamically every time. You can manually go in after there are more submissions, click the ‘load more’ button, then select them all and save.

If you look, you can see that when using the Power BI UI to expand columns, all the column names are explicitly listed, so when new fields come through they will be ignored as they’re not on the list, e.g.:
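
Something like this generated step, where the previous step name and the hard-coded field list are purely illustrative (matching the small example form above):

    // roughly what the UI writes: every field name is listed explicitly, so a
    // later field such as new_question is silently ignored until it is added here
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1",
        {"_id", "breakfast", "rain", "_submission_time"},
        {"_id", "breakfast", "rain", "_submission_time"}),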

However, it is also possible to code the Power BI side to be more extensible - this often requires editing in the Advanced Editor etc., i.e. editing the Power Query / M language rather than clicking via the UI.

For example, using a trick from this YouTube video we managed to set it up so our Power Query handles new fields automagically (in this case data_repeat is the column we want to expand to its fullest no matter how many underlying columns there are):

    #"Dynamically Expanded data_repeat" = Table.ExpandRecordColumn(#"Replaced Value to handle nulls", "data_repeat", Record.FieldNames(Record.Combine(Table.Column(#"Replaced Value to handle nulls", "data_repeat")))),

We were using it in the context of repeating sections, combining data from multiple unrelated repeats, so we had to do a bit of extra jiggery-pokery to handle nulls etc. I’m sure the below could be done more elegantly, but here is our Power Query code if it’s useful to anyone:

    #"Expanded data_repeat" = Table.ExpandListColumn(#"Renamed Columns", "data_repeat"),
    #"Replaced Value to handle nulls" = Table.ReplaceValue(#"Expanded data_repeat",null,[null_record = "null_record"],Replacer.ReplaceValue,{"data_repeat"}),
    #"Dynamically Expanded data_repeat" = Table.ExpandRecordColumn(#"Replaced Value to handle nulls", "data_repeat", Record.FieldNames(Record.Combine(Table.Column(#"Replaced Value to handle nulls", "data_repeat")))),
    #"Handle scenario where no null records" = if Table.HasColumns(#"Dynamically Expanded data_repeat", {"null_record"}) then #"Dynamically Expanded data_repeat" else Table.Combine({#"Dynamically Expanded data_repeat", Table.FromRows({},{"null_record"})}),
    #"Filtered Rows to remove repeats with nothing selected" = Table.SelectRows(#"Handle scenario where no null records", each ([null_record] = null)),
    #"Removed null helper column" = Table.RemoveColumns(#"Filtered Rows to remove repeats with nothing selected",{"null_record"}),


Lastly,

  3. There are also other considerations when PBI is not loading as one expects - e.g. you sometimes need to hit “refresh all” and possibly clear the cache if columns have been added. Also, Power BI only profiles the first 1000 rows… so if the new fields only appear after the 1000-row mark then I’m not sure how it handles them. But I’ve only hit that problem once and handled it manually at that point :wink: I believe you can manually change a setting to profile based on the entire data set, but then it would take longer.

Most of my issues have been to do with 1 and 2 above. We should consider putting something in the suggestion box about #1 as I know several colleagues have hit this stumbling block also.

Further to the comment above re extensibility, I also try to do things in PBI that don’t explicitly list all the columns - just the ones I want to manipulate in that particular step, e.g. (a short sketch follows this list):

  1. using a function to reorder columns in an extensible way (so we only specified the columns we wanted to put at the start, rather than explicitly listing ALL the columns we wanted)
  2. selecting the columns we wanted to keep then choosing ‘remove other columns’ rather than explicitly specifying the columns to keep

etc.
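
As a rough illustration of both patterns (the column names and the tiny Source table are made up for the example):

    let
        // made-up stand-in for the expanded KoBo table
        Source = Table.FromRecords({[_id = 1, breakfast = "Weetbix", rain = "no", _submission_time = "2021-03-25T10:48:10"]}),
        // 1. pin a few key columns to the front and let every other column,
        //    including ones that appear later, follow in its existing order
        LeadColumns = {"_id", "_submission_time"},
        Reordered = Table.ReorderColumns(Source, LeadColumns & List.Difference(Table.ColumnNames(Source), LeadColumns)),
        // 2. keep only the columns we care about ("remove other columns"); MissingField.Ignore
        //    stops the step breaking if one of them hasn't shown up in the data yet
        Kept = Table.SelectColumns(Reordered, {"_id", "breakfast", "rain"}, MissingField.Ignore)
    in
        Kept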

Hello @nat ,
We are not using Power BI, but maybe an idea:
If there is an easy option to import xlsx into Power BI, you could start with an xlsx KoBo download (even with just the first submission), which gives ALL columns in the right order, and import this xlsx into Power BI to get the data set configured.

This won’t work easily because the steps to import data from an excel file and a JSON web source are quite different.

@wroos sorry for my slow response - I agree starting with a list of all the column names could be useful. However, in the case of a JSON vs a csv source in PBI, as the JSON needs to be expanded in order to access groups/repeats, there would be a bit of manual work to figure out that logic.

Also, the JSON always returns the group prefixes in the field names; I can’t find a param to turn that off.

@tomisin do you have any repeats in your data? As far as I’m aware, csv is great for scenarios where there are no repeats, but if there are repeat groups then you can’t access that data via csv (unless you know of a way?).

Thanks all

Hello Kobo team,

I am still facing the same problem: some columns are missing in Power BI after I make the connection using the API. Please help me; I have been searching the discussions.

Thanks a lot