The data from API are unstructured/are different when we compare with CSV or XML data extracted from form created

tetobrasil · June 24, 2021, 3:55pm

Currently, our country is developing a new platform called “Mapping of Rights”, which will use data from our polls on the Kobo platform to generate relevant data about more than hundreds of favelas in Brazil.

But, when we try to export data through API (which is working fine), it is being exported in coded form. In other words, the data extracted manually from the system is fine, but the data extracted from API is unstructured.

In order for our project to be automatically updated via API, using Power BI, we need to find a way to structure this data.

Can you help us?

Here a comparison between data extracted manually and data extracted from the API: MV_2106RP_-_all_versions_-_labels_-_2021-06-17-18-08-04.xlsx - Google Drive

Thank you for all!

Kal_Lam · June 24, 2021, 4:09pm

Welcome to the community, @tetobrasil! Apparently there are issues form the community regarding the same. Will reach you back when re solve this issue.

tetobrasil · June 24, 2021, 4:13pm

Thank you, @Kal_Lam! We look forward to receiving your response.

Josh · June 24, 2021, 4:58pm

Hi @tetobrasil, can you please clarify what you mean by “unstructured”?

tetobrasil · June 24, 2021, 7:42pm

Hi, @Josh! Thanks for the response.

We have a questionnaire with a determined structure. When we extract him manually from Kobo, in CSV or XLM, it’s fine. There is a sequence of columns and lines, with the correct responses. But, when we extracted it from API, the structure changed. The name of the question changed, the sequence of columns changed, etc. But, the main problem is because the number of columns changed too!

In other words, in our form, we have 115 questions. When we extract them manually, we have a spreadsheet with 115 columns, even if some are blank (unanswered). When extracting from API, the number of columns is smaller and we don’t know if it is due to a blank answer or if the API does not read part of the questionnaire.

You’ll can see more and better the example in this link: MV_2106RP_-_all_versions_-_labels_-_2021-06-17-18-08-04.xlsx - Google Drive

The yellow lines and columns are data extracted manually from Kobo. The orange lines and columns are data extracted from API. It is the same questionnaire, but the structure of the data is different. The sequence, the names, the response format, all different.

In order for Power BI to correctly read the information and automatically update the questionnaire data, we need this “unstructured” API data to be similar to the manually extracted data. Or at least we need to understand the logic of moving from one structure to another.

Josh · June 24, 2021, 7:55pm

Hi @tetobrasil,

Thank you for providing some more detail This is the expected behaviour of the JSON API — it will return the question and choice names (including their group hierarchy) and not their label as you can do with the CSV or XLS exports. Please refer to my explanation here as to why there are “missing” items in the JSON response compared to CSV or XLS.

You will have to do some post-processing on the JSON data to get the labels back but, as has been recently requested here, we may look at a more convenient way to do this in the future.

I hope that helps to clarify things.

tetobrasil · June 24, 2021, 8:31pm

Thank you, guys! I think it helps a lot to solve the problem. I’ve just shared the responses with my Development team.
If I have some more questions, I’ll come back here later.

Best regards.