Integrating Kobo in Organizational Databases at Scale

We are working on a project to create a tool that will allow Kobo data from a suite of standard forms to be easily mapped to Postgres databases for integration purpose. We want to create a tool that can be re-used by multiple organizations. Below is an outline of what we are working on.

Does anyone have experience in this area that they are willing to share or any other inputs/suggestions?

Vision

Efforts are underway to develop a standard library of Kobo form for conservation (KoboConsForms). Conservation organizations should easily be able to deploy a Kobo Integration Toolkit (KIT) server that:

  • Allows organizations to aggregate data from KoboConsForms from multiple accounts into a central Postgres database sources data from both the [Kobo for Researchers and Aid Workers] instance and from local Kobo servers
  • Allows multiple KIT’s to be deployed and updated (e.g. separate KITs for different organizations or initiatives)
  • Allows deployment of new Kobo forms (including corresponding Postgres table creation script and code to map Kobo forms to database table)
  • Allows access to report generated from the Postgres database
  • Integrates with R to allow R scripts to be run to analyze data
  • Have open APIs to allow data to be fed into other systems
  • Support a range of login options (e.g. Google, O365 etc)

Phases

Phase 1

Phase 1 has been completed. This focused on testing the deployment of the Docker based tool [Kobo Aggregator] (https://github.com/wri/django_kobo) (KA) tool from WRI to test it’s ability to deliver a foundation for creating KIT. KA was easy to deploy. Registering accounts to source data from was simple. It supports to Kobo form [BNS] Included in the application is an [NGINX] based web application that allows logged in users to view and analyze their data.

It falls short on the vision in the following priority areas:

  • It is not generalized to support additional survey
  • It is not simple to add new surveys with corresponding Postgres tables
  • It does not have an API for accessing data

Phase 2

The suggested activities for Phase 2 are:

Extend KA to support two additional surveys :

  • Offtake:
  • Socio Economic Database : WCS Socio Economics Library )

Solve a (to be defined) subset of the [bugs/issues]

Create a proposal on how to generalize the NGINX application to support multiple addition surveys

Support logins to multiple Kobo instances

Create a proposal on how KA could support upload a “package” that defined a Kobo form, a set of Postgres tables to map to and a mechanism or code for mapping them. The proposal should indicate what level of complexity could be supported by the package approach - at a minimum it should support basic forms with no workflow or branching. If possible the proposal should a phased approach to supporting complexity. The proposal should also indicate how a package could be modified or adapted over time e.g. supporting the adding or hiding of questions.

Phase 3

Create KIT delivering at least:

  • Allow organizations to aggregate data from KoboConsForms from multiple accounts into a central Postgres database sources data from both the [Kobo for Researchers and Aid Workers] instance and from local Kobo servers
  • Allows multiple KIT’s to be deployed and updated (e.g. separate KITs for different organizations or initiatives)
  • Allows deployment of new Kobo forms (including corresponding Postgres table creation script and code to map Kobo forms to database table)
  • Allows access to report generated from the Postgres database

If the (or a?) goal is basically to get data from a suite of Kobo forms (aka XForms) submitted via KoboCollect (aka ODK Collect) into a Postgres database, you might actually want to take a quick look at ODK Central, which is an OpenRosa compatible (therefore should be compatible with both your existing forms and if you want to continue to use KoboCollect for data collection) Postgres DB backend. Central also provides a full REST API to all forms and submissions, and an optional GUI frontend if needed.

Disclaimer: I’m on the ODK TSC. But there has been some interest expressed in using Central to perhaps replace KoBoKat.

Another tool you may wish to look at using in your solution is ODK Briefcase. This may enable you to pull submission data from multiple OpenRosa-compliant servers (and accounts) - including Kobo, ODK Aggregate, and probably Ono too - and then push it elsewhere, including to an OpenRosa-compatible server with Postgres DB backend (eg ODK Central).

Notwithstanding your other requirements, you may be able to mockup a partial working prototype using these existing (OpenRosa-interoperable) ODK ecosystem tools.

Dr Gareth

Thanks so much for the suggestions. Do you know if any of the teams behind your recommendations would be willing to put together a proposal to address our requirements?

Jonathan

I’ve passed this along to some potential candidates; hopefully you will hear something back.

Jonathan - a colleague brought this to my attention. I have some thoughts on how the back-end logic can be automated using an open source ETL tool called Kettle. Kettle is metadata-driven so it should be possible to 1) retrieve the XLSForm(s) using a REST call, 2) extract the table schema(s) from the Excel sheet, 3) dynamically create the table(s) in Postgres and then 4) retrieve and load the data set(s) from Kobo. This workflow could be “wrapped” in a custom UI and executed on demand. If you’re interested please let me know and we can exchange contact information.

Regards,

-Dave