Replacing MongoDB with PostgreSQL indexed JSONB

I’d like to try replacing MongoDB with indexed JSONB fields in PostgreSQL 9.4.

I’ve only just started looking at Kobo, so this will be a long-term goal, but I’d like to kick off a discussion about what would need to change.

  • Will we need to change bamboo so it serializes its datasets to JSONB in PostgreSQL?
  • Will the whole ParsedInstance model go away if we can just read straight from logger.models.instance?
  • Should we be developing this upstream in onadata? I’m not really clear what’s different about your fork of onadata (other than django-reversion vs. the internal version field) or whether you are merging changes from upstream on an ongoing basis.
  • There seems to be a lot of reliance on MongoDB syntax for querying, etc.
    It occurred to me when I first started looking that we could try to write a drop-in replacement for PyMongo (PyPongo?) that saves to PostgreSQL instead, creating one table per Collection and converting incoming queries to use PostgreSQL JSON operators.
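
As a rough illustration of the query-conversion idea above, the sketch below translates a small subset of Mongo-style filters into a parameterized PostgreSQL WHERE clause. Everything in it is an assumption made for illustration: the function name `mongo_filter_to_sql`, the `data` column name, and the choice to map equality to the JSONB containment operator `@>` (which a GIN index on the column can serve) and numeric comparisons to `#>>` with a cast. Containment is not an exact match for Mongo's equality semantics, and key names are assumed to be simple (no commas or braces), so a real replacement would need more care.

```python
import json

# Mongo comparison operators mapped to SQL operators (small illustrative subset).
_COMPARISONS = {"$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}


def _nest(dotted_key, value):
    """Turn "a.b" and 1 into {"a": {"b": 1}} for use with the @> operator."""
    doc = value
    for part in reversed(dotted_key.split(".")):
        doc = {part: doc}
    return doc


def _merge(target, source):
    """Recursively merge source into target so sibling keys share one document."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        else:
            target[key] = value
    return target


def mongo_filter_to_sql(query, column="data"):
    """Translate a Mongo-style filter into (where_clause, params).

    Hypothetical helper: only equality and numeric comparisons are handled.
    """
    contains = {}
    clauses, params = [], []
    for key, condition in query.items():
        if isinstance(condition, dict) and any(k.startswith("$") for k in condition):
            for op, operand in condition.items():
                if op not in _COMPARISONS:
                    raise NotImplementedError(op)
                # Extract the value at the path as text, then compare numerically.
                clauses.append(
                    f"({column} #>> %s::text[])::numeric {_COMPARISONS[op]} %s"
                )
                params.extend(["{" + ",".join(key.split(".")) + "}", operand])
        else:
            # Collect all equality tests into a single containment document.
            _merge(contains, _nest(key, condition))
    if contains:
        clauses.insert(0, f"{column} @> %s::jsonb")
        params.insert(0, json.dumps(contains, sort_keys=True))
    return " AND ".join(clauses) or "TRUE", params
```

The returned clause and parameter list are meant to be handed to a driver such as psycopg2, for example `cursor.execute("SELECT data FROM some_table WHERE " + sql, params)`, where `some_table` is a placeholder name.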

But if ParsedInstance really only exists to facilitate the Mongo queries, then it might be better to just update apps.viewer to use methods on logger.models.instance or its Manager to find stuff.

Regards

Roger

Hi Roger,
Thanks for reaching out! Yes, contributions are welcome.

ParsedInstance originally existed because we wanted to allow a basic record of a submission to proceed even if there were parse errors. It felt like the clearest way of doing this would be to separate it into a different app, where the “logger.models” represent the raw log of what was submitted and when, and the “viewer.models” represent how we wanted to process and display the submissions. That’s the reasoning behind it at least, but the code is several years old and could use a fresh perspective.

  • I’m not sure about the bamboo changes that will need to be made. I don’t know if people are using bamboo for data analysis, but it’s not one of our core features.

  • The ParsedInstance could go away. If there’s some clear process to ensure that “advanced” aspects of the parsing of the instance do not interfere, then I could see it being included in the same model. By advanced aspects I mean parsing of geopoints, calculating survey duration, and … well, a couple of features which we may have tried in the past but ultimately didn’t keep around.

  • I’ve asked peter@ona if they’re still interested in going this route.
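
On the ParsedInstance point above, one way to keep the “advanced” parsing from interfering with the basic record is to run each derived step in isolation and collect errors instead of raising. The sketch below is only an illustration under stated assumptions: the field names `location`, `start` and `end`, and the helper names, are hypothetical and not taken from the Kobo code. It assumes geopoints arrive as space-separated "lat lon [alt [accuracy]]" strings and that start/end are ISO 8601 timestamps.

```python
from datetime import datetime


def parse_geopoint(value):
    """Parse a space-separated geopoint string: "lat lon [alt [accuracy]]"."""
    parts = [float(p) for p in value.split()]
    if len(parts) < 2:
        raise ValueError(f"geopoint needs at least lat and lon: {value!r}")
    return {"lat": parts[0], "lon": parts[1]}


def survey_duration_seconds(start, end):
    """Seconds elapsed between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


def enrich(raw, geopoint_field="location"):
    """Return (derived, errors) for a raw submission dict.

    Each derived value is computed independently, so a failure in one step
    is recorded in `errors` and does not block the others or the raw record.
    """
    derived, errors = {}, {}
    steps = {
        "geo": lambda: parse_geopoint(raw[geopoint_field]),
        "duration": lambda: survey_duration_seconds(raw["start"], raw["end"]),
    }
    for name, step in steps.items():
        try:
            derived[name] = step()
        except (KeyError, ValueError, TypeError, AttributeError) as exc:
            errors[name] = repr(exc)
    return derived, errors
```

With something like this, the raw submission could be saved first and the derived values (or the recorded errors) stored alongside it on the same model.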

I think you’re on the right track about how to replace Mongo. I’m not sure about JSONB queries, but if you’re able to do a drop-in replacement that would be a nice way to go about it. Also, there could be some JavaScript-related nuances of the exports which we should be aware of.

Thanks again for your interest,

If you dive into the code and have questions and want to set up some time to review them, we could do that in the coming months.

Regards,

-Alex


On Mon, May 11, 2015 at 2:14 AM, Roger Hunwicks ro...@tonic-solutions.com wrote:



Alex Dorey
[US+1] 1.415.886.7537

“The ParsedInstance could go away.”

In the current system, is there anything in Mongo that isn’t derivable from the PostgreSQL database?

Also, does the current setup recover gracefully from missing information in Mongo? That is, does it use Mongo like a cache, so that if a read from Mongo fails it builds the document from the PostgreSQL model, saves it to Mongo, and carries on?
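
To make the question concrete, the cache-like behaviour described above could look roughly like the sketch below. This is hypothetical and not a description of what Kobo currently does: the class name `ReadThroughStore` is invented, a plain dict stands in for the Mongo collection, and `load_from_source` stands in for whatever would build the document from the PostgreSQL model.

```python
class ReadThroughStore:
    """Treat a document store as a rebuildable cache over an authoritative source."""

    def __init__(self, cache, load_from_source):
        self.cache = cache                        # stand-in for the Mongo collection
        self.load_from_source = load_from_source  # stand-in for the PostgreSQL lookup

    def get(self, instance_id):
        try:
            doc = self.cache.get(instance_id)
        except Exception:
            # Cache unavailable: fall back to the authoritative source.
            doc = None
        if doc is None:
            doc = self.load_from_source(instance_id)
            try:
                # Repopulate the cache so the next read is served from it.
                self.cache[instance_id] = doc
            except Exception:
                # A failed cache write should not fail the read.
                pass
        return doc
```

If the setup already works this way, then everything in Mongo is by definition derivable from PostgreSQL, which would make the migration much simpler.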
