JCR Woes

So we've got a new internal system that we've built on top of JCR. Currently we're using Jackrabbit as the repository, but eventually it will be ported over to something like IBM Portal. Unfortunately, right now we're deploying the app to a pretty limited server, both in terms of CPU and RAM.

It turns out that using Jackrabbit with the Derby persistence manager in that kind of situation is a horrible, horrible idea. Everything works great on systems with modest amounts of CPU and RAM, but once we deploy to that poor little virtual server in the sky, page load times skyrocket and the whole thing becomes unusable.

Profiling turned up a few things we could fix, but fixing them didn't solve the problem, so we set out to test different persistence managers. We already have MySQL running, so why not try that? Mostly because switching means exporting the content and importing it into the new repository, and the JCR export produces XML. Sadly, with Jackrabbit (and possibly with any compliant repository, I'm not yet sure) that produces an invalid XML file which can't then be imported. It turns out that putting binary data in XML files doesn't really work, particularly if that data includes control characters, which aren't valid in XML even if you entity-encode them. At this point I could just reimport the data from scratch, but since I know we're going to need to migrate repositories again in the future, I need a reliable way to export and reimport. Not to mention that it would be nice to be able to back up to a stable format instead of the random binary formats that make up Jackrabbit's native storage.
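To see why entity-encoding can't rescue raw binary data, it helps to look at which characters XML 1.0 allows at all. Its `Char` production permits only tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF, and character references must match that production too, so something like `&#x1;` is still not well-formed. Below is a minimal Java sketch of that check. The `XmlCharCheck` class and its method names are hypothetical, not part of Jackrabbit or the JCR API.

```java
// Hypothetical helper (not part of Jackrabbit or JCR): checks whether a string
// contains only characters allowed by the XML 1.0 "Char" production. Anything
// outside it (most C0 control codes, lone surrogates, U+FFFE, U+FFFF) is
// illegal in XML 1.0, even when written as a &#x..; character reference.
public final class XmlCharCheck {

    private XmlCharCheck() {}

    /** Returns true if the code point matches the XML 1.0 Char production. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char index of the first illegal character, or -1 if all are legal. */
    public static int firstIllegalIndex(String s) {
        int i = 0;
        while (i < s.length()) {
            // codePointAt joins valid surrogate pairs; a lone surrogate comes back
            // as its own value (0xD800-0xDFFF), which isXml10Char rejects.
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulates raw bytes from a binary property naively turned into a string.
        String fromBinary = "PNG\u0001\u0002data";
        System.out.println(firstIllegalIndex("plain text")); // -1
        System.out.println(firstIllegalIndex(fromBinary));   // 3
    }
}
```

Running a check like this over exported text values before an import would at least point at the offending value rather than leaving the parser to fail partway through the file.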

Sigh.

4 Responses to “JCR Woes”

  1. Odi Says:

    Uh Jackrabbit… Reminds me badly of my Magnolia experiences… Does it still store everything in a hashed filesystem? I don’t understand why they didn’t use a DB as a backend. It’s insane to use the fs for that. Try and take a consistent online snapshot for backup purposes? Errm… sorry, mate. Try and fix a broken repository by hand. Well, errm… now where do I start exactly? On top of that we had a huge file handle leak, but that may have been Magnolia’s fault – who knows.

    As for binary content in XML: you must Base64 encode it. Even if that character were legal (I don’t know if it is), I’d expect parsers written in C to treat it as EOF :-)
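Odi's suggestion can be sketched in Java as follows, assuming Java 8 or later for `java.util.Base64`. Base64 output uses only letters, digits, `+`, `/` and `=`, all of which are legal XML characters, so bytes such as NUL survive the round trip. The `Base64RoundTrip` class and its methods are hypothetical helpers, not Jackrabbit's own export code.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper (not Jackrabbit's export code): Base64-encodes binary
// content so it can sit inside an XML document, and decodes it again on import.
// Assumes Java 8+ for java.util.Base64.
public final class Base64RoundTrip {

    private Base64RoundTrip() {}

    /** Encodes arbitrary bytes as text using only A-Z, a-z, 0-9, '+', '/' and '='. */
    public static String toXmlSafeText(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    /** Decodes text produced by toXmlSafeText back into the original bytes. */
    public static byte[] fromXmlSafeText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Bytes including NUL and other control codes that XML 1.0 forbids.
        byte[] binary = {0x00, 0x01, 0x1A, (byte) 0xFF, 'h', 'i'};
        String encoded = toXmlSafeText(binary);
        System.out.println(encoded); // AAEa/2hp
        System.out.println(Arrays.equals(binary, fromXmlSafeText(encoded))); // true
    }
}
```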


  2. Adrian Sutton Says:

    Odi,
    Yeah it’s not proving too wonderful for us. The content should have been base64 encoded and looked like it was, but clearly it hadn’t been done properly.


  3. Jukka Zitting Says:

    Do you use system view for the XML export? See https://issues.apache.org/jira/browse/JCR-674 for a bug report related to invalid XML characters and a potential fix that should be there for the system view format in the latest releases (seems like the bug wasn’t properly marked as resolved, I’ll check the status). Let me know if there’s something else you need.

    Odi, the original filesystem persistence options were early development versions. All the official Jackrabbit releases have used an embedded Derby database as the default persistence mechanism.


  4. Rob@Rojotek » Blog Archive » TDD With JackRabbit Says:

    [...] AJ mentioned, we are currently working with Apache JackRabbit for an internal project.  It has been an [...]

