The Problem With Atom

September 2nd, 2008

I’ve always liked the Atom spec. It’s neat and tidy with strict rules about what’s valid and what’s not with all those rough corners and incompatibilities of RSS sorted out (well, mostly). If I run into one of the silly sites that offer both RSS and Atom I pick Atom just because it feels right even though both would work perfectly well for me. So it came as quite a surprise to me to discover a major weakness in the Atom spec - it’s a right pain to generate. Let me explain…

For various reasons, I’ve spent a fair bit of time over the last few months converting LiveWorks! from being hosted on WordPress to being a “real” website running on Lotus Web Content Management (IWWCM - yes I know the acronym doesn’t match the current name).

Anyway, one of the things that’s baked into WordPress because of its blogging culture is the concept of RSS/Atom feeds. IWWCM comes from an entirely different world so the concept of feeds is quite foreign to it. Fortunately, it’s a very flexible system and has some simple tools you can throw at the problem. All you do is use a “menu”, which allows you to select content from the repository according to certain criteria and order it any way you want. In this case, we simply order by published date. Menus in IWWCM have a head, a bit that’s repeated for each item and a footer. So you throw all the XML from before the first item in the head, a template for each item in the repeated bit and close off any remaining tags in the footer. Simple enough.

The problem is, Atom doesn’t follow that structure. Atom includes one key piece of the content items below in the head: the most recent update time. In order to generate that head section of the XML, you have to have the metadata about the most recent item in the system. In most cases that’s fine and having that element makes parsing an awful lot simpler, but it’s a surprisingly annoying requirement. The IWWCM menu structure simply can’t handle this and so can’t generate a valid Atom feed.

RSS on the other hand wasn’t as nicely designed to make detecting changes simple. It doesn’t have to have that updated element in the head section and so it’s perfectly suited to the IWWCM menu structure. So, LiveWorks! has now switched over to RSS 2.0 and apart from an implausible date due to some weird leaps around time zones and an article publishing a day early, its feed validates.

This isn’t the first time I’ve cursed Atom for this either - at least for me, it just seems so natural to follow the simple iterator pattern that IWWCM’s menus use so I’ve run into this a few times. Mostly it just takes an extra if statement or similar to special case the first item but every so often it requires some major reworking and in cases like this, it’s just about impossible to do.
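
To make the ordering problem concrete, here’s a minimal sketch in Java of the head/item/footer pattern with the one extra step Atom forces on you. Everything in it is made up for illustration: FeedItem, AtomHeadSketch and the urn: ids are hypothetical names, nothing here is IWWCM’s API, and it leaves out things a real feed needs (author, link and so on), so treat it as a sketch rather than a complete generator.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Hypothetical item type for illustration; not an IWWCM or feed-library class.
final class FeedItem {
    final String id;
    final String title;
    final Instant updated;

    FeedItem(String id, String title, Instant updated) {
        this.id = id;
        this.title = title;
        this.updated = updated;
    }
}

final class AtomHeadSketch {
    // Feed-level <updated>: the newest item timestamp. It has to be known
    // before the head is written, which is exactly what a pure
    // head/item/footer template cannot look ahead for.
    static Instant feedUpdated(List<FeedItem> items) {
        return items.stream()
                .map(item -> item.updated)
                .max(Comparator.naturalOrder())
                .orElse(Instant.EPOCH); // assumption: an empty feed falls back to the epoch
    }

    static String render(String feedId, String feedTitle, List<FeedItem> items) {
        StringBuilder xml = new StringBuilder();
        // Head: the only part that needs data from the items below it.
        xml.append("<feed xmlns=\"http://www.w3.org/2005/Atom\">\n")
           .append("  <id>").append(escape(feedId)).append("</id>\n")
           .append("  <title>").append(escape(feedTitle)).append("</title>\n")
           .append("  <updated>").append(feedUpdated(items)).append("</updated>\n");
        // Repeated bit: one entry per item, in whatever order the list is in.
        for (FeedItem item : items) {
            xml.append("  <entry>\n")
               .append("    <id>").append(escape(item.id)).append("</id>\n")
               .append("    <title>").append(escape(item.title)).append("</title>\n")
               .append("    <updated>").append(item.updated).append("</updated>\n")
               .append("  </entry>\n");
        }
        // Footer: close whatever the head opened.
        xml.append("</feed>\n");
        return xml.toString();
    }

    // Minimal escaping for element text; attribute values would need more.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<FeedItem> items = List.of(
                new FeedItem("urn:example:2", "Newer post", Instant.parse("2008-09-02T00:00:00Z")),
                new FeedItem("urn:example:1", "Older post", Instant.parse("2008-08-29T00:00:00Z")));
        System.out.print(render("urn:example:feed", "Example feed", items));
    }
}
```

If the items are already sorted newest first, feedUpdated boils down to reading the first item’s timestamp, which is the “extra if statement” case. The trouble only starts when the templating system gives you no way to peek at the items while writing the head.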

That’s trade-offs for you though. If that updated element wasn’t required, there’s a whole bunch of cool stuff on the consumer side that wouldn’t work. Oh well.

Rendering vs Editing

August 29th, 2008

Working in the world of editors, there is a range of blog posts that float back up to the surface every so often, generally because they discuss an age-old concern that just keeps resurfacing. Recently a post from Moxiecode resurfaced complaining about the lack of focus on editing support in browsers. There have been a few such posts in the past that I’ve seen and while the world of contenteditable support has definitely improved lately, it’s still one of the weakest areas of modern browsers.

Why is this not higher on the agenda? Behind most websites are some form of Content Management System, and most of them have some form of WYSIWYG editor for handling normal text content, if the tools we have were better and the bugs fewer, these systems would produce better code, and in the long run improve the web as a whole.

The primary reason this isn’t a bigger focus comes down to the basic ratios of participation - the vast majority of people only read web content, they don’t create it. So browser vendors inherently get more bang for their buck by focussing on the rendering of content rather than the creation of it. Even with the explosion of blogs, Web 2.0 and user generated content, the percentage of people who create anything more complex than plain text with paragraphs is amazingly small compared to the number of browser users. When you build editors though, you don’t see that kind of difference - a much larger percentage of intranet users contribute content, and the same is true of many other areas where editors are used (blogs, forums, wikis etc all have vastly higher contribution rates than the web as a whole).

That said, user demand isn’t the only contributing factor. The fact is, rendering content is a completely different technical challenge to editing it. It requires different skills, different engineering approaches and a different understanding of users. Probably the biggest technical difference is that editing exposes far more of your document model than rendering does.

Think about what happens when you render HTML (very roughly) - first you parse the HTML text into a DOM and mix in all the external stuff like CSS and scripts etc so that you have a model that you can render. Then you use the information in that to render text and images on the screen. In the background you have a tree structure with a whole swag of complex attributes, but none of that actually shows on screen. When you look at a web page you can’t tell the difference between a site that uses plain paragraphs and one that nests those paragraphs in a whole bunch of DIV tags. It would never occur to you that a paragraph that contains some plain text and some bold text is modelled as a bunch of elements and a small sub-tree, it’s just a line of text.

When you’re editing though, the user is actually manipulating the DOM, but users still don’t want to think about any of that stuff. They don’t want to split a text element into three pieces and make the middle one a child of a new “b” element, they just want to select the text and apply a bold attribute to it. The way most users think about HTML pages is fundamentally different to the way the DOM is actually modelled and it’s up to the editor to make HTML work the way the user thinks while still creating valid, semantic HTML.
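
To show what that “split into three pieces” step looks like at the DOM level, here’s a small sketch using the org.w3c.dom API that ships with the JDK. BoldRangeSketch and its methods are hypothetical names for illustration, and it only handles the easy case of a selection inside a single text node. A real editor also has to deal with selections that span elements, text that’s already bold, and so on.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

final class BoldRangeSketch {
    // Wraps characters [start, end) of a single text node in a new <b> element.
    static Element bold(Text text, int start, int end) {
        Document doc = text.getOwnerDocument();
        // After this call, 'text' keeps [0, start) and 'middle' holds the rest.
        Text middle = text.splitText(start);
        // After this call, 'middle' holds only the selection; the tail becomes
        // a new sibling text node after it.
        middle.splitText(end - start);
        // Put a <b> where the middle piece was, then move the piece inside it.
        Element b = doc.createElement("b");
        middle.getParentNode().replaceChild(b, middle);
        b.appendChild(middle);
        return b;
    }

    // Builds <p>content</p> in a fresh document so the sketch has something to edit.
    static Element paragraph(String content) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            p.appendChild(doc.createTextNode(content));
            doc.appendChild(p);
            return p;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element p = paragraph("some bold text");
        bold((Text) p.getFirstChild(), 5, 9);
        // The paragraph now has three children: text, <b>, text.
        System.out.println(p.getChildNodes().getLength());
    }
}
```

The user sees one line of text with a word in bold. The model now has three child nodes and a small sub-tree, and that gap is what the editor has to hide.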

So if there’s a fundamental difference in the way that users think and the way that the content is modelled you have two choices - change the model or throw a whole lot of code at it to map between the two models. Programmers tend to choose the second option by default, but changing the model is in general far more effective. That of course is a problem for browser vendors because the model they have is actually pretty ideal for rendering the content so changing it isn’t a particularly viable option. Thus, for a browser to support editing really well they have to throw a huge amount of code at the problem. There is an endless supply of corner cases that you have to consider and take care of so there always seems to be a ton more work to do. The only saving grace is that a lot of this work is handled by the JavaScript code that implements the editor - the browser generally just provides the underlying building blocks, but it’s still a huge amount of work.

When you think through the amount of effort required and the number of affected users, it really shouldn’t come as a surprise that editing support is the neglected stepchild for browsers. It’s just a matter of return on investment. Fortunately, the demand side is increasing rapidly and the general improvements to JavaScript speed are making it more viable to move a lot of this handling out of the browser code and into JavaScript. There are surprisingly few fundamental types of changes you can make to an HTML DOM but there is an awful lot of code involved in deciding exactly which of them to apply when. The more of that code that is moved to being browser independent the better JavaScript editors will get because they’ll have more control over their own destiny.

java.net.URL Timeouts

August 26th, 2008

Sylvain Wallez:

If your application uses java.net.URL, and chances it does are very high, and you are using Sun's JVM (since 1.4.2), you should set the sun.net.client.defaultConnectTimeout and sun.net.client.defaultReadTimeout system properties to a reasonable value. Otherwise, if a remote site hangs, your application or server will also hang.

Useful to know…
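
For future reference, here’s a minimal sketch of two ways to apply that advice. The property names are the ones from the quote. The class name, the method names and the timeout values are just placeholders I’ve made up. As far as I know the properties are best set early (at startup, or with -D on the command line) since the JVM may only read them once. The second method uses URLConnection’s own setConnectTimeout and setReadTimeout, which work per connection and don’t depend on Sun-specific properties.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

final class UrlTimeoutSketch {
    // JVM-wide defaults, in milliseconds, using the properties from the quote.
    // Assumption: call this before any URL connections are opened.
    static void applyJvmDefaults(int connectMillis, int readMillis) {
        System.setProperty("sun.net.client.defaultConnectTimeout", Integer.toString(connectMillis));
        System.setProperty("sun.net.client.defaultReadTimeout", Integer.toString(readMillis));
    }

    // Per-connection alternative: no network traffic happens here, the
    // timeouts apply once the caller connects or reads from the connection.
    static URLConnection open(String url, int connectMillis, int readMillis) {
        try {
            URLConnection connection = URI.create(url).toURL().openConnection();
            connection.setConnectTimeout(connectMillis);
            connection.setReadTimeout(readMillis);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values; pick whatever "reasonable" means for your application.
        applyJvmDefaults(5000, 10000);
        URLConnection connection = open("http://example.invalid/", 5000, 10000);
        System.out.println(connection.getConnectTimeout());
    }
}
```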

MathML in Web Pages

August 19th, 2008

Dear Lazyweb,

Since this worked so well to find a great article on HTTP caching… Does anyone know of a good introductory article for how to get MathML to display in Web Pages across multiple browsers etc?

My primary recommendation unfortunately will be to convert it to an image, but I’d like to provide instructions for the folk who want to maintain accessibility for their equations as well. Related and also interesting is anything discussing mime-types, XHTML and how to solve the IE problem.

Thanks,

Faithful believer in collective intelligence…

Exporting and Importing a Portal WCM Library

August 18th, 2008

I’m going to need this soon and I’ll never find the link again in the IBM forums so I’m putting it here.

Exporting and Importing a Web Content Library

It should let you move web content (minus drafts and previous versions unfortunately) from one IWWCM server to another.