Symphonious

Living in a state of accord.

Wiki Syntax Considered Harmful

Wikis were invented to make it easier for people to contribute content.  They do this with two key features:

  1. In-browser editing.
  2. Users don't have to learn HTML.

That sounds awfully familiar.  They are in fact two of the key things that Ephox provided as we helped create the in-browser WYSIWYG editor industry.  Such editors are now a key feature of most content management systems, knowledge management systems, document management systems and most other enterprise systems that people want to encourage contribution to (or reduce the complaints from people who are forced to contribute).  As a side note, I'm not sure whether the inventor of EditLive! had heard of wikis when he started his work, and I'm not sure which was created first, but it really doesn't matter.

Second side note: This article talks about problems which Ephox products are explicitly designed to solve as well as problems which they just happen to solve.  It is thus unavoidable that I recommend you look at products that Ephox makes, along with competing products.  I do strongly recommend you review your requirements and the available options though.  Ephox makes great products but we don't even attempt to cater for every situation.  Find out what you want, then find something that delivers it.  A while back I posted an article that mentions a few things to look for that people don't often think about.  If you do decide to look into WYSIWYG editors I suggest reading it first (bearing in mind of course that I'm biased).  Please try not to dismiss the issues below just because I happen to work for a company that sells one possible solution.

The key point here is that making content contribution simple isn't limited to wikis; it is in fact a reality in a wide range of systems that are traditionally thought of as difficult to use.  That's not to say that wikis aren't useful – they are light-weight systems that are much cheaper and simpler to deploy than CMS solutions and they encourage (and often force) the configuration to be more open, user-friendly and efficient for content contributors.

Learning Any New Syntax Is A Barrier To Acceptance

Wikis have a major failing though: their syntax.  Every wiki seems to use a slightly different syntax or interprets the syntax slightly differently.  Worse, users don't have to learn HTML but instead they have to learn this wiki syntax.  With the ubiquity of HTML these days it's surprising how many people have a basic grasp of it, but pretty much everyone has to start from scratch learning the new wiki syntax.  It's a barrier to content contribution.

Is wiki syntax easier to learn than HTML?  Sure.  Does that mean it's not a barrier to content contribution?  Nope, it's just less of a barrier.  I never cease to be amazed at how small a barrier it takes to stop people from adopting new technology.  If people were that timid about everything in their lives they'd never do anything.  Think about how many barriers you pushed through when you first learnt to cook, first learnt to walk, first learnt to ride a bike, first learnt to talk, first learnt to drive, first learnt to read, write, do maths, first learnt to interact socially, first learnt to write a resumé, first anything.  Yet for some reason technology is big and scary and to be avoided if you have the slightest excuse.  So we have to remove barriers wherever we can because otherwise we give people an excuse to give up and they will almost certainly take it.

So how do we remove the barrier to entry of learning a new wiki syntax?  Use something they already know.  Make it look like Microsoft Word – it doesn't have to be exact, but provide a bold button, an italic button and so on, just like Word has, and most importantly, make it WYSIWYG.  There's no reason that users should have to learn any markup language in this day and age and they won't like it if you try to make them.  Geeks probably won't mind learning the syntax and may in fact prefer it, which is why we see wikis being so successful with open source projects.  Don't fall into the trap of thinking that geeks are normal – we're not, and other people won't want to learn and remember any syntax regardless of how simple it seems.  They want to be able to see what they're getting and they want their existing knowledge of editing text on computers to carry over to this new system.

Avoid Wiki Syntax Lock-In

There's a much less obvious problem with wiki syntax too – it's a form of lock-in.  Usually it's not an intentional form of lock-in but it is nonetheless a form of lock-in.  You start off using one wiki with its proprietary syntax and life is good, but then you find that a different wiki now has features that you'd really like to use.  What are the two major barriers to switching?

  1. Training all your users on the new wiki syntax.  Great way to kill passion and buy-in for the system.
  2. Migrating all your existing data to the new syntax.

The wiki syntax is probably clearly documented and it's almost certainly possible to write a script to convert the content over, but that doesn't help your user training problem.  Besides which, don't you have better things to do with your time?  The other problem is that there's unlikely to be a direct one-to-one mapping between the two markup languages so you have to come up with ways to deal with the ambiguity or just deal with the fact that the conversion process isn't perfect.

A bunch of people have noticed this problem and have come up with proposed "standard formats" like Markdown.  These are a great idea and neatly solve both our user training and data migration problems.  I'd like to propose a simpler solution though:

Use HTML

Now why on earth would HTML be an option, considering one of our original key objectives was that users didn't have to learn HTML?  HTML is an excellent option because we don't want our users to have to learn any markup language, we use HTML to output the final page anyway, it's a very widely adopted standard with lots of tools to work with it and, most importantly, there is a large selection of existing in-browser WYSIWYG editors available for it.

I don't currently know of a WYSIWYG in-browser editor for any of the "standardized wiki markup languages" and I'd be surprised if they were particularly good yet even if they did exist.  Developing a great editor takes a lot of work, so why not leverage the ones that already exist?  There is a range of free and open source in-browser WYSIWYG HTML editors that may well meet your needs.  If you later find that they aren't up to scratch for whatever reason you can upgrade to one of the commercial options without any data migration problems.

One catch here though: I'd suggest you find an editor that creates at least well-formed HTML and preferably standards-compliant HTML (XHTML might be a good option).  It probably doesn't matter too much if you use <b> and <i> tags instead of <strong> and <em>, even if the latest standards discourage them.  The main thing you want to avoid is tag soup and ambiguous HTML.  There are very few editors that don't manage to output reasonable HTML, even if it's not great, but it is worth checking: ambiguous, poorly formed HTML is just as much a form of lock-in as wiki syntax and Word documents are.  The better editors can be configured to output fully standards-compliant XHTML (in fact, most can).
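If the editor is set to emit XHTML, one rough way to check its output is to see whether it parses as XML.  The sketch below uses only Python's standard library.  The function name is made up for illustration, and wrapping the fragment in a <div> is my own assumption about how a page body would be stored.  Note that named entities such as &nbsp; are not defined in plain XML, so this check rejects them unless they are written as numeric references.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root element.

    Hypothetical helper: it only tests well-formedness (tags closed and
    properly nested); it does not validate against any XHTML DTD.
    """
    try:
        # A page body is usually several sibling elements, so give it one root.
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed("<p>Hello <b>world</b></p>"))   # properly nested
    print(is_well_formed("<p>Hello <b>world</p></b>"))   # overlapping tags
    print(is_well_formed("<p>line one<br>line two</p>"))  # unclosed <br>
```

Running a handful of pages saved by a candidate editor through a check like this is a cheap way to spot tag soup before you commit to it.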

Take some time to configure the editor to best suit your users and their particular needs – most editors are highly configurable so take advantage of that and your users will thank you.  Take the time to make local image uploads work with your wiki (if desired) or disable the feature.  Integrate the ability to browse existing images and hyperlinks from the editor and make sure that any CSS that will be applied to the output page is also applied in the editor so the user really does get a WYSIWYG experience.  If you want users to concentrate on content instead of display, turn off any unwanted formatting options and use the editor's CSS support for marking up the content so that styles apply.  Since the editor displays the style, they won't feel like they are creating ugly, boring documents because it will immediately look like the final, beautiful output (your CSS styles do make the content look good, right?).

That sounds like a lot of work but it's probably less than you think and it dramatically improves the user's experience with the system and increases the chance that they'll actually use it regularly.

What about cross site scripting hacks and other nasty things people can do if you let them use HTML?  It's no more of a problem now than it was before – you had to handle those problems when you were using wiki syntax anyway so you still have the same problem.  I'd suggest running the HTML through Tidy to ensure it's well-formed and that all tags are closed then strip out script tags and anything else you don't want to include (any URLs that use the javascript: protocol for a start, but you were doing that already weren't you?).  As long as you get rid of JavaScript nasties you don't have to worry about format corruption because this is a wiki and if I wanted to corrupt your layout I'd just delete everything anyway.
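As a rough illustration of that clean-up step, here is a minimal sketch in Python using only the standard library.  It is not a vetted security filter.  Rather than stripping known-bad tags it keeps only an allowlist, which is a slightly stricter variation on what I describe above.  The tag list, attribute list and URL schemes are assumptions you would tune for your own wiki, and the names Sanitizer, sanitize and is_safe_url are invented for the example.  It also assumes the input has already been through something like Tidy, since it does nothing to balance tags.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumed policy for illustration only; adjust for your own wiki.
ALLOWED_TAGS = {"p", "br", "b", "i", "strong", "em", "ul", "ol", "li",
                "a", "img", "h1", "h2", "h3", "blockquote", "pre", "code"}
ALLOWED_ATTRS = {"a": {"href", "title"}, "img": {"src", "alt"}}
URL_ATTRS = {"href", "src"}
SAFE_SCHEMES = {"http", "https", "mailto"}
DROP_CONTENT = {"script", "style"}   # drop these tags and their contents
VOID_TAGS = {"br", "img"}

SCHEME_RE = re.compile(r"^([a-zA-Z][a-zA-Z0-9+.\-]*):")


def is_safe_url(url: str) -> bool:
    """Allow relative URLs and a few known schemes; reject javascript: etc."""
    # Browsers ignore whitespace and control characters inside a scheme.
    cleaned = "".join(ch for ch in url if ch > " ")
    match = SCHEME_RE.match(cleaned)
    return match is None or match.group(1).lower() in SAFE_SCHEMES


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text content is kept
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # drops onclick, style, target and so on
            if name in URL_ATTRS and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        close = " /" if tag in VOID_TAGS else ""
        self.out.append(f"<{tag}{''.join(kept)}{close}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text: str) -> str:
    parser = Sanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    dirty = ('<p onclick="x()">Hi <script>alert(1)</script>'
             '<a href="javascript:alert(1)">link</a></p>')
    print(sanitize(dirty))
```

Comments fall through untouched handlers and so are dropped as well.  For anything facing the public internet I would lean on a maintained sanitizing library rather than a hand-rolled filter like this one.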

There is one major drawback to using HTML as the format – diffs are hard.  I suggest doing the diff on a plain text version of the page (hint: if you used XHTML, you can use a simple XSLT to create this, otherwise Tidy, and probably a whole bunch of other tools, are your friends).  This has the added advantage of letting you skip over formatting-only changes.  Alternatively there are XML diff tools out there which can do a good job and are very flexible.  With a little effort you can probably create a much better diff than the standard wiki diffs – think something closer to how Word's track changes displays the changes.
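To make the plain text approach concrete, here is a small sketch.  It swaps the XSLT I mention for Python's built-in HTML parser and difflib, simply because that keeps the example self-contained.  The list of block-level tags and the function names are assumptions for illustration, and it expects input that has already been cleaned up, since any script content would otherwise show up as text.

```python
import difflib
from html.parser import HTMLParser

# Assumed set of tags that should start a new line in the plain text version.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4",
              "blockquote", "pre"}


class TextExtractor(HTMLParser):
    """Collects text content, inserting line breaks around block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_lines(html_text: str) -> list[str]:
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return [line for line in lines if line]


def diff_pages(old_html: str, new_html: str) -> list[str]:
    """Unified diff of the text only; formatting-only edits produce no output."""
    return list(difflib.unified_diff(
        to_plain_lines(old_html), to_plain_lines(new_html),
        fromfile="old", tofile="new", lineterm=""))


if __name__ == "__main__":
    old = "<p>Hello <b>world</b></p><p>Second paragraph</p>"
    reformatted = "<p>Hello <strong>world</strong></p><p>Second paragraph</p>"
    reworded = "<p>Hello <b>there</b></p><p>Second paragraph</p>"
    print(diff_pages(old, reformatted))        # no textual change
    print("\n".join(diff_pages(old, reworded)))
```

Because the comparison is line by line on the extracted text, swapping <b> for <strong> yields an empty diff, while changing a word shows up as a removed and an added line.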

So when you're out shopping for a wiki implementation, consider the impact of the wiki syntax it uses and weigh up the advantages and disadvantages of using HTML and a WYSIWYG editor instead of wiki syntax.  Don't just blindly jump in without realizing the impacts.

If you're a wiki developer, make it a priority to support HTML syntax for your pages and provide hooks to add a WYSIWYG editor or, better yet, provide one out of the box.  Wikis are about removing barriers to entry and now that we have easily available technology to remove the barrier of learning the syntax, let's make wikis even easier.