Symphonious

Living in a state of accord.

Wiki Syntax Considered Harmful

Wikis were invented to make it easier for people to contribute content.  They do this with two key features:

  1. In browser editing.
  2. Users don't have to learn HTML.

That sounds awfully familiar.  It is in fact two of the key things that Ephox provided as we helped create the in-browser WYSIWYG editor industry which is now a key feature of most content management systems, knowledge management systems, document management systems and most other enterprise systems that people want to encourage contribution to (or reduce the complaints from people who are forced to contribute).  As a side note, I'm not sure whether the inventor of EditLive! had heard of wikis or not when he started his work and I'm not sure which was created first but it really doesn't matter.

Second side note: This article talks about problems which Ephox products are explicitly designed to solve as well as problems which they just happen to solve.  It is thus unavoidable that I recommend you look at products that Ephox makes and at competing products.  I do strongly recommend you review your requirements and the available options though.  Ephox makes great products but we don't even attempt to cater for every situation.  Find out what you want then find something that delivers it.  A while back I posted an article that mentions a few things to look for that people don't often think about.  If you do decide to look into WYSIWYG editors I suggest reading it first (bearing in mind of course that I'm biased).  Please try to avoid dismissing the issues below though just because I happen to work for a company that sells one possible solution.

The key point here is that making content contribution simple isn't limited to wikis; it is in fact a reality in a wide range of systems that are traditionally thought of as difficult to use.  That's not to say that wikis aren't useful – they are lightweight systems that are much cheaper and simpler to deploy than CMS solutions and they encourage (and often force) the configuration to be more open, user-friendly and efficient for content contributors.

Learning Any New Syntax Is A Barrier To Acceptance

Wikis have a major failing though: their syntax.  Every wiki seems to use a slightly different syntax or interprets the syntax slightly differently.  Worse, users don't have to learn HTML but instead they have to learn this wiki syntax.  With the ubiquity of HTML these days it's surprising how many people have a basic grasp of it, but pretty much everyone has to start from scratch learning the new wiki syntax.  It's a barrier to content contribution.

Is wiki syntax easier to learn than HTML?  Sure.  Does that mean it's not a barrier to content contribution?  Nope, it's just less of a barrier.  I never cease to be amazed at how small a barrier it takes to stop people from adopting new technology.  If people were that timid about everything in their lives they'd never do anything.  Think about how many barriers you pushed through when you first learnt to cook, first learnt to walk, first learnt to ride a bike, first learnt to talk, first learnt to drive, first learnt to read, write, do maths, first learnt to interact socially, first learnt to write a resumé, first anything.  Yet for some reason technology is big and scary and to be avoided if you have the slightest excuse.  So we have to remove barriers wherever we can because otherwise we give people an excuse to give up and they will almost certainly take it.

So how do we remove the barrier to entry of learning a new wiki syntax?  Use something they already know.  Make it look like Microsoft Word – it doesn't have to be exact, but provide a bold button, an italic button and so on just like Word has and, most importantly, make it WYSIWYG.  There's no reason that users should have to learn any markup language in this day and age and they won't like it if you try to make them.  Geeks probably won't mind learning the syntax and may in fact prefer it, which is why we see wikis being so successful with open source projects.  Don't fall into the trap of thinking that geeks are normal – we're not, and other people won't want to learn and remember any syntax regardless of how simple it seems.  They want to be able to see what they're getting and they want their existing knowledge of editing text on computers to carry over to this new system.

Avoid Wiki Syntax Lock-In

There's a much less obvious problem with wiki syntax too – it's a form of lock-in.  Usually it's not an intentional form of lock-in but it is nonetheless a form of lock-in.  You start off using one wiki with its proprietary syntax and life is good but then you find that a different wiki now has features that you'd really like to use.  What are the two major barriers to switching?

  1. Training all your users on the new wiki syntax.  Great way to kill passion and buy-in for the system.
  2. Migrating all your existing data to the new syntax.

The wiki syntax is probably clearly documented and it's almost certainly possible to write a script to convert the content over, but that doesn't help your user training problem.  Besides which, don't you have better things to do with your time?  The other problem is that there's unlikely to be a direct one-to-one mapping between the two markup languages so you have to come up with ways to deal with the ambiguity or just deal with the fact that the conversion process isn't perfect.
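To give a feel for what such a conversion script involves, here is a minimal Python sketch.  The wiki dialect is made up for illustration (triple apostrophes for bold, double apostrophes for italic, blank lines between paragraphs) and the function name wiki_to_html is an assumption, not part of any real wiki.  It converts to HTML rather than to another wiki syntax, but the shape of the job is the same.

```python
import html
import re


def wiki_to_html(text: str) -> str:
    """Convert a tiny, hypothetical wiki dialect to HTML paragraphs."""
    paragraphs = []
    # Blank lines separate paragraphs in this made-up dialect.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Escape &, < and > first so user text cannot inject markup.
        converted = html.escape(block, quote=False)
        # Bold must be handled before italic because ''' contains ''.
        converted = re.sub(r"'''(.+?)'''", r"<strong>\1</strong>", converted)
        converted = re.sub(r"''(.+?)''", r"<em>\1</em>", converted)
        paragraphs.append(f"<p>{converted}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(wiki_to_html("'''Hi''' and ''there''\n\n1 < 2"))
```

Even this toy version hints at the mapping problem: five apostrophes in a row could mean bold-inside-italic or italic-inside-bold, and the script has to pick one.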

A bunch of people have noticed this problem and have come up with proposed "standard formats" like Markdown.  These are a great idea and neatly solve both our user training and data migration problems.  I'd like to propose a simpler solution though:

Use HTML

Now why on earth would HTML be an option, considering one of our original key objectives was that users didn't have to learn HTML?  HTML is an excellent option because we don't want our users to have to learn any markup language, we use HTML to output the final page anyway, it's a very widely adopted standard with lots of tools to work with it and, most importantly, there is a large selection of existing in-browser WYSIWYG editors available for it.

I don't currently know of a WYSIWYG in-browser editor for any of the "standardized wiki markup languages" and I'd be surprised if they were particularly good yet even if they did exist.  Developing a great editor takes a lot of work so why not leverage the ones that already exist?  There is a range of free and open source in-browser WYSIWYG HTML editors that may well meet your needs.  If you later find that they aren't up to scratch for whatever reason you can upgrade to one of the commercial options without any data migration problems.

One catch here though: I'd suggest you find an editor that creates at least well-formed HTML and preferably standards-compliant HTML (XHTML might be a good option).  It probably doesn't matter too much if you use <b> and <i> tags instead of <strong> and <em>, even though the semantic tags are generally the preferred choice.  The main thing you want to avoid is tag soup and ambiguous HTML.  There are very few editors that don't manage to output reasonable HTML even if it's not great, but it is worth checking.  Ambiguous, poorly formed HTML is just as much a form of lock-in as wiki syntax and Word documents are.  The better editors can be configured to output fully standards-compliant XHTML (in fact, most can).

Take some time to configure the editor to best suit your users and their particular needs – most editors are highly configurable so take advantage of that and your users will thank you.  Take the time to make local image uploads work with your wiki (if desired) or disable the feature.  Integrate the ability to browse existing images and hyperlinks from the editor and make sure that any CSS that will be applied to the output page is also applied in the editor so the user really does get a WYSIWYG experience.  If you want users to concentrate on content instead of display, turn off any unwanted formatting options and use the editor's CSS support for marking up the content so that styles apply.  Since the editor displays the styles, users won't feel like they are creating ugly, boring documents because it will immediately look like the final, beautiful output (your CSS styles do make the content look good, right?).

That sounds like a lot of work but it's probably less than you think and it dramatically improves the user's experience with the system and increases the chance that they'll actually use it regularly.

What about cross site scripting hacks and other nasty things people can do if you let them use HTML?  It's no more of a problem now than it was before – you had to handle those problems when you were using wiki syntax anyway so you still have the same problem.  I'd suggest running the HTML through Tidy to ensure it's well-formed and that all tags are closed then strip out script tags and anything else you don't want to include (any URLs that use the javascript: protocol for a start, but you were doing that already weren't you?).  As long as you get rid of JavaScript nasties you don't have to worry about format corruption because this is a wiki and if I wanted to corrupt your layout I'd just delete everything anyway.
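As a rough sketch of the stripping step (not the Tidy step), the following Python uses only the standard library's html.parser.  It takes the stricter allowlist route rather than just deleting script tags.  The tag and attribute lists, the accepted URL prefixes and the names Sanitizer and sanitize are all assumptions chosen for illustration; a public wiki should use a properly reviewed sanitizer rather than this.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlists for illustration; adjust to what your wiki needs.
ALLOWED_TAGS = {"p", "br", "b", "i", "strong", "em", "ul", "ol", "li", "a", "img"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
VOID_TAGS = {"br", "img"}
# Only URLs starting with these prefixes are kept, so javascript: is dropped.
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:", "/", "#")


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text content is kept
        kept = []
        for name, value in attrs:
            # Drops event handlers (onclick etc.) and anything not allowlisted.
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name in ("href", "src"):
                if not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                    continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(markup: str) -> str:
    parser = Sanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi <script>alert(1)</script>'
                   '<a href="javascript:alert(1)">link</a></p>'))
```

This has to run on the server, on whatever is POSTed, regardless of what the editor in the browser allows.  Note that the prefix check also rejects plain relative URLs such as page.html, which is a deliberately conservative choice in this sketch.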

There is one major drawback to using HTML as the format – diffs are hard.  I suggest doing the diff on a plain text version of the page (hint: if you used XHTML, you can use a simple XSLT to create this; otherwise Tidy, and probably a whole bunch of other tools, are your friends).  This has the added advantage of letting you skip over formatting-only changes.  Alternatively there are XML diff tools out there which can do a good job and are very flexible.  With a little effort you can probably create a much better diff than the standard wiki diffs – think something closer to how Word's track changes displays the changes.
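Here is one way the plain text approach might look, sketched in Python with the standard library's html.parser and difflib standing in for the XSLT or Tidy route.  The set of block-level tags and the names TextExtractor, to_plain_lines and text_diff are assumptions made for this example.

```python
import difflib
from html.parser import HTMLParser

# Assumed set of tags whose end marks a line break in the plain text version.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_lines(markup: str) -> list:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def text_diff(old_html: str, new_html: str) -> list:
    """Unified diff of the text content only; markup changes are ignored."""
    return list(difflib.unified_diff(
        to_plain_lines(old_html), to_plain_lines(new_html),
        fromfile="old", tofile="new", lineterm=""))


if __name__ == "__main__":
    for line in text_diff("<p>One</p><p>Two</p>", "<p>One</p><p>Three</p>"):
        print(line)
```

Because the tags are thrown away before comparing, swapping <b> for <strong> produces an empty diff, while a changed word still shows up as a removed and an added line.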

So when you're out shopping for a wiki implementation, consider the impact of the wiki syntax it uses and weigh up the advantages and disadvantages of using HTML and a WYSIWYG editor instead of wiki syntax.  Don't just blindly jump in without realizing the impacts.

If you're a wiki developer, make it a priority to support HTML syntax for your pages and provide hooks to add a WYSIWYG editor or, better yet, provide one out of the box.  Wikis are about removing barriers to entry and now that we have easily available technology to remove the barrier of learning the syntax, let's make wikis even easier.

  • Bruno Dumon says:

    Can’t resist to mention our (open-source) Daisy content management system (see http://cocoondev.org/daisy/index.html), in which we’ve also opted for HTML-editing since our target users include non-technical types (and besides, it’s just much more comfortable). We’re using the editors available in IE/Firefox since depending on commercial components isn’t an option. These editors are not perfect but mostly good enough.

    We solved the problem of text-based diffing using a custom (server-side) “HTML cleaner” component which gives a byte-for-byte equal result whether editing happened in IE or Firefox (the HTML these produce can be quite different) or as source. This includes doing things like switching to a new paragraph when encountering two br’s etc. See for example here:
    http://cocoondev.org/daisy/index/version/6/diff?otherVersion=7

    July 19, 2005 at 10:34 pm
  • aj says:

    Wow, that looks great! I’ve been meaning to implement a better diff for our internal wiki for a while now so I may well base it off of the Daisy diff tools. It could also be useful in our test scripts….

    Thanks!

    July 20, 2005 at 8:11 am
  • Byron Ellacott says:

    One of the key points of a Wiki syntax is not that it’s easier to create than HTML, rather that it’s easier to read than HTML. WYSIWYG solves both problems, of course, but it’s important to remember that when saying HTML isn’t that much harder to learn. It’s substantially harder to read, unless the author is being meticulous about layout and correct use of tags.

    Also, it’s worth noting that a Wiki syntax doesn’t have anywhere close to the same level of complexity for preventing cross site scripting. Simply escape the three evil characters anywhere they occur in the text, and you’re done. No need to worry about whether the user is trying to indicate that something is less than another thing or if they’re trying to open an HTML element, or which tags and attributes are OK and which are not. Again, a WYSIWYG editor can take care of this for you, I expect.

    WYSIWYG editors have plenty of upsides, but the downside is, frankly, most of ‘em suck to use. Maybe the ones Ephox creates are better… but “This system does not meet the minimum requirements to run EditLive! for Java. Now using a textarea instead.” The fallback position for a WYSIWYG editor is generally substantially worse than any Wiki syntax, or even hand edited HTML, because generated HTML is rarely readable.

    July 20, 2005 at 9:27 am
  • aj says:

    Generated HTML from Ephox editors and many other modern WYSIWYG editors is exceptionally readable. In fact, it’s usually more readable than handwritten code. Take a look at the source code for this page – the entry was created in EditLive! for Java. It’s indented cleanly and contains the bare minimum tags required to markup the content. You can paste in content from Word and have it come out nearly that clean even though the HTML Word generates is the most awful mess you’ve ever seen.

    Also, I’m not sure what browser you’re using, we support the widest range of browsers and OSs of any WYSIWYG editor that I know of so it’s unfortunate that you don’t use a supported browser. For our particular market that’s not an issue because we tend to sell to the enterprise where they use a standardized browser or a small number of browsers and platforms.

    It’s really important to stress that users should never have to see raw markup in any form. If they do you have a problem with your system. It should be displayed and edited as WYSIWYG, so being harder or easier to read is totally irrelevant.

    You’re right that Wiki syntax is easier to clean of cross site scripting attacks, but there are existing tools to strip JavaScript from HTML so it’s not that big a deal. It also depends on your target market – JavaScript is explicitly allowed on our internal wiki and used to great effect but that’s with a controlled set of users who get fired if they try to break our systems – social control rather than technical. If you have a publicly available wiki you need to be more concerned about it but it’s still not that difficult. I’m sure the Daisy developers have already confronted this problem, just take a look at their code.

    By the way, while a WYSIWYG editor can probably be configured to prevent a user entering JavaScript you shouldn’t rely on this because I could POST a JavaScript containing document directly to the server and by-pass the inherently client-side editor. Never rely on client side validation.

    Finally, there are a lot of crap WYSIWYG editors out there but for non-geek users (ie: not you and probably not any of the readers of this site) they are almost all better than having to edit any syntax by hand. Besides which there’s so much choice out there that you should be able to find one that suits your users. Don’t forget to spend the time configuring them too!

    July 20, 2005 at 11:07 am
  • Byron Ellacott says:

    I’m using Firefox on OS X. I would expect that I don’t have whatever particular java toy is needed. By and large I agree with you, the point I poorly attempted to raise is that WYSIWYG tools aren’t guaranteed to run on even reasonably modern systems, and a good fallback is essential. The Ephox demo I tried out certainly didn’t have that: the contents of the textarea was jumbled up HTML.

    July 20, 2005 at 1:35 pm
  • aj says:

    Firefox on OS X is the bane of my existence. The Java support in Firefox is generally woeful but on OS X it is just absolutely abysmal. Fortunately, enterprise isn’t currently using OS X much nor are they using Firefox much so it’s pretty rare that we run into that particular problem.

    Falling back to HTML markup really isn’t much worse than falling back to wiki syntax in a few rare cases, and if it were really a problem you could fall back to a Firefox-specific WYSIWYG editor (ie: the built-in features that Firefox has). Yes that would take more effort, but the benefit to users and the improved adoption of the system should justify it for a heck of a lot of cases. Of course it probably pays to run the content through Tidy to clean it up if it’s come in from hand-editing so that it’s cleaner for users to see (apparently unlike with our demos).

    July 20, 2005 at 2:20 pm