Wiki Syntax Considered Harmful

July 19th, 2005

Wikis were invented to make it easier for people to contribute content.  They do this with two key features:

  1. In browser editing.
  2. Users don't have to learn HTML.

That sounds awfully familiar.  It is in fact two of the key things that Ephox provided as we helped create the in-browser WYSIWYG editor industry which is now a key feature of most content management systems, knowledge management systems, document management systems and most other enterprise systems that people want to encourage contribution to (or reduce the complaints from people who are forced to contribute).  As a side note, I'm not sure whether the inventor of EditLive! had heard of wikis or not when he started his work and I'm not sure which was created first but it really doesn't matter.

Second side note: This article talks about problems which Ephox products are explicitly designed to solve as well as problems which they just happen to solve.  It is thus unavoidable that I recommend you look at products that Ephox makes and other competing products.  I do strongly recommend you review your requirements and the available options though.  Ephox makes great products but we don't even attempt to cater for every situation.  Find out what you want then find something that delivers it.  A while back I posted an article that mentions a few things to look for that people don't often think about.  If you do decide to look into WYSIWYG editors I suggest reading it first (bearing in mind of course that I'm biased).  Please try to avoid dismissing the issues below though just because I happen to work for a company that sells one possible solution.

The key point here is that making content contribution simple isn't limited to wikis, it is in fact a reality in a wide range of systems that are traditionally thought of as difficult to use.  That's not to say that wikis aren't useful - they are light-weight systems that are much cheaper and simpler to deploy than CMS solutions and they encourage (and often force) the configuration to be more open, user-friendly and efficient for content contributors.

Learning Any New Syntax Is A Barrier To Acceptance

Wikis have a major failing though: their syntax.  Every wiki seems to use a slightly different syntax or interprets the syntax slightly differently.  Worse, users don't have to learn HTML but instead they have to learn this wiki syntax.  With the ubiquity of HTML these days it's surprising how many people have a basic grasp of it, but pretty much everyone has to start from scratch learning the new wiki syntax.  It's a barrier to content contribution.

Is wiki syntax easier to learn than HTML?  Sure.  Does that mean it's not a barrier to content contribution?  Nope, it's just less of a barrier.  I never cease to be amazed at how small a barrier it takes to stop people from adopting new technology.  If people were that timid about everything in their lives they'd never do anything.  Think about how many barriers you pushed through when you first learnt to cook, first learnt to walk, first learnt to ride a bike, first learnt to talk, first learnt to drive, first learnt to read, write, do maths, first learnt to interact socially, first learnt to write a resumé, first anything.  Yet for some reason technology is big and scary and to be avoided if you have the slightest excuse.  So we have to remove barriers wherever we can because otherwise we give people an excuse to give up and they will almost certainly take it.

So how do we remove the barrier to entry of learning a new wiki syntax?  Use something they already know.  Make it look like Microsoft Word - it doesn't have to be exact, but provide a bold button, italic button etc just like Word has and most importantly, make it WYSIWYG.  There's no reason that users should have to learn any markup language in this day and age and they won't like it if you try to make them.  Geeks probably won't mind learning the syntax and may in fact prefer it which is why we see wikis being so successful with opensource projects.  Don't fall into the trap of thinking that geeks are normal - we're not and other people won't want to learn and remember any syntax regardless of how simple it seems.  They want to be able to see what they're getting and they want their existing knowledge of editing text in computers to carry over to this new system.

Avoid Wiki Syntax Lock-In

There's a much less obvious problem with wiki syntax too - it's a form of lock-in.  Usually it's not an intentional form of lock-in but it is none-the-less a form of lock-in.  You start off using one wiki with its proprietary syntax and life is good but then you find that a different wiki now has features that you'd really like to use.  What are the two major barriers to switching?

  1. Training all your users on the new wiki syntax.  Great way to kill passion and buy-in for the system.
  2. Migrating all your existing data to the new syntax.

The wiki syntax is probably clearly documented and it's almost certainly possible to write a script to convert the content over, but that doesn't help your user training problem.  Besides which, don't you have better things to do with your time?  The other problem is that there's unlikely to be a direct one-to-one mapping between the two markup languages so you have to come up with ways to deal with the ambiguity or just deal with the fact that the conversion process isn't perfect.
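As a rough sketch of what such a conversion script might look like, here is a tiny Python converter.  The wiki markup it handles ('''bold''' and ''italic'') and the names RULES and wiki_to_html are made up for illustration; every real wiki defines its own rules, which is rather the point.

```python
import html
import re

# Hypothetical wiki rules, for illustration only. Order matters:
# the three-quote rule has to run before the two-quote rule.
RULES = [
    (re.compile(r"'''(.+?)'''"), r"<b>\1</b>"),  # '''bold'''
    (re.compile(r"''(.+?)''"), r"<i>\1</i>"),    # ''italic''
]


def wiki_to_html(line: str) -> str:
    """Convert one line of the hypothetical wiki markup to HTML."""
    # Escape &, < and > first; quote=False leaves the ' characters alone
    # so the rules below can still see them.
    out = html.escape(line, quote=False)
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return out


if __name__ == "__main__":
    print(wiki_to_html("This is '''bold''' and ''italic''"))
    print(wiki_to_html("'''''both'''''"))
```

The first line converts cleanly.  The second, with bold and italic stacked together, comes out with mis-nested tags, which is exactly the kind of ambiguity a real migration script has to make a decision about.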

A bunch of people have noticed this problem and have come up with proposed "standard formats" like Markdown.  These are a great idea and neatly solve both our user training and data migration problems.  I'd like to propose a simpler solution though:

Use HTML

Now why on earth would HTML be an option, considering one of our original key objectives was that users didn't have to learn HTML?  HTML is an excellent option because we don't want our users to have to learn any markup language, we use HTML to output the final page anyway, it's a very widely adopted standard with lots of tools to work with it and most importantly, there is a large selection of existing in-browser WYSIWYG editors available for it.

I don't currently know of a WYSIWYG in-browser editor for any of the "standardized wiki markup languages" and I'd be surprised if they were particularly good yet even if they did exist.  Developing a great editor takes a lot of work so why not leverage the ones that already exist?  There are a range of free and opensource in-browser WYSIWYG HTML editors that may well meet your needs.  If you later find that they aren't up to scratch for whatever reason you can upgrade to one of the commercial options without any data migration problems.

One catch here though: I'd suggest you find an editor that creates at least well formed HTML and preferably standards compliant HTML (XHTML might be a good option).  It probably doesn't matter too much if you use <b> and <i> tags instead of <strong> and <em> even if the latest standards have deprecated them.  The main thing you want to avoid is tag soup and ambiguous HTML.  There are very few editors that don't manage to output reasonable HTML, even if it's not great, but it is worth checking.  Ambiguous, poorly formed HTML is just as much a form of lock-in as wiki syntax and Word documents are.  The better editors can be configured to output fully standards compliant XHTML (in fact, most can).

Take some time to configure the editor to best suit your users and their particular needs - most editors are highly configurable so take advantage of that and your users will thank you.  Take the time to make local image uploads work with your wiki (if desired) or disable the feature.  Integrate the ability to browse existing images and hyperlinks from the editor and make sure that any CSS that will be applied to the output page is also applied in the editor so the user really does get a WYSIWYG experience.  If you want users to concentrate on content instead of display, turn off any unwanted formatting options and use the editor's CSS support for marking up the content so that styles apply - since the editor displays the style they won't feel like they are creating ugly, boring documents because it will immediately look like the final, beautiful output (your CSS styles do make the content look good, right?).

That sounds like a lot of work but it's probably less than you think and it dramatically improves the user's experience with the system and increases the chance that they'll actually use it regularly.

What about cross site scripting hacks and other nasty things people can do if you let them use HTML?  It's no more of a problem now than it was before - you had to handle those problems when you were using wiki syntax anyway so you still have the same problem.  I'd suggest running the HTML through Tidy to ensure it's well-formed and that all tags are closed then strip out script tags and anything else you don't want to include (any URLs that use the javascript: protocol for a start, but you were doing that already weren't you?).  As long as you get rid of JavaScript nasties you don't have to worry about format corruption because this is a wiki and if I wanted to corrupt your layout I'd just delete everything anyway.
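To make the stripping step a bit more concrete, here is a minimal Python sketch using only the standard library.  It is not Tidy and it is not a vetted sanitizer: it only drops script elements, on* event handler attributes (my addition, as another obvious JavaScript nasty) and href/src values whose scheme isn't on a small allow-list.  The names Sanitizer, sanitize, is_safe_url and SAFE_SCHEMES are mine, and the choice of allowed schemes is an assumption you should adjust.  For a real deployment a maintained sanitizing library is the safer bet.

```python
import re
from html import escape
from html.parser import HTMLParser

SAFE_SCHEMES = {"http", "https", "mailto"}  # assumption: adjust to taste
URL_ATTRS = {"href", "src"}
SCHEME_RE = re.compile(r"^([a-z][a-z0-9+.\-]*):")


def is_safe_url(value):
    """Allow relative URLs and allow-listed schemes; reject the rest."""
    # Browsers ignore embedded whitespace/control characters in schemes,
    # so remove them before looking for "javascript:" and friends.
    compact = re.sub(r"[\s\x00-\x1f]+", "", value).lower()
    match = SCHEME_RE.match(compact)
    return match is None or match.group(1) in SAFE_SCHEMES


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a <script> element

    def _emit_open(self, tag, attrs, suffix=">"):
        kept = []
        for name, value in attrs:
            value = value or ""
            if name.startswith("on"):  # onclick, onload, ...
                continue
            if name in URL_ATTRS and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}{suffix}")

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.skip_depth += 1
        elif not self.skip_depth:
            self._emit_open(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        if tag != "script" and not self.skip_depth:
            self._emit_open(tag, attrs, " />")

    def handle_endtag(self, tag):
        if tag == "script":
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # The parser hands us unescaped text, so escape it again.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = Sanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Comments and doctype declarations are silently dropped because the parser's default handlers ignore them.  Other things you may not want (iframes, embedded objects, style attributes) are not touched here and would need their own rules.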

There is one major drawback to using HTML as the format - diffs are hard.  I suggest doing the diff on a plain text version of the page (hint: if you used XHTML, you can use a simple XSLT to create this, otherwise Tidy, and probably a whole bunch of other tools, are your friend).  This has the added advantage of letting you skip over formatting only changes.  Alternatively there are XML diff tools out there which can do a good job and are very flexible.  With a little effort you can probably create a much better diff than the standard wiki diffs - think something closer to how Word's track changes displays the changes.
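Here is one way the plain text diff could be sketched in Python with just the standard library, rather than XSLT or Tidy.  The names TextExtractor, html_to_lines and text_diff, and the list of tags treated as line breaks, are my own assumptions; treat it as a starting point rather than a finished diff engine.

```python
import difflib
from html.parser import HTMLParser

# Assumption: these tags start a new line in the plain text version.
BLOCK_TAGS = {"p", "div", "br", "li", "tr",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect the text of a page, ignoring all inline formatting tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_lines(html_text):
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace and drop empty lines.
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def text_diff(old_html, new_html):
    """Unified diff of the text content of two HTML revisions."""
    return list(difflib.unified_diff(
        html_to_lines(old_html), html_to_lines(new_html),
        fromfile="old", tofile="new", lineterm=""))
```

Because inline tags never reach the diff, swapping bold for italic produces no diff at all, while a change to the words shows up as ordinary removed and added lines.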

So when you're out shopping for a wiki implementation, consider the impact of the wiki syntax it uses and weigh up the advantages and disadvantages of using HTML and a WYSIWYG editor instead of wiki syntax.  Don't just blindly jump in without realizing the impacts.

If you're a wiki developer, make it a priority to support HTML syntax for your pages and provide hooks to add a WYSIWYG editor or better yet provide one out of the box.  Wikis are about removing barriers to entry and now that we have easily available technology to remove the barrier of learning the syntax, let's make wikis even easier.

Why Big Media Will Dominate Podcasting

July 19th, 2005

I probably should title this post, Why Big Media Is Dominating Podcasting, because I suspect that there are already more listeners of "big media" podcasts than there are of "little guy" podcasts.  I don't have figures to back that up though so let's not worry too much about what the current state is and look to the future - will podcasting stay true to its roots and be a way for the "little guy" to have his say or will big media take over?

It's actually not a simple yes/no question.  I'm fairly convinced that, if podcasting isn't just a fad, big media will wind up dominating it, but that doesn't mean they will take over, just that they will be the most visible and have the majority of listeners.  The thing about podcasting is that you don't need the support of others to do it, you just record some stuff, put it on a web server and voilà, you're a podcaster.  The little guy can never be forced out of podcasting because she is in complete control of her distribution chain to the consumer.

Having said that, I think big media will dominate podcasting because it has better production capabilities, a bigger advertising budget and enough money to handle the actual server load required to push out the content.  There are still plenty of people who choose to listen to the radio despite the fact that they have an in-car CD player and access to as many podcasts as they might want.  There's a reason indie music hasn't taken over the world - it's a niche market.  Believe it or not, some people actually like mainstream music - not just that they're brainwashed or stupid - they actually like it.  In fact, mainstream music is mainstream because it captures the biggest audience share.

Podcasting will wind up adhering to the same distribution characteristics.  There will be a large mainstream group which big media will cater for and make a lot of money from, and there will be a long tail which a wide range of little guys will cater for and some of them will make some money from it.  Some niches will be too small to make money off but people will cater for them as a hobby.  You know, it will work just like the music industry does today, with a mainstream group and a long tail of indie performers.  The difference will be that the ubiquity of the internet will make it easier to get to whatever market you're going after so that more little niches along the long tail will be viable and more people will be able to access that hobbyist's podcast about their favorite topic instead of wishing there was a podcast that they were actually interested in.

Of course as more people start podcasting the real challenge will be sorting the wheat from the chaff - that's already difficult.

Hint To Advocates

July 19th, 2005

Scoble, as he's paid to do, pointed to a random blog posting by someone saying they prefer Windows over Mac OS.  Usually I skip them but for some reason I read this one.  It really struck me as odd that in a post that's meant to be pro-Windows, it's really more about how to get by with Windows:

There are a few things to remember about windows. Turn on automatic updates and put everything on a broadband connection behind a router. They can be picked up for about $40 bucks. Don't install every crappy shareware program or file sharing software that comes along .

Snip a paragraph on how bad Windows was before Win2K.

Another weakness in Windows is Internet Explorer and Active X and porn/warez. These sites beg you to install their free software and a machine becomes infested with spyware/adware/malware. Use Firefox and avoid such websites.

So the basic message seems to be, PCs are good because they're cheap - but don't buy the cheap stuff or you'll have problems, buy the good quality stuff.  Then watch out for all the pitfalls of using Windows and constantly worry about getting your computer infected.

Tell me again why this is a better value proposition?  Perhaps your time isn't worth much, but I tend to value mine highly.

So just remember, next time you want to explain why you made a particular decision, how about talking about the positive things that influenced your decision instead of explaining all the stuff you decided you'd put up with?  Or perhaps just hide your random comments about nothing from Scoble so he doesn't try to make out they're meant to be a killer argument for why you should use Windows.

 

Why Tagging Isn’t The Answer

July 19th, 2005

A while back, Benjamin commented about a problem his parents had organizing photos:

Watching my mother trying to use Windows XP to locate her holiday snaps makes it clear to me that tagging is the right way to interact with personal documents. The traditional "one file, one location" filesystem is old and busted. The scenario begins with my mother learning how to take pictures from her camera and put them into folders. Unfortunately, my father is still the one managing short movie files. The two users have different mental models for the data. They have different filing systems. Mum wants to find files by date, or by major event. Dad thinks that movie files are different to static images and that they should end up in different places. The net result is that Mum needs to learn how to use the search feature in order to find her file, and is lucky to find what she is looking for.

The idea of using tagging to solve this problem is fundamentally flawed - tagging is merely a different way for Mum and Dad to represent their mental models in the computer.  The mental models are still different however, so having tags doesn't solve the problem.  Instead of wondering which folder the file was put in, Mum now wonders which tag it was filed under.  Since tags aren't hierarchical, it's also harder to narrow down the number of possibilities.

The ability for the file to be in two places at once is a big advantage, but only in that once you find the file you can put it somewhere you're more likely to find it again in the future without disturbing other users.  When the file is initially imported by someone with a different mental model, it probably still won't be where you would have put it and thus won't be where you first look for it.  You still need to use the search feature to find the file.

The other problem with tags in this particular case is that most or all of the data Benjamin mentions is already captured by the file system as metadata.  Dates and types of files are already stored automatically (in the file system, and most digital cameras put the date in JPEG images too); it's just not simple and efficient to use this metadata with most current OSes.  The major event name could be entered in comments for the file (if the file system supports it), the containing folder or the image name.  This is the one piece of data that is likely to benefit from user extensible metadata of which tagging is a primitive form (it would be better to support adding a specific metadata field called "Event Name" or similar rather than just supporting the specification of a tag).
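To show how much is already there without anyone tagging anything, here is a small Python sketch that groups the files in a folder by date and by broad type, using nothing but what the file system records.  The function name index_by_date_and_type is mine, and the sketch assumes the file's modification time is a good enough stand-in for when the photo was taken; reading the date the camera embeds in the JPEG would need an extra library, so I've left that out.

```python
import mimetypes
from collections import defaultdict
from datetime import date
from pathlib import Path


def index_by_date_and_type(folder):
    """Group files under `folder` by (modification date, broad media type)."""
    index = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if not path.is_file():
            continue
        # Guess the type from the file extension, e.g. "image/jpeg" -> "image".
        mime, _ = mimetypes.guess_type(path.name)
        kind = mime.split("/")[0] if mime else "unknown"
        # Assumption: modification time approximates when the shot was taken.
        day = date.fromtimestamp(path.stat().st_mtime)
        index[(day, kind)].append(path)
    return index
```

With an index like this Mum can ask for everything from a given day and Dad can ask for just the movies, and neither of them had to agree on a filing scheme first.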

I find it particularly interesting that tagging was presented as a solution in this case where the essential problem is a difference of mental models between users when the biggest problem facing current implementations of tags is bridging the gap between the mental models of different users.  Should this blog entry be tagged as "tags", "Tagging" or "tag"?  What other tags should be applied?  No structure gives maximum flexibility but also causes the maximum difficulty in finding things again later.

In this particular case, I'd suggest getting some good software for managing photo libraries that suits their needs.  Most photo library software supports some form of tagging, as well as providing the ability to have images in multiple albums at once and the ability to specify comments etc.  Most importantly an interface is generally provided to be able to utilize that data by searching, sorting and providing different views.  An OS wide solution might have more potential across the whole of your computing experience (and other benefits), but a domain specific solution is both readily available today and will almost certainly provide a better user experience and better functionality for the specific problem at hand.

There's been a lot of talk about tagging lately and it definitely has its uses but it is one of the most primitive forms of metadata so why it's seen as solving all problems is somewhat beyond me.  The most common example is being able to group files related to a specific project together by using a common tag.  I just don't see how this is easier or better than just creating a folder for that project and putting everything in there.  Tags are more useful for cross-cutting concerns (like with Aspect Oriented Programming).  Group related things together using existing techniques and then use metadata to cut across groups and find related material.  There's a lot of power in metadata but it's not found by throwing out everything we already have (à la Gmail), it's found by augmenting existing techniques and providing extra power and flexibility in the combination of approaches.

Anyway, I'm a long, long way off from Benjamin's post now so I'll stop.