Why Are People So Bad At Applying For Jobs?

June 15th, 2006

We're looking for some new hires and I just can't believe how many people don't know how to write even a half decent job application. How difficult is it to realize that you have to:

  1. Include a cover letter.
  2. Customize the cover letter so it is specific to the job you're applying for.
  3. Explain how you meet some of the key requirements in your cover letter and
  4. Customize your resume at least a little to highlight the most relevant things for the job you're applying to.

Applying to a million different jobs with a crappy application is far less likely to get you a job than applying to two or three with a really good application. Take the time to get it right and you'll find a job so much quicker and easier.

Providing Feedback To The Business

June 14th, 2006

Feedback is a key concept in XP and it's common for engineers to focus on feedback from the business in terms of what features are required, how to make things better etc. It is important, however, not to forget that feedback to the business is just as important.

The business needs to know how things are going - to see that progress is being made and what the team is working on. To that end, we've started providing a whole series of pretty graphs showing our velocity on each project, the number of open bugs and a few graphs showing basic code quality metrics. It remains to be seen how successful this will be but hopefully it will help provide some vital communication from engineering back to the rest of the business without needing an all-hands meeting every week.

One of the key steps that I hope will make this work is that every graph includes a description of what the graph shows and which trends mean things are going well. So the code complexity graph explains that the graph should trend gradually downwards over time1.

I'll be interested to watch this project and find out how useful it is to the rest of the business as a quick way of checking in on engineering. Naturally, it doesn't substitute for good old-fashioned conversations and the other communication that goes on within the organization but it does provide a more general overview than conversations tend to give.

 

1 - That will probably need to change to say "stay fairly constant" once we get code complexity down to an acceptable level.

Diffing HTML

June 13th, 2006

I think this is the final episode in my series of responses1 to Alastair's Responding to Adrian. What's the aim of diffing HTML, how hard is it and how do you go about it?

The aim is really important to identify. The most common and most useful aim that I see for diffing HTML is to be able to show users what changed between two versions of a document. Since the management of most content is centralized2, this equates to showing the combined changes in each version between the original version to compare and the final version to compare. If you've ever wanted to see what's changed on a wiki page, you've wanted this type of diff. If you're sending Word documents back and forth between people you probably want this type of diff too.

Another common and useful type of diff is where two parties have made changes to the same base version and you want to compare them so they can be merged into a single document with both changes. This is useful when you don't have a centralized repository controlling things, or if that repository allows concurrent changes.

The third use of diff is to store only changes to a document in order to save space, be that on the file system or network bandwidth. This is the one form which has absolutely no need for human readability.

It's also worth noting that diffing HTML is quite different to diffing generic XML. HTML is a document format and the type of content it generally contains is very natural language intensive. There are many XML formats that have similar attributes3, but also a lot of XML formats that don't4. For content that isn't natural language intensive, diffing in any of these use cases comes down to the classic computer science techniques - adding, removing and moving elements, adding, removing and changing attributes, changing element content etc. However, for natural language based content, the changes to the XML structure are far less important than the changes to the actual textual content.

It may well be that I've missed something, and if so please let me know, but despite a reasonable amount of searching I've never seen a diff tool for any format that can handle natural language well. For the aim of showing changes between document versions, the most important thing is that the diff output clearly shows the intent of changes, not the effect of them. Line based diff obviously doesn't cut it here as a single character spelling correction shouldn't cause the entire line to be marked as different. This kind of change is where Word's track changes shines - it can mark that one character change as a single character change while still marking a change from "the" to "tear" as a word change instead of a removed h and an inserted a and r. It is complexities like these that make natural language diffing so hard.
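To make the "the" to "tear" example concrete, here is a minimal sketch in Python that runs the standard library's difflib.SequenceMatcher over the same text at two granularities. The helper names tokenize, word_diff and char_diff are made up for this illustration, and the tokenizing rule is an assumption. It only shows why the unit of comparison matters; it does not attempt the hard part, which is deciding when a change should be reported per character and when per word.

```python
import difflib
import re


def tokenize(text):
    # Split into words, whitespace runs and single punctuation marks so
    # that joining the tokens reproduces the original text exactly.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def word_diff(old, new):
    """Return (operation, old_text, new_text) tuples at word granularity."""
    old_tokens, new_tokens = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    return [
        (op, "".join(old_tokens[i1:i2]), "".join(new_tokens[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def char_diff(old, new):
    """Return (operation, old_text, new_text) tuples at character granularity."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (op, old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


if __name__ == "__main__":
    # Word granularity reports one replaced word.
    print(word_diff("wipe the page", "wipe tear page"))
    # Character granularity splits the same edit into several small pieces.
    print(char_diff("the", "tear"))
```

The word-level pass reports the edit the way a reader would describe it, while the character-level pass reports the effect on the raw string. A real tool would need some heuristic for switching between the two, and that is where the difficulty lies.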

XML structural diffing on the other hand is a fairly well solved problem as long as you just need to know that the content changed, rather than clearly what was changed. Docucomp for instance has very good diffing tools for HTML, but it is largely ineffective at showing the intent of changes to natural language.

In general, I agree with Alastair's comments on diffing, but he seems to be looking mostly at the second two use cases5 whereas I mostly focus on the first case since that's about the only use case I have for diff. I also agree that Word has some significant limitations and bugs in its track changes implementation, but as a technology concept it does show the potential for tracking changes as they happen instead of diffing after the fact as a way of showing changes between two versions of a document to humans. A complete implementation of track changes6 would enable the other two use cases but with more effort than a post-edit diff.

In the end, it really comes down to what your use case is.

 

1 - Previous responses: On The Importance Of Rendering Fidelity, The Invisible Formatting Tag Problem and the original The Challenge Of Intuitive WYSIWYG HTML

2 - at least in the area I work in where CMSs and similar systems are everywhere

3 - Docbook and DITA come to mind

4 - eg: the data from a form that is stored as XML

5 - which explains why three way diff is a hard requirement

6 - Word can't track structural changes to tables

The Invisible Formatting Tag Problem

June 12th, 2006

Continuing with my response to Alastair's Responding to Adrian, let's take a look into the problem with invisible formatting tags in WYSIWYG editors.

The example I gave was to delete the final character of some hyperlinked text. The delete operation removes the internal formatting tag, and hence removes the hyperlink entirely, as well as the final character.

In this behaviour Outlook is no different to many other HTML editors, and is a completely appropriate example. The problem of the invisible formatting goes directly to the heart of the limitations of WYSIWYG editors. There is no visual representation of the </a> tag, so there are bound to be some surprises when the user starts editing in the vicinity of the tag.

It seems logical to assume that because HTML is marked up with various tags, and because we're so used to thinking in a DOM model, invisible formatting tags are an inherent limitation of WYSIWYG editors. The truth however is quite different. It's actually possible to write a WYSIWYG editor which doesn't use invisible formatting tags that the user can inadvertently delete.

The Swing text APIs are an excellent example of this. The general document model that Swing uses is actually generic across many different formats1 and has no invisible formatting tags anywhere in the document. The document model is based around a character array containing all the text in the document. On top of that, an element tree is layered. The elements have a start and end offset within the content and their start and end offsets automatically adjust to accommodate changes to the document content. This means that the user can delete any character within a hyperlink and the element representing the hyperlink simply adjusts its offsets so that it covers one less character. When the user deletes the last character of the hyperlink the hyperlink element is removed.

At no stage in this process are there extra characters inserted into the content array to track the positions of elements; it is purely done with offsets in the text. There are a number of user actions which require changes to the element tree beyond just adjusting offsets and these need to be handled specifically. Fortunately, the vast majority of these cases are handled with a few simple rules, leading to simple implementation and an intuitive and predictable editing experience for the end user.
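The offset idea can be sketched in a few lines. The following Python is an illustration of the concept only, not Swing's API: the Document and Element classes and their methods are hypothetical, and a flat list of spans stands in for the real element tree. The content holds nothing but the visible characters, and each element is just a pair of offsets that moves as characters are deleted.

```python
class Element:
    """A formatting span (for example a hyperlink) covering content[start:end]."""

    def __init__(self, name, start, end):
        self.name = name
        self.start = start
        self.end = end


class Document:
    """Plain character content with formatting spans layered on top."""

    def __init__(self, text):
        self.content = list(text)  # visible characters only, no marker characters
        self.elements = []         # simplification: a flat list, not a tree

    def add_element(self, name, start, end):
        element = Element(name, start, end)
        self.elements.append(element)
        return element

    def delete_char(self, offset):
        del self.content[offset]
        for element in self.elements:
            if offset < element.start:
                # Deletion before the span: the whole span shifts left.
                element.start -= 1
                element.end -= 1
            elif offset < element.end:
                # Deletion inside the span: it covers one less character.
                element.end -= 1
        # A span that no longer covers any text is removed.
        self.elements = [e for e in self.elements if e.end > e.start]

    def text(self):
        return "".join(self.content)


if __name__ == "__main__":
    doc = Document("see my blog now")
    link = doc.add_element("a", 7, 11)  # covers "blog"
    doc.delete_char(10)                 # delete the final "g" of the link
    print(doc.text(), link.start, link.end)
```

Deleting the last character of the link shrinks the span and leaves the link in place. Only when every linked character has gone does the span disappear, which matches the behavior described above.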

The assumption that because HTML requires a </a> tag, HTML editors have to include a </a> tag in their model is simply false. No editor I know of does this, including Outlook. Office tends to use the end of a paragraph as the key point for tracking formatting2 so there can be unexpected effects when the user backspaces through that point. Even with a bad model, there is no reason that Microsoft couldn't change that behavior to something else - they have simply decided that that is the way it should work or that changing it is not yet a high priority.

In fact, the Outlook example is a perfect showcase of this - hyperlinks work differently to bold, italic and underline. This actually shows clearly that the problem isn't because of an invisible formatting tag, otherwise the same problem would occur with the </b> tag, the </i> tag etc. The hyperlink behavior is actually a specific choice made by the Outlook team3. I mentioned in my first response one possible reason the Outlook team might have decided to do this:

Regarding the hyperlink complaint, that's most likely because Outlook automatically applies hyperlinks when you type an URL - this annoys a lot of people so they made it easy to remove the hyperlink again, by hitting backspace at the end of the hyperlink.

No one who has experience writing editors will claim that it is easy to make a WYSIWYG editor intuitive for users no matter what format you're editing. The fact is that editing content is hard. There are an infinite number of states your program could be in4 and a vast number of user actions to handle. Identifying all the possible things that a user wants to do and how they are going to try to do it is really difficult, but that doesn't mean that WYSIWYG editors are flawed, it simply means it will take a lot of work and dedication to get it right. Even when there are parts of the editor that don't work perfectly, for the vast majority of users a WYSIWYG editor is better5 than having to learn and use a markup language.

Beyond Outlook, there are plenty of other examples of HTML being represented using a different syntax - the DOM model, binary XML and a range of other XML serialization formats which HTML could be represented as. There is simply no reason an editor needs to deal with the limitations and difficulties of the HTML serialization format, it can come up with any model it deems fit to provide the best experience for end users.

Hidden formatting tags are simply a bad idea from the early days of WYSIWYG editors that has hung around because Microsoft don't want to break backwards compatibility in Office. They are not required, nor are they acceptable in any modern WYSIWYG editor.

 

1 - HTML, RTF and plain text support are provided by default but developers can plug in their own implementations for other formats.

2 - whether or not this is actually an invisible tag or simply a behavior that Microsoft has chosen to implement I couldn't be sure

3 - A bad one I'll agree, but certainly not a limitation of the editor

4 - every possible document a user could create is a new state

5 - better = easier to use, easier to learn, more intuitive and all round "nicer" to work with

On The Importance Of Rendering Fidelity

June 11th, 2006

A while back I promised I would get around to fully responding to Alastair's Responding to Adrian. Sadly, I'm finding lots of little bits of time to blog but not enough time to reply to the whole post at once. Hence, I'll have to respond in parts when I get time.

First up, the problem that HTML doesn't render the same on different systems. My assertion is that generally the differences aren't significant enough to worry about.

This is a pretty broad, and in general wrong, assumption. In my article I provided an (admittedly rather contrived) example where the reader selects boldface type for display, completely obscuring the use of boldface type that the writer had chosen for emphasis.

Firstly I'm not sure I've ever seen a user choose to make all text bold, nor can I imagine a case where it would help - larger text definitely, but not bold. That said, there are cases where things will render differently for good reason - screen readers for example render the text as spoken words, rather hard to display bold. Sadly, it's also rather hard to tell the difference between CITE and italic or even normal text and STRONG. Braille devices suffer the same problems, there's just no way to display emphasis so that markup is ignored. Should then we all stop using any formatting in our documents because braille doesn't provide a way to represent it?

The solution is to ensure that the textual content carries the intended message and formatting is used to make it simpler to understand as opposed to making it possible to understand.

Despite being at least vaguely aware of the possible differences that can arise between my system and yours, on too many occasions I have managed to shoot myself in the foot. Like picking an unusual unicode character. Or my various ham-fisted CSS experiments. The point being that I often have no idea about how this site will look on other people’s systems, and it sometimes does make a difference. A big difference.

A WYSIWYG editor has absolutely no impact on whether or not you choose to use unusual unicode characters or mess up your CSS. The fact that these things don't render correctly in some browsers is because of a bug in the browser1. It is not something you can anticipate with great knowledge of the HTML and CSS standards, it is something you have to find out by testing on all browsers. Which brings us to:

Testing that there are no differences between your editing environment and certain common viewing environments is of course laudable. Extending this to assert that there are no such differences for anyone, seems a bit, well, naive.

Testing in different environments is the only way to be sure that there are no differences. You can use semantic markup all you want and follow the standards to the letter but still have your site not render correctly in any browser2. This also ties back to my comments on Content Authoring vs Site Design. When you're designing a site you will use complex CSS and pull fancy tricks that are likely to cause problems, when you are authoring content, you really just want to type text, insert a few hyperlinks, a table, a list, an image and some basic formatting like bold and italic3.

…one of my objections to WYSIWYG for editing HTML is that it reinforces the illusion that What You See on your browser is What I Intended You To See. It emphasises a single possible representation of the HTML, and disregards all of the others, thus encouraging the inexpert user to believe that there is only one possible representation of their document. In this respect HTML differs from other uses of WYSIWYG, such as in a word processor.

While it seems logical to conclude that having a WYSIWYG view of the document you're editing makes people inclined to believe that there is only one possible rendering, that doesn't seem to be the expectation that users actually come up with. Firstly remember that I work in the world of content editing, not site design. As such, the content that people create with a WYSIWYG editor is merged with other content, has different CSS applied to it and is output to multiple different formats (HTML, PDF, Print, etc). This post looks quite different in the WYSIWYG editor compared to its final output format on my blog, not because the HTML and CSS are rendered differently but because a header, footer, sidebar and other blog posts are all added into the final page.

Besides that, even if users begin with the expectation that there is only one possible view of their content, they very quickly learn otherwise when they view it in a browser. Even when the editor uses the same rendering engine as the browser4, the final output looks different because all the extra editing markup is now gone. Things like paragraph markers, dotted gray lines to indicate the boundaries of tables with no border and other rendering all make the content look different in edit mode compared to design mode.

WYSIWYG editors are about making it more intuitive for users to create their content, not about making users be more pedantic about how things look. It avoids users having to learn some foreign syntax and debugging errors in it when they mistype something.

[M]ost people tend to use the I tag instead of EM because they want italic and not emphasis. Any decent HTML editor will use (or at least have an option to use) EM when the user clicks the italic button, thus preserving intent and displaying correctly in nearly every situation.

I’m going to assume that he meant “button” or some UI widget instead of “tag”. By definition WYSIWYG editors do not encourage users to edit tags.

No actually, I meant the I tag. If you make users write out HTML by hand, they will tend to use the wrong tags for things. In most users' minds, they want to make text italic, they don't want to apply emphasis because that's a foreign technique to them. It is a little odd that this occurs when the reason they want to apply italic is for emphasis but italic has been in use in writing for so long that the logical jump has become ingrained in most people. The advantage with a WYSIWYG editor is that you can reverse that mapping for them: when they press the italic button, the editor inserts an EM tag.
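As a rough sketch of that mapping, assuming a hypothetical editor with "italic" and "bold" toolbar buttons (the BUTTON_TO_TAG table and apply_button function are invented for this example and are not any particular editor's API):

```python
import html

# Hypothetical toolbar configuration: the user thinks in terms of visual
# formatting, the editor writes out the semantic tag.
BUTTON_TO_TAG = {
    "italic": "em",
    "bold": "strong",
}


def apply_button(button, selected_text):
    """Wrap the selected text in the semantic tag mapped to the pressed button."""
    tag = BUTTON_TO_TAG[button]
    return f"<{tag}>{html.escape(selected_text)}</{tag}>"


if __name__ == "__main__":
    print(apply_button("italic", "really"))
```

Because the mapping lives in configuration rather than in the user's head, it could be switched back to I and B for a site that wants presentational tags without retraining anyone.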

There's also no reason that a WYSIWYG editor can't let users apply semantic markup when available. For instance, I can apply different heading levels and CSS styles right from my editor, I can use the correct list type, plus differentiate between indenting and blockquote. I can use the CITE, the EM or the I tag as I see fit, or I can change the editor's configuration and limit what the user can do. For example, I might want to limit the user to just applying CSS styles, so I can remove all the other formatting items off the menu bar. Then all the formatting done by the user will be using only CSS styles that I've predefined. I can even change the rendering of different styles to make them visually distinctive in the editor even when they aren't in the final rendered page.

In short, there's no reason that you can't generate perfectly good, semantic markup using a WYSIWYG editor - you just have to find a good one, configure it correctly and train users on what semantic markup is and why they should use it. The training you need to do anyway, but at least you don't have to teach them the foreign syntax language of HTML and CSS.

My Turn For Snarkiness

At this point I can't resist pausing to point out that the link to Alastair's previous ponderings demonstrates quite nicely the kind of syntax error that WYSIWYG editors were invented to avoid. When typing the link Alastair forgot to begin with '/' and thus the URL was treated as relative instead of root relative and clicking the link results in a 404. Of course, with my fancy WYSIWYG editor I can search through my previous posts for specific text, select the one I want and create a link to it correctly every single time.
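The difference a single leading '/' makes can be seen with Python's urllib.parse.urljoin, which resolves a link against the page it appears on in the same way a browser does. The example.com URLs below are placeholders, not the real addresses involved.

```python
from urllib.parse import urljoin

# Placeholder address of the page that contains the link.
page = "http://example.com/blog/2006/06/post.html"

# Without the leading slash the link resolves against the page's directory.
relative = urljoin(page, "archives/old-post.html")

# With the leading slash it resolves against the root of the site.
root_relative = urljoin(page, "/archives/old-post.html")

if __name__ == "__main__":
    print(relative)       # http://example.com/blog/2006/06/archives/old-post.html
    print(root_relative)  # http://example.com/archives/old-post.html
```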

I should also point out that the problem with tag rendering that Alastair pointed out in my previous posting was actually caused by my hand written code that integrates the editor and not the editor itself. The editor correctly put in an entity reference but in the back end processing to save the entry it got interpreted back to the less than character. I'll have to get around to fixing that at some point.
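That class of bug is easy to reproduce. The sketch below uses Python's html module and two made-up save functions. It illustrates the failure mode described above, under the assumption that the save step decoded entities once too often; it is not the actual integration code.

```python
import html


def editor_output(typed_text):
    """What a correct editor emits: literal text becomes entity references."""
    return html.escape(typed_text)


def save_entry_buggy(markup):
    # Hypothetical faulty save step: decoding entities turns "&lt;" back
    # into "<", so the literal text becomes a real (stray) tag.
    return html.unescape(markup)


def save_entry_fixed(markup):
    # Store the markup exactly as the editor produced it.
    return markup


if __name__ == "__main__":
    markup = editor_output("the </a> tag")
    print(markup)                    # the &lt;/a&gt; tag
    print(save_entry_buggy(markup))  # the </a> tag
    print(save_entry_fixed(markup))  # the &lt;/a&gt; tag
```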

1 - For instance IE has a bug which causes it to not render the left hook arrow I use in these footnotes correctly. The HTML however is perfectly valid so it should render everywhere.

2 - In fact, the more you adhere to the latest and greatest HTML and CSS standards the less likely your site will render correctly. While CSS might be better in a lot of ways, if you want browser compatibility, table based layouts are still the safest bet.

3 - Which any decent WYSIWYG editor should apply as STRONG and EM tags

4 - As in the case of JavaScript based editors