Ampersand Redux
It seems I wasn’t clear enough with my ampersand-related comments. I’m not talking about standards here; the standards are very clear: & should always be escaped as &amp;, no ifs, no buts. However, we live in the real world and many things don’t follow standards correctly.
So while David is correct that the validator will complain if you don’t escape ampersands in HTML documents, in some cases some browsers will get it wrong if you do escape them (it’s exotic, and unfortunately the actual test cases are at work, not here). In XHTML, however, you really seriously have to escape them because a) browsers get it right when kicked into XHTML mode, and b) XML parsers barf if you don’t.
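To make point b) concrete, here is a minimal sketch using Python's standard xml.dom.minidom as a stand-in for "an XML parser". The URL page.cgi?a=1&b=2 and the helper name is_well_formed are made up for illustration; other parsers will word the error differently, but any conforming one should reject the unescaped version.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False if the parser rejects it."""
    try:
        xml.dom.minidom.parseString(markup)
        return True
    except ExpatError:
        return False


# Hypothetical link, once with a bare ampersand and once escaped.
unescaped = '<a href="page.cgi?a=1&b=2">link</a>'
escaped = '<a href="page.cgi?a=1&amp;b=2">link</a>'

print(is_well_formed(unescaped))  # the bare & starts an entity reference that never ends
print(is_well_formed(escaped))
```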
Byron also chimes in with a comment:
Neither, actually. A space is not a valid character in a URI, and an ampersand is not a valid character in an attribute value.
You should use:
<a href="Me%20&amp;%20You">
Which is mostly true though pedantic (yes, you should escape spaces like that; no, it’s not what I was talking about). However, it is very much incorrect to say that “ampersand is not a valid character in an attribute value”. Ampersand is in fact a perfectly valid character in attribute values, otherwise it would be impossible to link to:
mt.cgi?__mode=view&_type=entry&blogid=2
which is the page I’m currently on. What Byron meant to say was that when serializing an attribute value that contains an ampersand, the ampersand must be represented using an entity such as &amp;. The distinction is largely just being pedantic, but it is important if you ever work with a DOM, as &amp; is resolved to & by the time you see it (though technically it doesn’t have to be, as it could be left as an EntityReference node).
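The difference between the attribute value and its serialized form is easy to see in code. This is a rough sketch using Python's xml.dom.minidom, chosen only because it is short; the variable names are mine, and other DOM implementations may differ in detail.

```python
import xml.dom.minidom

# The serialized form: ampersands in the attribute are written as &amp;.
markup = '<a href="mt.cgi?__mode=view&amp;_type=entry&amp;blogid=2">entry</a>'

doc = xml.dom.minidom.parseString(markup)
link = doc.documentElement

# The value as seen through the DOM: the entity has already been resolved.
href = link.getAttribute("href")
print(href)

# Serializing the element again turns each & back into &amp;.
serialized = link.toxml()
print(serialized)
```

So the ampersand really is part of the attribute value; the &amp; only exists in the document text.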
As a side note, the main reason an entity would be put into a DOM as an EntityReference node instead of being resolved to a character is when the character set used for strings in the DOM doesn’t support that character. You don’t generally run into that problem with Java-based XML parsers because Java uses Unicode to store strings, thus any character that can be represented in an XML entity can be represented directly as a character in a Java string. There are exceptions and complications to this though, and there are differences between Java 1.5 and Java 1.4 in this regard as well.
The XHTML standard has a comment about the use of ampersands in URLs too (it says to escape them despite the fact that you could usually get away with not doing it in HTML).
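In practice that means building the URL first and escaping it only when it is written into the markup. A minimal sketch, assuming Python's standard html.escape and urllib.parse.urlencode; the build_link helper and the link text are hypothetical, not anything the XHTML standard prescribes.

```python
from html import escape
from urllib.parse import urlencode


def build_link(base: str, params: dict, text: str) -> str:
    """Build an <a> element whose href is safe to embed in HTML or XHTML."""
    # Step 1: assemble the real URL; the query separators are plain ampersands.
    url = f"{base}?{urlencode(params)}"
    # Step 2: escape the URL for use inside a double-quoted attribute value.
    return f'<a href="{escape(url, quote=True)}">{escape(text)}</a>'


link = build_link(
    "mt.cgi",
    {"__mode": "view", "_type": "entry", "blogid": 2},
    "Edit entry",
)
print(link)
```

Keeping the two steps separate avoids escaping twice, which would produce &amp;amp; in the output.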

September 10th, 2004 at 10:03 am
Yes, an ampersand is valid as part of an attribute value (as represented in an HTML document) where that ampersand is part of an entity reference. An ampersand that is not part of an entity reference is not valid in an attribute value, in an HTML document.
Serialization has nothing to do with it, since an HTML document is not the serialization of a DOM tree, although it can be viewed as such. I did not mean to say anything about serializing attribute values, I meant to say that an attribute value in an HTML document cannot legally have an ampersand that is not part of an entity reference.
If your document does have such an ampersand, it will not validate. It might work in current browsers, but down the road it might not. Don’t do it. If a browser gets it wrong, file a bug against the browser or avoid ampersands entirely; don’t force every other author of HTML parsers to work around your markup’s faults.