Ampersand Redux
Adrian Sutton
It seems I wasn’t clear enough with my ampersand-related comments. I’m not talking about standards here, because the standards are very clear: & should always be escaped as &amp;, no ifs, no buts. However, we live in the real world, and many things don’t follow standards correctly. So while David is correct that the validator will complain if you don’t escape ampersands in HTML documents, some browsers will get it wrong in some cases if you do escape them (the cases are exotic, and unfortunately the actual test cases are at work, not here). In XHTML, however, you really do have to escape them, because a) browsers get it right when kicked into XHTML mode, and b) XML parsers barf if you don’t. Byron also chimes in with a comment:
Neither, actually. A space is not a valid character in a URI, and an ampersand is not a valid character in an attribute value. You should use:
That is mostly true, though pedantic (yes, you should escape spaces like that; no, it’s not what I was talking about). However, it is simply incorrect to say that “an ampersand is not a valid character in an attribute value”. The ampersand is a perfectly valid character in attribute values; otherwise it would be impossible to link to:
mt.cgi?__mode=view&_type=entry&blogid=2, which is the page I'm currently on. What Byron meant to say was that when serializing an attribute value that contains an ampersand, the ampersand must be represented using an entity reference such as &amp;. The distinction is largely pedantic, but it matters if you ever work with a DOM, because &amp; has been resolved to & by the time you see it (though technically it doesn't have to be, since it could be left as an EntityReference node).
As a side note, the main reason an entity would be put into a DOM as an EntityReference node instead of being resolved to a character is that the character set used for strings in the DOM doesn’t support that character. You don’t generally run into that problem with Java-based XML parsers because Java uses Unicode to store strings, so any character that can be represented by an XML entity can be represented directly as a character in a Java string. There are exceptions and complications to this, though, and there are differences between Java 1.5 and Java 1.4 in this regard as well.
The XHTML standard also comments on the use of ampersands in URLs: it says to escape them, even though you could usually get away with not doing so in HTML.
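Since the discussion centres on Java-based XML parsers, here is a small Java sketch of the DOM distinction described above, using only the JDK's built-in JAXP classes (javax.xml.parsers and javax.xml.transform). The class and method names (AmpersandDemo, hrefOf, buildLink, rejects) are hypothetical, made up purely for illustration. It shows three things under those assumptions: the DOM hands back a plain & for an attribute written with &amp;, the serializer writes a plain & back out as &amp;, and a bare & in the markup is rejected outright by the XML parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; all names here are illustrative only.
public class AmpersandDemo {

    // A fresh DocumentBuilder whose error handler throws on fatal errors
    // (DefaultHandler's behaviour) without also printing them to stderr.
    static DocumentBuilder newBuilder() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler());
            return builder;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parse a string of markup; throws SAXException if it is not well-formed XML.
    static Document parse(String xml) throws SAXException {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // The href attribute of the root element as the DOM exposes it:
    // any &amp; in the markup has already been resolved to a plain '&'.
    static String hrefOf(String xml) {
        try {
            return parse(xml).getDocumentElement().getAttribute("href");
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Put a raw href (containing plain '&') into a new DOM and serialize it;
    // the serializer is responsible for writing each '&' as &amp;.
    static String buildLink(String href) {
        Document doc = newBuilder().newDocument();
        Element a = doc.createElement("a");
        a.setAttribute("href", href);
        doc.appendChild(a);
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if an XML parser refuses the markup as not well-formed.
    static boolean rejects(String xml) {
        try {
            parse(xml);
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String url = "mt.cgi?__mode=view&_type=entry&blogid=2";
        String escaped = "<a href=\"mt.cgi?__mode=view&amp;_type=entry&amp;blogid=2\"/>";
        String unescaped = "<a href=\"" + url + "\"/>";

        System.out.println(hrefOf(escaped));    // the DOM holds plain '&'
        System.out.println(buildLink(url));     // serialized with &amp;
        System.out.println(rejects(unescaped)); // bare '&' is a fatal XML error
    }
}
```

Running main should print the href value with plain ampersands, an a element whose href is written out with &amp;, and true for the unescaped markup. The point of the round trip is that escaping is a property of the serialized text, not of the attribute value the DOM holds.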