URL Escaping is Evil
I have come to the conclusion that URL escaping is evil and must be banished from the face of the earth. I’ve got no idea how it manages to work at all - every implementation seems to be different, and support for different character sets is very much a hit-and-miss affair.
Take for instance the string:
© Adrian Sutton
It looks like a pretty simple string and all. It should be encoded as:
%C2%A9%20Adrian%20Sutton
assuming UTF-8 character encoding (and I literally mean assuming, since there’s no possible way to know for sure). If, however, you were to use the JavaScript escape() function, you could get any one of:
%u00A9+Adrian+Sutton
%C2%A9%20Adrian+Sutton
%u00A9%20Adrian%20Sutton
It’s impossible to tell if the + sign in the first two is an encoded space or an actual plus sign (there’s no requirement for + to be escaped in URIs, so many implementations leave it as is). Then you have to deal with the rather odd %u00A9 syntax, which seems to be half URI escaping, half HTML entity, and finally you get to worry about which character set was in use.
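To make the mismatch concrete, here is a rough sketch that runs the example string through two encoders. It assumes a current JavaScript engine such as Node.js, where the legacy escape() is still available, and tryDecode is just a helper name made up for this illustration. Older or non-standard implementations may well behave differently, which is rather the point.

```javascript
// "© Adrian Sutton", written with an escape so the source file's own
// encoding cannot interfere with the experiment.
const text = "\u00A9 Adrian Sutton";

// encodeURIComponent always percent-encodes the UTF-8 bytes of the string.
const utf8Encoded = encodeURIComponent(text);

// The legacy escape() emits %XX for code units below 256
// and the non-standard %uXXXX form for anything above that.
const legacyEncoded = escape(text);
const legacyWide = escape("\u20AC"); // euro sign, outside Latin-1

// Hypothetical helper: decode as UTF-8, returning null on malformed input.
function tryDecode(s) {
  try {
    return decodeURIComponent(s);
  } catch (e) {
    return null;
  }
}

// A lone %A9 byte is not valid UTF-8, so the escape() output
// cannot be read back by a decoder that assumes UTF-8.
const decodedLegacy = tryDecode(legacyEncoded);

console.log(utf8Encoded);   // %C2%A9%20Adrian%20Sutton
console.log(legacyEncoded); // %A9%20Adrian%20Sutton
console.log(legacyWide);    // %u20AC
console.log(decodedLegacy); // null
```

The receiver only sees the escaped text, so nothing in it says which of these encoders (or which character set) produced it.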
For the record, here’s what your browser makes of it:

March 31st, 2004 at 5:09 pm
which is of course something else again :-)
%A9%20Adrian%20Sutton
June 19th, 2007 at 8:17 pm
>there’s no requirement for + to be escaped in URIs
You are not right
http://www.ietf.org/rfc/rfc2396.txt
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
You MUST encode the ‘+’ sign!
June 19th, 2007 at 8:35 pm
Eugen,
It’s entirely academic - quite a few implementations *don’t* escape + in URLs so you have no idea if it’s a + or an incorrectly encoded space. The majority of the time, + means space in a URL but there are exceptions.
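A small sketch of that ambiguity, assuming a JavaScript runtime that provides URLSearchParams (modern browsers and Node.js do). The query string and variable names are made up for illustration. The same raw text decodes to two different values depending on which convention the receiver applies.

```javascript
// A made-up query string where the sender did not escape the plus signs.
const raw = "q=1+1%3D2";

// Form-style decoding (application/x-www-form-urlencoded): "+" means space.
const asForm = new URLSearchParams(raw).get("q");

// Plain URI decoding: "+" is left alone as a literal plus sign.
const asUri = decodeURIComponent(raw.split("=")[1]);

// Escaping the plus up front removes the ambiguity for both kinds of decoder.
const safe = encodeURIComponent("1+1=2");
const roundTrip = new URLSearchParams("q=" + safe).get("q");

console.log(asForm);    // 1 1=2
console.log(asUri);     // 1+1=2
console.log(safe);      // 1%2B1%3D2
console.log(roundTrip); // 1+1=2
```

Only the sender knows which of the two readings was intended, which is why a bare + in a received URL cannot be trusted either way.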