Google Wars

March 22nd, 2004

It appears the war of the Adrian Suttons is hotting up on Google. (Hint for those who just read the RSS feeds: take a look at the main page.) The once unstoppable Professor Adrian Sutton, who ruled supreme as the number one search result for “Adrian Sutton”, has dropped significantly down to third place, though he now has two entries in the top five thanks to his surprise appearance in some meeting minutes.

The new kid on the block, Adrian Sutton, has roared up the charts to take the number one spot, just ahead of my own Randomness, which held the top spot less than two days ago. I also hold the fifth spot with my appearance in a CVS commit message for FreeCard.

Stay tuned as this pathetically geeky race continues to unfold!

URL Escaping is Evil

March 22nd, 2004

I have come to the conclusion that URL escaping is evil and must be banished from the face of the earth. I’ve got no idea how it manages to work at all - every implementation seems to be different, and support for different character sets is very much a hit-and-miss affair.

Take for instance the string:

© Adrian Sutton

It looks like a pretty simple string and all. It should be encoded as:

%C2%A9%20Adrian%20Sutton

assuming UTF-8 character encoding (and I literally mean assuming, since there’s no possible way to know for sure). If, however, you were to use the JavaScript escape() function you could get any one of:

%u00A9+Adrian+Sutton

%C2%A9%20Adrian+Sutton

%u00A9%20Adrian%20Sutton

It’s impossible to tell if the + sign in the first two is an encoded space or an actual plus sign (there’s no requirement for + to be escaped in URIs, so many implementations leave it as is). Then you have to deal with the rather odd %u00A9 syntax, which seems to be half URI escaping, half HTML entity, and finally you get to worry about which character set was in use.
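To make the two ambiguities concrete, here is a rough sketch using Python's standard urllib.parse module rather than JavaScript. The variable names are purely illustrative, and the non-standard %u00A9 form is left out because the standard functions used here have no special handling for it. It only shows how the same string comes out under different character set assumptions, and how a decoder has to guess what + means.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "© Adrian Sutton"

# Same string, two different assumed character sets.
utf8_form = quote(text, encoding="utf-8")      # © becomes two bytes: %C2%A9
latin1_form = quote(text, encoding="latin-1")  # © becomes one byte: %A9

# Form-style encoding writes spaces as "+" instead of "%20".
form_style = quote_plus(text)

# The decoder has to guess whether "+" is a space or a literal plus sign.
ambiguous = "%C2%A9+Adrian+Sutton"
as_literal_plus = unquote(ambiguous)   # "+" left untouched
as_space = unquote_plus(ambiguous)     # "+" treated as a space

# Guessing the wrong character set on the way back in garbles the text.
wrong_charset = unquote(utf8_form, encoding="latin-1")

for label, value in [
    ("UTF-8", utf8_form),
    ("Latin-1", latin1_form),
    ("form style", form_style),
    ("+ kept literal", as_literal_plus),
    ("+ as space", as_space),
    ("wrong charset", wrong_charset),
]:
    print(f"{label:>15}: {value}")
```

Nothing in the escaped string itself says which of these readings was intended, so both sides have to agree on the character set and on the meaning of + by some other means.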

For the record, here’s what your browser makes of it: