Just When You Thought It Was Safe…

August 22nd, 2004

Just when you thought it was safe to turn the TV on again, Young Talent Time makes a comeback. The worst part is that the Minogue sisters have promised to appear on the show - any hope that some actual talent may be found is lost…

Amazon Goodness

August 18th, 2004

I have slightly obscure tastes in music - in particular, I like musicals: not the highlights CDs but the full recording of the original cast. It’s certainly not the most obscure taste in music, but it does lead to an awful lot of trouble tracking down what I want. Worse still, I know what I want ahead of time, unlike most people with really obscure tastes, who just stumble across things they like.

Coming back to the point though, you can’t just walk into HMV or pretty much any music store that I’ve found and pick up a copy of the original 1986 cast recording of The Phantom of The Opera or Miss Saigon or Les Miserables. However, with this newfangled technology intarweb thingy I can head on over to Amazon and order it from there. There’s a bunch of other online stores around that may or may not have what I want for prices roughly equal to Amazon, but what I love about Amazon is watching it try to predict my buying habits.

I know most people freak out about privacy violations when computer systems start gathering data about them, but with Amazon it’s like a fun game. It managed to pick that I was looking to purchase Miss Saigon and The Phantom Of The Opera the last time I went there and offered a package deal on them. Its recommendations are also really quite good - including detecting that I tend to buy the versions that Lea Salonga is in (she tends to be part of the original cast of a lot of big musicals). Sadly, it doesn’t seem to have worked out that I only buy CDs from them, as it keeps offering books and sheet music. While I do tend to buy a fair bit of sheet music of musicals, I won’t purchase it without first flicking through it to make sure I have a chance of being able to play it.

The big downside of buying from Amazon though is I have to wait two and a half weeks for things to arrive (that or pay an extra arm for postage).

That Pesky Caps Lock

August 18th, 2004

Tor Norbye politely requests that the caps lock key be removed and the control key put there instead. There’s one very good reason why that shouldn’t be done:

Everyone (except old school UNIX geeks) is used to the control key being where it is.

Moving the control key would seriously annoy people. If you’re one of the people who are used to control being next to ‘a’ then imagine the whole world being as annoyed as you every time they use a computer and find that control is in the “wrong” place.

More importantly though, beside ‘a’ isn’t a good place for control anyway. The little finger is the most difficult finger to control on the human hand and is used least commonly. In touch typing, the left little finger is currently positioned over ‘a’ and moves up for ‘q’ and down for ‘z’. If you’re British or Australian, ‘q’ and ‘z’ are incredibly uncommon letters (Americans customized their language by putting a bunch of Zs in).

Now think of the most common keyboard shortcuts used on computers these days (think Windows users, not emacs users):

  • Control-Z
  • Control-X
  • Control-C
  • Control-V
  • Control-Y (Redo)

Apart from the crazy idea of making control-Y the shortcut for redo (it’s typically control-shift-z on Mac which is arguably slightly better - read on for why), all those shortcuts are in the bottom left corner of the keyboard. More importantly though, they form the “control home row”. Try this experiment: put your left hand on the standard home row (of a US-English QWERTY keyboard) and then reach your little finger down to the control key. Now, if you’re a contortionist, you’ll have kept your other fingers on the home row, but it’s actually easier to slide your entire hand down and just slightly to the left, letting your little finger lead. Your fingers then wind up on control, z, x, c and your thumb below the space bar. Since you’ve led with your little finger, it will get there just slightly ahead of the other fingers, so that you hit control first, followed rapidly by whichever of those keys you wanted.

What this positioning means is that once you get used to control being where it is, it’s actually quite fast even though the key is small, mostly because it’s a more natural position for your hand. Note that Fitts’s law only applies to computer interfaces and not strictly to the real world (it does apply if all other factors are equal).

Having said that, I prefer the use of the command key (positioned where the alt key is on Windows keyboards or where the meta key is on Sun keyboards) because it allows my free thumb to turn under (the way it often does when playing piano) and my fingers can then drop down and hit the particular key I want. Alternatively though (and this is why I really like it), I can also stretch out my fingers and hit any key within about three quarters of the keyboard effortlessly (I have particularly long fingers). Also, because it’s the thumb that reaches for the meta key, all the other fingers are left exactly where they are for touch typing. I can actually touch type with my thumb tucked under like that with only marginally lower speed.

It is interesting to note, however, that if I plug a Windows keyboard into my Mac, I am constantly reaching for the control key instead of the Apple key. The same happens in reverse if I plug a Mac keyboard into a Windows box. It seems there is something about the feel of the keyboard that my fingers have learnt to identify, and they use the matching modifier key. I’m yet to be able to identify exactly what that attribute is. Judging from that, I’d say the most important aspect of speed when using meta keys is what you’re used to. Perhaps some scientific tests are required….

Pointless Schemas

August 17th, 2004

There seems to be a growing trend for projects to use XML configuration files - fine.
There seems to be a growing trend for those projects to provide a schema for those files - good.
There seems to be a growing trend for those projects never to validate their configuration files against the schema - bad.
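
For what it’s worth, validating a configuration file when it is loaded needn’t be much work. The following is only a rough sketch, assuming the javax.xml.validation API is available in the JDK being used; the `config` and `port` elements and the `ConfigValidator` class are made up for illustration and don’t come from any real project.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

class ConfigValidator {
    // Hypothetical schema: a <config> element holding a single integer <port>.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='port' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true if the given XML text validates against SCHEMA.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (org.xml.sax.SAXException e) {
            return false; // schema violation (or malformed XML)
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A project that ran something like `ConfigValidator.isValid(...)` over its own sample configuration files as part of its test suite would notice very quickly when the schema and the real files drift apart.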

As I’ve previously mentioned, my job involves creating an XML forms editor and it turns out that this forms editor is really quite good at editing configuration files (see our very own configuration tool).

We thought it might be nice to create a simple editor for Maven POM files. Sadly, it seems that the schema for a POM file doesn’t come anywhere near describing what should actually be in a POM. Maven has support for inheriting a POM and using the information in it, thus allowing that information to be omitted from the POM itself. Now admittedly, it’s not possible to describe this precisely in XML Schema (the POM being inherited from can omit any information it likes as well, assuming that it will be filled in by the extending POM), but the current schema insists that everything be specified in every POM file, which makes validation completely useless.
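
To make the inheritance problem concrete, here is a small sketch. It is not the real Maven schema: the `project`, `extend`, `groupId` and `name` elements are a deliberately simplified, hypothetical fragment, and `PomSchemaSketch` is an invented class name. It again assumes the javax.xml.validation API. The same child POM is checked against a strict schema that demands every element and against a lenient one that marks inheritable elements with `minOccurs='0'`.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;

class PomSchemaSketch {
    // Builds a tiny POM-like schema (hypothetical, NOT the real Maven schema).
    // When inheritedOptional is true, elements a parent POM could supply may be omitted.
    static String schema(boolean inheritedOptional) {
        String occurs = inheritedOptional ? " minOccurs='0'" : "";
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='project'><xs:complexType><xs:sequence>"
             + "<xs:element name='extend' type='xs:string' minOccurs='0'/>"
             + "<xs:element name='groupId' type='xs:string'" + occurs + "/>"
             + "<xs:element name='name' type='xs:string'" + occurs + "/>"
             + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    }

    // Returns true if documentXml validates against schemaXml.
    static boolean validates(String schemaXml, String documentXml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaXml)))
                .newValidator()
                .validate(new StreamSource(new StringReader(documentXml)));
            return true;
        } catch (org.xml.sax.SAXException e) {
            return false;
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A child document such as `<project><extend>../project.xml</extend><name>child</name></project>` fails against `schema(false)` because `groupId` is missing, and passes against `schema(true)`. The lenient version can’t check that the parent really does supply the missing pieces, but at least it doesn’t reject files that Maven itself would accept.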

The POM files from the plugins don’t validate against the schema for a number of reasons as well.

JDNC is even worse - Xerces finds errors in the schema itself (and I’m fairly sure Xerces is correct).

This, combined with fairly poor documentation, makes it extremely difficult to implement tool support. It’s a shame, I’d probably use Maven if it wasn’t such a pain in the neck to create the POM file correctly (that and dependency management which is wonderfully simple for things in the public repository and annoyingly difficult for things that aren’t). Maybe I’ll come back and create a more useful schema for Maven at some point, but updating the schema doesn’t really help me identify the areas of our product that need improving.

String Interning and Threads

August 17th, 2004

Anton Tagunov added an excellent comment to yesterday’s entry:

The one thing that has always stopped me from doing this was: interning must use some global lock.

So, while interning would cause little to no harm for desktop application it is likely to introduce extra synchronization bottleneck for server applications.

Thus potentially degrading performance on some multi-cpu beast.

Would you agree with this, Adrian?

Firstly, let me make something very clear: string interning will cause bugs in your code and they will be hard to track down - they will however be easy to fix once found. At some point, someone, somewhere will forget to intern a string and let it pass into your code that assumes strings are interned.
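
As a minimal illustration of that kind of bug (the `InternBug` class and the `"admin"` role are invented for the example), consider a method that compares with `==` on the assumption that its argument has been interned:

```java
class InternBug {
    // String literals are interned automatically, so this is the pooled instance.
    static final String ADMIN = "admin";

    // Fast comparison that silently assumes the caller interned the string.
    static boolean isAdminFast(String role) {
        return role == ADMIN;
    }

    // Ordinary comparison that works whether or not the string was interned.
    static boolean isAdminSafe(String role) {
        return ADMIN.equals(role);
    }

    public static void main(String[] args) {
        // Simulates a string built at runtime, e.g. read from a file or socket.
        String role = new String("admin");
        System.out.println(isAdminFast(role));          // false: someone forgot to intern
        System.out.println(isAdminFast(role.intern())); // true
        System.out.println(isAdminSafe(role));          // true
    }
}
```

The fix is a one-line `intern()` call at the point where the string enters the system, but nothing in the compiler or the type system will tell you where that point was missed.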

Also, string interning is an extremely situational optimization. In most cases, it will worsen performance because the overhead of interning the strings will not be made up for by the reduced complexity of comparisons. Even in the cases where it does help, most of the time the difference will not be noticeable. As always, don’t bother optimizing until you know that you need to and that it will help - this is an optimization that causes the code to become less maintainable.

Having said that, let’s go back to the original question. Firstly, does string interning require synchronization? Probably, but not at the Java level. The String.intern() method is a native method, so any locking happens in native code rather than in a Java synchronized block. It would be difficult to imagine a way of achieving the behavior without at least some synchronization though. The synchronized section, however, would be very small and very rarely encountered.

There are two situations to consider: either the string is already in the interned list or it is not. If it is, then no synchronization needs to occur because the list is only being read. So multiple strings can be interned at once so long as all of them are already in the interned list. Synchronization will be needed, however, whenever a string is interned for the first time (i.e. it doesn’t match any String constant that has been loaded or any previously interned string).
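
The two cases can be sketched with a hand-rolled pool. To be clear, this is not how any JVM implements String.intern(); it’s a hypothetical `InternPool` class built on `java.util.concurrent.ConcurrentHashMap`, used purely to show the shape of the read path versus the insert path.

```java
import java.util.concurrent.ConcurrentHashMap;

class InternPool {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance for any string equal to s.
    String intern(String s) {
        // Common case: the string is already pooled. Retrievals from a
        // ConcurrentHashMap do not block, so many threads can do this at once.
        String existing = pool.get(s);
        if (existing != null) {
            return existing;
        }
        // Rare case: first time this value is seen. The insert may contend
        // with other writers, and another thread may have won the race.
        existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }
}
```

If nearly every call hits the first branch, the cost of the second branch hardly matters; if most calls fall through to the insert, the pool is doing a lot of coordinated work for strings that will rarely be compared.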

So on a multiple CPU system, it would be very bad to intern a lot of strings that are only ever used once or twice as they would require a lot of synchronization for no benefit. Of course on a single CPU system, doing this would be a bad thing anyway because it would incur the extra cost of comparing strings to check if they match an interned string without gaining any real benefit.

My theory would then be (and only real-world application profiling will confirm this in any particular situation) that the string interning technique is slightly less likely to pay off on multiple CPU systems. However, because the situations in which string interning is useful require that the vast majority of String.intern() calls match something already in the cache (most likely one of the string constants they’re to be compared against), the number of CPUs in use isn’t going to have any significant impact.

I can’t stress enough though that if you don’t have specific profiling data that shows String comparisons as the biggest bottleneck in your application, you shouldn’t apply this optimization.

Great question.

UPDATE: Here’s an interesting discussion of interning relating specifically to this question. The automatic Google search on the side (if you actually click through to this blog entry) is very handy at times.