Java’s Code is Available

March 26th, 2004

I noticed this on Java.net. In it, Eitan Suez suggests that having open source J2SE libraries would rapidly increase their quality because of all the developers contributing patches.

The major problem with this argument is that the source for the Java standard libraries is already available; in fact, it is included in pretty much every J2SDK. It’s right there, in a form you can create patches against and submit to the publicly available Bug Parade for Java. How many people do you see actually doing this, though? Some, but very few.

Sure, people are going to be more inclined to submit patches if Java becomes buzzword compliant and gets a Mickey Mouse badge from the FSF, but there will still be far more people demanding improvements than making them. Open source isn’t a magic bullet that suddenly makes life easy and gives perfect quality: there are plenty of extremely buggy open source programs with too few developers, and there always will be.

Maybe I’m just more careful in choosing which commercial software I use, but on average I’ve found open source software to be lower quality than commercial software. Open source does tend more towards the extremes, though: it’s either really great or really bad. I’m sure that’s going to start a massive flame war, but so be it. Open source isn’t always better quality, because generalisations are always false.

Finally I’d like to draw attention to what I consider the best rebuttal I’ve heard in a long time:

Go open source with DB2 and then you can tell me what to do with my assets

From Scott McNealy, obviously. Check and mate.

Character Encodings

March 26th, 2004

Jim Winstead has posted a couple of entries on character encodings (1, 2). Some good info in there.

My three big tips for dealing with character encodings are these:

1. Know your character encoding and make sure you’re using that encoding everywhere. Now look again, because you probably missed a place where you didn’t think about encoding. In particular, don’t forget that printing something to System.err or System.out in Java uses the platform default encoding, so characters that can’t be represented in that encoding become question marks.

2. When practical, use US-ASCII and escaped characters for anything outside of its range. Most formats which support different encodings also provide a way to represent a character which isn’t in the current encoding, through something like HTML’s entities or Java’s escape codes (\u8222 etc). Most encodings are compatible with US-ASCII (EBCDIC being a notable exception), so even if people forget to use the right encoding they can generally get away with it.

3. Remember that character encodings, despite their name, do not apply to characters - they apply to byte sequences which represent characters. If you have a char variable in Java, it has no character encoding as far as you are concerned; it’s just that character. The JVM can choose any representation it likes for that character in physical memory and you shouldn’t care (it actually happens to choose UTF-16, I think, but you still shouldn’t care). You *do*, however, have to worry about character encodings when you convert from characters (or Strings, which are really just a fancy array of chars) to byte streams. This happens when you use String.getBytes(), or print the characters to any kind of output stream. You also have to worry about the reverse process: new String(byte[]) and reading from an input stream.
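As a rough illustration of the three tips, here is a minimal sketch. The class name EncodingDemo, its helper methods and the sample text are all made up for this example. It only shows that the same String produces different byte sequences depending on the encoding you ask for, that characters an encoding can't represent come back as question marks, and one simple way to stay inside US-ASCII by escaping everything else.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
class EncodingDemo {

    // Tip 3: the String itself has no encoding, only the bytes produced from it do.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Tip 1: characters the target encoding cannot represent are replaced
    // (with a question mark in the case of US-ASCII).
    static String roundTripThroughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Tip 2: keep the output in US-ASCII and write every other char
    // as a backslash-u escape with four hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, escaped so this source file stays ASCII.
        String text = "caf\u00e9";
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // 4 bytes
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // 5 bytes
        System.out.println(roundTripThroughAscii(text));                      // accented e becomes ?
        System.out.println(escapeNonAscii(text));                             // pure ASCII output
    }
}
```

Passing a Charset explicitly, rather than relying on the no-argument String.getBytes(), is what keeps the platform default encoding out of the picture.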

The first two items should be pretty clear if you’ve done any work with character encodings. The third may seem unimportant, but it will help stop you from expecting code like the following to work:


String str = "my string";
// Encode the characters as UTF-8 bytes...
byte[] utf8Bytes = str.getBytes("UTF-8");
// ...then decode those same bytes as if they were ISO-8859-1.
String strISO88591 = new String(utf8Bytes, "ISO-8859-1");

Naturally this won’t work, because of rules 1 and 3. Rule 1 is broken because you used two different encodings when working with the same data, and rule 3 is broken because you expected strISO88591 to be a String using the ISO-8859-1 character encoding. It isn’t, because String objects don’t have a character encoding (as far as you should be concerned).
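To make the failure concrete, here is a small sketch (the class RoundTripDemo and its method are invented for this example) that encodes a string as UTF-8 and then decodes the bytes twice: once with the wrong encoding and once with the right one. With pure US-ASCII text such as "my string" the mismatch goes unnoticed, because the bytes are identical in both encodings, so the sample uses one accented character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
class RoundTripDemo {

    // Encodes the text as UTF-8 bytes, then decodes those bytes with the given charset.
    static String encodeUtf8ThenDecode(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e: four characters.
        String original = "caf\u00e9";

        // Wrong: the two UTF-8 bytes of the accented e are read as two separate characters.
        String wrong = encodeUtf8ThenDecode(original, StandardCharsets.ISO_8859_1);
        // Right: decode with the same encoding that produced the bytes.
        String right = encodeUtf8ThenDecode(original, StandardCharsets.UTF_8);

        // Print lengths and booleans rather than the strings themselves, so the
        // console's own encoding can't muddy the result.
        System.out.println(wrong.length());         // 5
        System.out.println(right.length());         // 4
        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
    }
}
```

Neither result is "a UTF-8 String" or "an ISO-8859-1 String". Both are just Strings, and one of them holds the wrong characters.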

The big exception to rule 3 is when you’re using a language which doesn’t guarantee it will support whatever characters you throw at it, in which case you basically have to work only with byte arrays and never let the language’s string functions near them. In general, though, I’d suggest you find a better language or a better library.

If I were to add a fourth suggestion, it would be: remember that just because your character is valid doesn’t mean the font you’re using can display it. Most fonts can’t display anything more than the characters in ISO-8859-1 and a few select others, so if you’re working with mathematical symbols or characters from other languages you’ll need to find a special font that supports them.

BTW, yes I have spent far too much time working with character encodings and tracking down where people stuffed up with character encodings.

He Wants An Apple

March 26th, 2004

Leo Simmons wants an apple. He suggests that someone buy a nice shiny new 17″ powerbook and send him their old ratty 15″ powerbook. I happen to have an old ratty 15″ powerbook and would love a nice shiny new 17″ powerbook - I can even justify it. I can’t, however, afford it. So let me offer some mac hater a chance to show just how much they hate Macs in three easy steps.

1. Buy a nice shiny new 17″ powerbook.
2. Give the nice shiny new 17″ powerbook to me.
3. There is no step three.

I will then quite happily give my old ratty 15″ powerbook to Leo and everyone will be happy. The mac hater can brag to all his friends about how he hates macs so much that he gave away a perfectly good 17″ powerbook to some total stranger because it’s totally worthless to him, I can enjoy my nice shiny new 17″ powerbook and Leo can enjoy his nice ratty old 15″ powerbook.