Symphonious

Living in a state of accord.

java.net.URL or java.net.URI?

I'm going to show my ignorance of the actual differences between URLs and URIs here, but I was a bit surprised by the fact that java.net.URL didn't extend from java.net.URI. Along those lines URL.toURI() suggests that some URLs can't be converted to URIs. Is this just in the context of the URLs that Java previously successfully parsed or is this a generic constraint?

My main reason for asking is that I'm trying to determine whether, when implementing an HTTP caching library, I should be using java.net.URI or java.net.URL to identify the source of resources. My original thought was to use java.net.URL because that's what pretty much everything uses, and besides, I'm used to supporting Java 1.3 (URI was only added in Java 1.4). Somehow that seemed overly specific, and I felt I should use URI instead. Partly this is because it felt more generic and partly because it just seemed to have more geek cred. I must admit, though, that I really don't have a firm grasp on which I should use when and why. So dear lazy web, can you help?
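
On the question of whether URL.toURI() can really fail, here is a minimal sketch, assuming a hypothetical helper called tryToUri and made-up example.com addresses. It suggests the failure comes from java.net.URL parsing leniently while java.net.URI applies stricter syntax rules, so a string that URL accepts (for example, one with an unescaped space in the path) can be rejected when converted.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Optional;

// Hypothetical helper class, for illustration only.
final class UrlToUriSketch {

    // Parses the string as a java.net.URL, then attempts the URL -> URI
    // conversion. Returns an empty Optional if either step fails.
    static Optional<URI> tryToUri(String spec) {
        try {
            // The String constructor is deprecated in recent JDKs but still
            // available; it is used here because it is the lenient parser
            // the post is asking about.
            URL url = new URL(spec);
            // toURI() throws URISyntaxException when the URL's text is not
            // a syntactically valid URI.
            return Optional.of(url.toURI());
        } catch (MalformedURLException | URISyntaxException e) {
            return Optional.empty();
        }
    }
}
```

With this helper, tryToUri("http://example.com/index.html") yields a URI, while tryToUri("http://example.com/a b") comes back empty: the URL constructor accepts the unescaped space, but the conversion to URI does not.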

  • Tim Bray says:

    I’d say URI is the only sane choice. See http://www.tbray.org/ongoing/When/200x/2003/02/27/URL and http://www.w3.org/TR/webarch and for a nasty surprise, check out the sickening broken-ness of java.net.URL.equals()

    March 29, 2007 at 10:32 pm
  • Steve Loughran says:

    Do not go near java.net.URL if you can help it. In some recently fixed releases of Java, it would hang the thread in the Constructor until a reverse DNS lookup of the proxy server timed out.

    March 29, 2007 at 10:59 pm
  • James Snell says:

    As Tim and Steve point out, java.net.URL is evil. java.net.URI is not a whole lot better but it’s still the best choice.

    March 30, 2007 at 12:46 am
  • David Venz says:

    The above comments are probably of more immediate use, but as another data-point:
    Have you read RFC 3305, in particular section 2?

    Cheers,
    -Dave.

    March 30, 2007 at 6:30 am
  • Adrian Sutton says:

    Ok, so out of all of that I think my understanding was fairly right, if uncertain. URLs are a specific type of URI, every URL should be a valid URI and Java just screws it all up. Fair enough.

    From a practical standpoint though I’m still in two minds – java.net.URL may be horribly broken, but it is what everyone uses. It seems unlikely that an HTTP caching algorithm would be useful for things that aren’t identified by URLs (exactly how do I send an HTTP request for urn:isbn:xxxx-xxxx-xxxx?) so I probably should just use them. I can always add a URI interface as well later if it ever proves useful.

    March 30, 2007 at 8:18 am
  • Asbjørn Ulsberg says:

    What are you going to do with the URIs/URLs in your application? If you’re going to resolve them and have to rely on the URL class itself to do that, then that might make sense in a very obscure way. However, I think that’s just as wrong as having readFile() and writeFile() methods on the String class. It just doesn’t belong there, and if it was there, the class would be badly designed and imho broken. Thus, the URL class is broken.

    If what you want is a generic class to represent the URI syntax with convenient methods to handle this syntax (e.g. extract portions of the URI for further inspection etc.), and leave resolving to a class better equipped to do it, then URI is definitely the way to go. As Tim Bray writes, don’t go near the URL class, at least not unless you have explicit, well-thought and very sound reasons to do it.

    March 30, 2007 at 6:10 pm
  • Adrian Sutton says:

    Well, I’ve been successfully working with the URL class for at least the past 5 years – as much as anything out of necessity because that’s what everything tends to use. All I really need in this case is an identifier: some other code will have to deal with retrieving resources, and I just need a unique name to assign them so that they can be stored in and retrieved from the cache. Given the design of HTTP, the obvious answer is a URL/URI. It is likely to be more useful for clients of the API to use URLs, but it is easier to convert URLs to URIs than vice versa, and given the advice that the URL class is so broken, I’ve gone with java.net.URI. No doubt I’ll need to add utility methods that take URLs and convert them to URIs at some point, but this will let me keep moving forward and see if this becomes a worthwhile project or not.

    March 30, 2007 at 6:29 pm
  • Asbjørn Ulsberg says:

    Sounds like a smart pick and plan to me. When you say you’ve worked successfully with the URL class for at least 5 years, have you in any of those 5 years done url1.equals(url2) and not had unexpected results?

    March 30, 2007 at 11:49 pm
  • Anonymous says:

    Never. However, the only reason I’ve compared URLs before is for really primitive caching, and the URLs would have been constructed from the exact same String. Getting an unexpected false wouldn’t matter either. So while I’ve done an awful lot with HTTP, my usage of URLs has really just been a storage container – it could have been a String for all I really cared (and in many cases, it was a String).

    March 31, 2007 at 8:20 am
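
The approach settled on in the thread above, using java.net.URI purely as an identifier for cached resources, might look roughly like the sketch below. The class name UriKeyedCacheSketch and the use of a String as the cached body are assumptions made for illustration, not the design of the actual library. The point is that URI.equals() and URI.hashCode() compare the parsed components without touching the network, so a URI can serve as a map key.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache keyed by java.net.URI, for illustration only.
final class UriKeyedCacheSketch {

    private final Map<URI, String> entries = new HashMap<>();

    // Stores a body under the normalized form of its source URI.
    // normalize() removes "." and ".." path segments where possible.
    void put(URI source, String body) {
        entries.put(source.normalize(), body);
    }

    // Looks up a body; returns null when nothing is cached for the URI.
    String get(URI source) {
        return entries.get(source.normalize());
    }
}
```

Under this sketch, an entry stored under http://example.com/a/../b.html is found again via HTTP://EXAMPLE.COM/b.html, because URI compares scheme and host without regard to case. It is not found via http://example.com:80/b.html, since an explicit port and an omitted one are treated as different; a real HTTP cache would probably want to smooth that over itself.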
