String Interning and Threads
Anton Tagunov added an excellent comment to yesterday’s entry:
The one thing that has always stopped me from doing this was: interning must use some global lock.
So, while interning would cause little to no harm for desktop application it is likely to introduce extra synchronization bottleneck for server applications.
Thus potentially degrading perfromance on some multi-cpu beast.
Would you agree with this, Adrian?
Firstly, let me make something very clear: string interning will cause bugs in your code and they will be hard to track down - they will however be easy to fix once found. At some point, someone, somewhere will forget to intern a string and let it pass into your code that assumes strings are interned.
Also, string interning is an extremely situational optimization. In most cases, it will worsen performance because the overhead of interning the strings will not be made up for by the reduced complexity of comparisons. Even of the cases where it does help, most of the time the difference will not be noticeable. As always, don’t bother optimizing until you know that you need to and that it will help - this is an optimization that causes the code to become less maintainable.
Having said that, lets go back to the original question. Firstly, does string interning require synchronization? Probably, but not in terms of Java. The String.intern() method is a native method and works via JNI. It would be difficult to imagine a way of achieving the behavior without at least some synchronization though. The synchronized block however would be very small, and very rarely encountered.
There are two situations to consider, either the string is already in the interned list or the string is not. If it is, then no synchronization needs to occur because the list is only being read. So multiple strings can be interned at once so long as all of them are already in the interned list. Synchronization will be needed however whenever a string is interned for the first time (ie: it doesn’t match any String constant that has been loaded or any previously interned string).
So on a multiple CPU system, it would be very bad to intern a lot of strings that are only ever used once or twice as they would require a lot of synchronization for no benefit. Of course on a single CPU system, doing this would be a bad thing anyway because it would incur the extra cost of comparing strings to check if they match an interned string without gaining any real benefit.
My theory would then be (and only real world application profiling will confirm this in any particular situation) that the string interning technique is slightly less likely to pay off on multiple CPU systems, however because the situations in which string interning is useful require that the vast majority of String.intern() calls match something already in the cache (most likely one of the string constants they’re to be compared against) the question of how many CPUs will be in use isn’t going to have any significant impact.
I can’t stress enough though that if you don’t have specific profiling data that shows String comparisons as the biggest bottle neck in your application, you shouldn’t apply this optimization.
Great question.
UPDATE: Here’s an interesting discussion of interning relating specifically to this question. The automatic google search on the side (if you actually click through to this blog entry) is very handy at times.

August 17th, 2004 at 8:18 pm
Hi, Adrian! I’m trying hard not to make myself look too smart (it’s really flattering to hear you give such a high score to my lowly remarks :-).
And please excuse me for a long mail, I tried to say it in words but failed.
I can imagine only three major variations of cache: A, B and C.
My point is that we either
* have synchronization on cache read (A) *
or
* maintain per-thread local cache replicas (B, C) *
variants B and C look quite wasteful in term of memory
(or the don’t, hmm… Adrian?)
And the question is what does intern do?
Perhaps A?
CacheA
{
Map m_global = new HashMap();
get( key )
{
synchronized( m_global )
{
return m_global.get( key );
}
}
put( key, value )
{
synchronized( m_global )
{
m_global.put( key, value );]
}
}
}
CacheB
{
ThreadLocal m_threadLocal = new ThreadLocal();
Map m_global = new HashMap();
get( key )
{
Map localMap = m_local.get();
if ( localMap != null )
{
Object found = localMap.get( key );
if ( found != null )
{
return found;
}
}
synchronized( m_global )
{
Object found = m_global.get( key );
if ( found != null )
{
if ( localMap == null )
{
localMap = new HashMap();
m_local.put( localMap );
}
localMap.put( key, found );
}
return found;
}
}
put( key, value )
{
synchronized( m_global )
{
m_blobal.put( key, value );
}
Map localMap = m_local.get();
if ( localMap == null )
{
localMap = new HashMap();
m_local.set( localMap );
}
localMap.put( key, value );
}
}
CacheC
{
ThreadLocal m_threadLocal = new ThreadLocal();
Map m_global = new HashMap();
get( key )
{
Map localMap = m_local.get();
if ( localMap != null )
{
Object found = localMap.get( key );
if ( found != null )
{
return found;
}
}
synchronized( m_global )
{
if ( localMap == m_global )
{
return;
}
m_local.set( m_global );
return m_global.get( key ):
}
}
put( key, value )
{
synchronized( m_global )
{
m_global = m_global.clone();
m_global.put( key, value );
m_local.set( m_global );
}
}
}
August 17th, 2004 at 8:25 pm
yes, in my previous post I should have done
s/m_local/m_threadLocal/g
January 23rd, 2007 at 8:02 pm
[...] 2005 at 9:31 am and filed under Links. Bookmark the permalink. Follow any comments here with the RSS feed for this post. « 天下體 台灣大哥大行動秘書» [...]