Profiling and Optimization

Recently there has been a very interesting and at times heated discussion
about optimization on the HUMBUG
mailing list. It’s worth reading the href="http://lists.humbug.org.au/archives/general/2004-April/thread.html">
archives (look for the “Which is better?” thread) to get the whole
picture.

There were two main topics being discussed:

  1. Whether you should look for optimizations as you code and implement
    them if they don’t affect readability and
  2. Whether or not memory allocation in Java is slow and thus efforts
    should be taken to avoid it.

Issue 1 - Premature Optimization

Inside The Loop or Outside The Loop?

Most people argued that premature optimization is the root of all evil,
and that in Java particularly it’s a waste of time because you almost
always predict what the compiler and the JITC are going to do
incorrectly. There were a couple of people (one in particular) who
thought that optimizations that didn’t affect readability were a good
idea. The example:

for (int i = 0; i < 1000; i++) {
    SimpleDateFormat format = new SimpleDateFormat("YYYY-MM-DD");
    format.format(new Date());
}

was bandied around as an example of something you should optimize by
moving the creation of the SimpleDateFormat out of the loop.  The
speed increase of doing so is very significant and so you should move it
out of the loop, however there are two possible reasons why you would
move it:

  • Because it’s faster outside the loop.
  • Because the scope of the formatter extends across all iterations of the
    loop and not just a single iteration.

It’s for the second reason that I would move the SimpleDateFormat outside
of the loop and not the first.  This is highlighted nicely by one of
the other examples that floated around:

for (int i = 0; i < 1000; i++) {
   Dimension i = new Dimension();
}

 

In this case the creation of Dimension is a much cheaper operation then
creating a SimpleDateFormat and parsing a string (after the first
instance is created as that first instance causes AWT to be
inintialized), so it really doesn’t matter if it’s inside the loop or out
of it in terms of performance but if it were the same exact same
dimension object that was unchanged throughout the iterations, I’d still
move it outside of the loop because it is global to the loop and
unaffected by the current value of the loop counter.  Similarly with
a class like:

public class Format {
    public String formatDate(Date date) {
        SimpleDateFormat format = new SimpleDateFormat("YYYY-MM-DD");
        return format.format(date);
    }
}

I would move the SimpleDateFormat to at least a class variable and, if
every instance of the class used the same format for the date, a static
class variable.  I’d do that simply because it better reflects the
behaviour and intent of the class, not because it improves speed (if the
formatDate method were only called once, or was never called on the
critical path for performance, the change would not improve speed).

So out of this we come to a simple rule that all programmers should know
and follow pretty much automatically:

Scope variables to reflect their usage.  No more than required, no
less.

Forwards Or Backwards?

The original question posted was about loops and whether or not you
should evaluate the termination value (eg: array.length) before the loop
and store it in a local variable to avoid having to recalculate it.
 Now, obviously in Java array.length happens to be a variable anyway
so it’s unlikely to make any difference regardless.  The question
did lead to a bunch of different ways that people might try to optimize
arrays, including going from end to finish to take advantage of the fact
that CPUs can compare to 0 faster than they can compare to other numbers
(apparently).  It was pointed out many times, by many people that
it’s not worth worrying about it because of the small amount of
difference it would make, however one or two people thought that if you
knew in advance techniques to make things run faster, why not use them?
 A benchmark was put forward to determine what the fastest way to
iterate over an array was.  It showed the code:

static Test array_unopt = new Test() {
    public int[] array = new int[10000];
    public void test() {
        for (int i = 0; i < array.length; i += 1)
            continue;
    }
};
static Test array_harry = new Test() {
    public int[] array = new int[10000];
    public void test() {
        int l = array.length;
        for (int i = 0; i < l; i += 1)
            continue;
        }
};
static Test array_aj = new Test() {
    public int[] array = new int[10000];
    public void test() {
        for (int i = array.length - 1; i >= 0; i -= 1)
            continue;
    }
};

took 1570, 1439 and 1456 milliseconds for each test respectively.  I
thought the entire benchmark was href="http://lists.humbug.org.au/archives/general/2004-April/023785.html">
critically flawed.  Even if you take the benchmark as valid, it
showed no noticeable difference in the different ways to iterator over an
array.  Most developers however would expects to see the “for (int i
= 0; i < array.length; i++)” version and would thus recognise it more
easily.  Out of this I would derive the rule (but others didn’t):

Benchmarks are a waste of time.  Profile the real code instead.

 Memory Allocation In Java

The SimpleDateFormat in the for loop example sparked off a significant
debate about how expensive memory allocation was in Java, primarily
because it was heap based rather than stack based.  Obviously using
SimpleDateFormat as a test class is a really bad idea because of the
amount of string parsing it does in it’s constructor.  The other
benchmark that was put forward was creating a lot of Dimension objects in
a loop.  What really killed that benchmark is the fact that creating
a Dimension object requires AWT to be initialized, so when the benchmark
was run, it appeared that Java was exceptionally slow at creating a whole
bunch of objects that effectively did nothing.  The “equivalent”
benchmark in C (which is actually completely different but never mind
that for now):

#include <stdio.h>
#include <time.h>
#include <malloc.h>

int main()
{
  long t = time(0);
  int i;
  for (i = 0; i < 100000000; i += 1)
    free(malloc(16));
  printf("elapsed = %d\n", time(0) - t);
  return 0;
}

Turned out to run 30% slower than the Java benchmark (creating Dimension
objects).  That certainly surprised me.  So even taking into
account the fact that benchmarks are a horrible waste of time, I think
it’s safe to say that memory allocation in Java is a lot faster than even
I thought it was.

From this and a bunch of other suprises in the thread, I’d derive the
rule:

If you haven’t profiled it, you don’t know how well it performs.

Learning

Throughout the thread, there have been a lot of cases where someone
didn’t fully understand a part of the language or library that they
thought they knew well.  This lack of knowledge really stood out
when it came to talking about performance because the performance of an
application can be greatly affected by seemingly trivial details.
 All the people involved in the thread are really quite intelligent
people and most of them have significant experience in the computing
field, but nearly everyone (myself included) managed to come out with a
statement that was at least partially wrong.

From this I’d derive the rule:

You always need to learn more.

Summary

While I’ve learnt a lot of things out of that thread, as well as becoming
very frustrated at times, the main lessons I’ve either learnt from it or
had reaffirmed by it were:

Scope variables to reflect their usage.  No more than required, no
less.

Benchmarks are a waste of time.  Profile the real code instead.

If you haven’t profiled it, you don’t know how well it performs.

You always need to learn more.

Most importantly, many people pointed out the following, but I think Andrae
Muys said it best:

Keep the code simple.
Get the code correct.
Measure.

 

5 Responses to “Profiling and Optimization”

  1. Anon Says:

    Regarding the memory allocation benchmark: Java memory allocation in the young object heap (Which is where all your Dimension objects go) is done similar to stack frame allocation in C - by a pointer movement and no traversal of free heap lists. Thus, the free(malloc(16)) isn’t equivalent to new Dimension().


  2. Adrian Sutton Says:

    Definitely agree that the C benchmark is not equivalent to new Dimension() for a number of reasons, not the least of all that new Dimension goes much beyond allocating an object.

    I was however unaware that the young generation used a stack frame style of allocation. Do you have a reference for that assertion? It doesn’t seem to add up with the way the garbage collector works (described at http://java.sun.com/docs/hotspot/gc1.4.2/index.html#2.%20Generations|outline)


  3. Anon Says:

    http://www-106.ibm.com/developerworks/library/j-jtp01274.html:

    In HotSpot JVMs (Sun JDK 1.2 and later), things got a lot better — the Sun JDKs moved to a generational collector. Because a copying collector is used for the young generation, the free space in the heap is always contiguous so that allocation of a new object from the heap can be done through a simple pointer addition, as shown in Listing 1. This makes object allocation in Java applications significantly cheaper than it is in C, a possibility that many developers at first have difficulty imagining.


  4. Anon Says:

    I forgot to mention: I believe I read somewhere that the JVM does some neat escape analysis for objects as well, meaning that an object that is created in a thread, never leaves the thread, and becomes unreachable when it goes out of scope (or something like that), is just as efficient as a C++ stack-allocated object.

    Bottom line was that you shouldn’t optimize based on any weird notions of how you think the JVM works - the JIT will work best with “common coding practices”. Basically, the Java developers is throwing all their energy at making sure common code expressions go fast, and not wasting any time with esoteric constructs. So you may well shoot yourself in the foot with premature optimization.


  5. Adrian Sutton Says:

    Ah yes, thanks for that. I recall reading that article before now that I browse over it again and that quote is familiar.

    Absolutely agree about not trying to second guess how the JVM works.


Leave a Reply

Alternatively, subscribe to the Atom feed.