Symphonious

Living in a state of accord.

Background Logging with the Disruptor

Peter Lawrey posted an example of using the Exchanger class from core Java to implement a background logging implementation. He briefly compared it to the LMAX disruptor and since someone requested it, I thought it might be interesting to show a similar implementation using the disruptor.

Firstly, let’s revisit the very high level differences between the exchanger and the disruptor. Peter notes:

This approach has similar principles to the Disruptor. No GC using recycled, pre-allocated buffers and lock free operations (The Exchanger not completely lock free and doesn't busy wait, but it could)

Two keys difference are:

  • there is only one producer/consumer in this case, the disruptor supports multiple consumers.
  • this approach re-uses a much smaller buffer efficiently. If you are using ByteBuffer (as I have in the past) an optimal size might be 32 KB. The disruptor library was designed to exploit large amounts of memory on the assumption it is relative cheap and can use medium sized (MBs) to very large buffers (GBs). e.g. it was design for servers with 144 GB. I am sure it works well on much smaller servers. ;)

Actually, there’s nothing about the Disruptor that requires large amounts of memory. If you know that your producers and consumers are going to keep pace with each other well and you don’t have a requirement to replay old events, you can use quite a small ring buffer with the Disruptor. There are a lot of advantages to having a large ring buffer, but it’s by no means a requirement.

It’s also worth noting that the Disruptor does not require consumers to busy-spin, you can choose to use a blocking wait strategy, or strategies that combine busy-spin and blocking to handle both spikes and lulls in event rates efficiently.

There is also an important advantage to the Disruptor that wasn’t mentioned: it will process events immediately if the consumer is keeping up. If the consumer falls behind however, it can process events in a batch to catch up. This significantly reduces latency while still handling spikes in load efficiently.

The Code

First let’s start with the LogEntry class. This is a simple value object that is used as our entries on the ring buffer and passed from the producer thread over to the consumer thread.

Peter’s Exchanger based implementation – the use of StringBuilder in the LogEntry class is actually a race condition and not thread safe. Both the publishing side and the consumer side are attempting to modify it and depending on how long it takes the publishing side to write the log message to the StringBuilder, it will potentially be processed and then reset by the consumer side before the publisher is complete. In this implementation I’m instead using a simple String to avoid that problem.

import com.lmax.disruptor.EventFactory;

class LogEntry
{
    public static final EventFactory<LogEntry> FACTORY = new EventFactory<LogEntry>()
    {
        public LogEntry newInstance()
        {
            return new LogEntry();
        }
    };

    long time;
    int level;
    String text;
}

The one Disruptor-specific addition is that we create an EventFactory instance which the Disruptor uses to pre-populate the ring buffer entries.

Next, let’s look at the BackgroundLogger class that sets up the process and acts as the producer.

import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackgroundLogger
{
    private static final int ENTRIES = 64;

    private final ExecutorService executorService;
    private final Disruptor<LogEntry> disruptor;
    private final RingBuffer<LogEntry> ringBuffer;

    BackgroundLogger()
    {
        executorService = Executors.newCachedThreadPool();
        disruptor = new Disruptor<LogEntry>(LogEntry.FACTORY, ENTRIES, executorService);
        disruptor.handleEventsWith(new LogEntryHandler());
        disruptor.start();
        ringBuffer = disruptor.getRingBuffer();
    }

    public void log(String text)
    {
        final long sequence = ringBuffer.next();
        final LogEntry logEntry = ringBuffer.get(sequence);

        logEntry.time = System.currentTimeMillis();
        logEntry.level = level;
        logEntry.text = text;

        ringBuffer.publish(sequence);
    }

    public void stop()
    {
        disruptor.shutdown();
        executorService.shutdownNow();
    }
}



In the constructor we create an ExecutorService which the Disruptor will use to execute the consumer threads (a single thread in this case), then the disruptor itself. We pass in the LogEntry.FACTORY instance for it to use to create the entries and a size for the ring buffer.

The log method is our producer method. Note the use of two-phase commit. First claim a slot with the ringBuffer.next() method, then copy our values into that slot’s entry and finally publish the slot, ready for the consumer to process. We could have also used the Disruptor.publish method which can make this simpler for many use cases by rolling the two phase commit into call.

The producer doesn’t need to do any batching as the Disruptor will do that automatically if the consumer is falling behind, though there are also APIs that allow batching the producer which can improve the performance if it fits into your design (here it’s probably better to publish each log entry as it comes in).

The stop method uses the new shutdown method on the Disruptor which takes care of waiting until all consumers have processed all available entries for you, though the code for doing it yourself is quite straight-forward. Finally we shut down the executor.

Note that we don’t need a flush method since the Disruptor is always consuming log events as quickly as the consumer can.

Last of all, the consumer which is almost entirely implementation logic:


import com.lmax.disruptor.EventHandler;

public class LogEntryHandler implements EventHandler<LogEntry>
{

    public LogEntryHandler()
    {
    }

    public void onEvent(final LogEntry logEntry, final long sequence, final boolean endOfBatch) throws Exception
    {
        // Write
    }

}

The consumer’s onEvent method is called for each LogEntry put into the Disruptor. The endOfBatch flag can be used as a signal to flush written content to disk, allowing very large buffer sizes to be used causing writes to disk to be batched when the consumer is running behind, yet also ensure that our valuable log messages get to disk as quickly as possible.

The full code is available as a Gist.

OS X Lion: iCal Repeatedly Asks for Google Calendar Password

The one problem I’ve found when upgrading to Lion is that suddenly iCal couldn’t sync to my Google Apps Calendar account – instead it repeatedly asked for the password. I’m still not really sure what caused this, but my solution was to simply delete both ~/Library/Preferences/*iCal* and ~/Library/Calendars. You really only want to do that if you exclusively use Google Calendar. If you have local calendars deleting ~/Library/Calendars will delete them.

Once I’d done that, I re-enabled the Calendars option for the account in System Preferences, “Mail, Contacts and Calendars”, entered my password one last time and it worked perfectly.

Martin Fowler on the LMAX Architecture

Martin Fowler has posted an excellent overview of the LMAX architecture which helps put the use-cases and key considerations that led to the LMAX disruptor pattern in context.

Good to see some focus being put on the programming model that LMAX uses as well. While convention suggests that to get the best performance you have to make heavy use of concurrency and multithreading, the measurements LMAX did showed that the cost of contention and communication between threads was too great. As such, the programming model at LMAX is largely single-threaded, achieving the best possible performance while avoiding the complexities of multithreaded programming.

LMAX Disruptor – High Performance, Low Latency and Simple Too

The LMAX disruptor is an ultra-high performance, low-latency message exchange between threads. It's a bit like a queue on steroids (but quite a lot of steroids) and is one of the key innovations used to make the LMAX exchange run so fast. There is a rapidly growing set of information about what the disruptor is, why it's important and how it works – a good place to start is the list of articles and for the on-going stuff, follow LMAX Blogs. For really detailed stuff, there's also the white paper (PDF).

While the disruptor pattern is ultimately very simple to work with, setting up multiple consumers with the dependencies between them can require a bit too much boilerplate code for my liking. To make it quick and easy for 99% of cases, I've whipped up a simple DSL for the disruptor pattern. For example, to wire up a "diamond pattern" of consumers:

Diamond pattern of consumers. C3 depends on both C1 and C2 completing.

(Image blatantly stolen from Trisha Gee's excellent series explaining the disruptor pattern)

In this scenario, consumers C1 and C2 can process entries as soon as the producer (P1) puts them on the ring buffer (in parallel). However, consumer C3 has to wait for both C1 and C2 to complete before it processes the entries. In real life this might be because we need to both journal the data to disk (C1) and validate the data (C2) before we do the actual business logic (C3).

With the raw disruptor syntax, these consumers would be created with the following code:

Executor executor = Executors.newCachedThreadPool();
BatchHandler handler1 = new MyBatchHandler1();
BatchHandler handler2 = new MyBatchHandler2();
BatchHandler handler3 = new MyBatchHandler3()
RingBuffer ringBuffer = new RingBuffer(ENTRY_FACTORY, RING_BUFFER_SIZE);
ConsumerBarrier consumerBarrier1 = ringBuffer.createConsumerBarrier();
BatchConsumer consumer1 = new BatchConsumer(consumerBarrier1, handler1);
BatchConsumer consumer2 = new BatchConsumer(consumerBarrier1, handler2);
ConsumerBarrier consumerBarrier2 =
ringBuffer.createConsumerBarrier(consumer1, consumer2);
BatchConsumer consumer3 = new BatchConsumer(consumerBarrier2, handler3);
executor.execute(consumer1);
executor.execute(consumer2);
executor.execute(consumer3);
ProducerBarrier producerBarrier =
ringBuffer.createProducerBarrier(consumer3);

We have to create our actual handlers (the two instances of MyBatchHandler), plus consumer barriers, BatchConsumer instances and actually execute the consumers on their own threads. The DSL can handle pretty much all of that setup work for us with the end result being:

Executor executor = Executors.newCachedThreadPool();
BatchHandler handler1 = new MyBatchHandler1();
BatchHandler handler2 = new MyBatchHandler2();
BatchHandler handler3 = new MyBatchHandler3();
DisruptorWizard dw = new DisruptorWizard(ENTRY_FACTORY, RING_BUFFER_SIZE, executor);
dw.consumeWith(handler1, handler2).then(handler3);
ProducerBarrier producerBarrier = dw.createProducerBarrier();

We can even build parallel chains of consumers in a diamond pattern: Two chains of consumers running in parallel with a final consumer dependent on both.

(Thanks to Trish for using her fancy graphics tablet to create a decent version of this image instead of my original finger painting on an iPad…)

dw.consumeWith(handler1a, handler2a);
dw.after(handler1a).consumeWith(handler1b);
dw.after(handler2a).consumeWith(handler2b);
dw.after(handler1b, handler2b).consumeWith(handler3);
ProducerBarrier producerBarrier = dw.createProducerBarrier();

The DSL is quite new so any feedback on it would be greatly appreciated and of course feel free to fork it on GitHub and improve it.

The Single Implementation Fallacy

As my colleague and favorite debating opponent Danny Yates noted:

We got into a bit of a debate at work recently. It went a bit like this:

“Gah! Why do we have this interface when there is only a single implementation?”

(The stock answer to this goes:) “Because we need the interface in order to mock this class in our tests.”

“Oh no you don’t, you can use the FingleWidget [insert appropriate technology of your mocking framework of choice here - e.g. JMock ClassImposteriser]! I’m smarter than you!”

“Well, yes, you can. But if you’ve correctly followed Design for Extension principles, you’ve made the class final, right? And you definitely can’t mock that! Hah! I’m smarter than you!”

“Ah ha! But you could always use the JDave Unfinaliser Agent! I’m so smart it hurts!”

I tend to side with Danny that using the unfinaliser agent is a bad idea, but I also have to question the benefit of declaring a class final in the first place. However, let’s first cover why I think single implementation interfaces are an “enterprisey” anti-pattern in a little more detail.

Why Single Implementation Interfaces Are Evil

Interface Separation or Interface Duplication

The main argument people raise in favour of having interfaces for everything, even if there’s only one implementation is that it separates the API from the implementation. However in practice with languages like Java, this is simply not true. The interface has to be entirely duplicated in the implementation and the two are tightly coupled. Take the code:

public interface A {
  String doSomething(int p1, Object p2);
}
public class AImpl implements A {
  public String doSomething(int p1, Object p2) { ... }
}

This is a pretty clear violation of Don’t Repeat Yourself (DRY). The fact that the implementation name is essentially the same as the interface is a clear indication that there’s actually only one concept here. If there had been a vision of multiple implementations that work in different ways the class name would have reflected this (e.g. LinkedList vs ArrayList or FileReader vs StringReader).

As a general rule, if you can’t think of a good name for your class (or method, variable, etc) you’ve probably broken things down in the wrong way and you should rethink it.

Extra Layers == Extra Work

The net result of duplicating the API is that each time you want to add or change a method on the interface you have to duplicate that work and add it to the class as well. It’s a small amount of time but distracts from the real task at hand and amounts to a lot of unnecessary “busy work” if you force every class to have a duplicate interface. Plus if you subscribe to the idea of code as inventory, those duplicated method declarations are costing you money.

Also, as James Turner pointed out:

Unneeded interfaces are not only wasted code, they make reading and debugging the code much more difficult, because they break the link between the call and the implementation.

This is probably the biggest problem I have with single implementation interfaces. When you’re tracking down a difficult bug you have to load up a lot of stuff into your head all at once – the call stack, variable values, expected control flow vs actual etc etc. Having to make the extra jump through a pointless interface on each call can be the straw that breaks the camel’s back and cause the developer to loose track of the vital context information. It’s doubly bad if you have to jump through a factory as well.

Library Code

Many people argue that in library code, providing separate interfaces is essential to define the API and ensure nothing leaks out accidentally. This is the one case where I think it makes sense to use an interface as it frees up your internal classes to use method visibility to let classes collaborate “behind the scenes” and have a clean implementation, without that leaking out to the API.

A word of warning however: one of the fatal mistakes you can make in a Java library is to provide interfaces that you expect the code using the library to implement. Doing this makes it extremely difficult to maintain backwards compatibility – if you ever need to add a method to that interface compatibility is immediately and unavoidably broken. On the other hand, providing an abstract class that you expect to be extended allows new methods to be added more easily since they can often provide a no-op implementation and maintain backwards compatibility. Abstract classes do limit the choices the code using the library can make though so neither option is a clear cut winner.

Why Declaring Classes Final is Pointless

So at last we come back around to the original problem of needing to mock out classes during testing but being unable to because they’re marked final. There seem to be two main reasons that people like to make classes final:

  1. Classes should be marked final unless they are explicitly designed for inheritance.
  2. Marking a class final provides hints to HotSpot that can improve performance either by method inlining or using faster method call dispatch algorithms (direct instead of dynamic).

Designing for Extension

I have a fair bit of sympathy for the argument that classes should be final unless designed for inheritance, but for shared code within a development team it has a very critical flaw – it’s trivial to just remove the word final and carry on, so people will. Let’s face it, if you look at a class and think “I can best solve my problem by extending this class” then a silly little keyword which may have just been put their by habit is not going to stop you. You’d need to also provide a clear comment about why the class isn’t suitable for extension but in most cases such a reason doesn’t exist – extension just hadn’t been thought about yet so the class is inherently not designed for extension. Besides which, if you have the concept of shared code ownership then whoever extends the class is responsible for making any design changes required to make it suitable for extending when they use it as a base class. Most likely though, they have already looked at the class and decided it’s suitable for extension as is which is why they are trying to do just that.

Perhaps what would be better is to require any class that is designed for extension to have a @DesignedForExtension annotation, then use code analysis tools (like Freud) to fail the build if a class without that annotation is extended. That makes the default not-extendable which is more likely to be correct and still lets you mock the object for testing. You would however want an IDE plugin to make the error message show up immediately but it does seem like a nice way to get the best of all worlds.

Final Classes Go Faster

I found myself immediately suspicious of this claim – it may have been true once but HotSpot is a seriously clever little JIT engine and advances at an amazing pace. Many people claim that HotSpot can inline final methods and it can, but it can also inline non-final methods. That’s right, it will automatically work out that there is only one possible version of this method that exists and go right ahead and inline it for you.

There is also a slight variant of this that claims that since dynamic method dispatch is more expensive, marking a method as final means the JVM can avoid the dynamic dispatch for that method. Marking a class final effectively makes all it’s methods final so that every method would get the benefit.

My reasoning is such that if HotSpot can work out that it can safely inline a method, it clearly has all the information required to avoid the dynamic dispatch as well. I can’t however find any reference to definitively show it does that. Fortunately, I don’t need to. Remember back at the start we said we had to introduce an interface to make things testable? That means changing our code from:

final class A { public void doSomething(); }
…
A a = new A();
a.doSomething();

To:

interface A { void doSomething(); }
final class AImpl implements A { public void doSomething(); }
…
A a = new AImpl();
a.doSomething();

Since we’ve duplicated the method declaration, there is no guarantee that the only version of doSomething is in AImpl, since any class could implement interface A and provide a version of doSomething. We’re right back to relying on HotSpot doing clever tricks to enable method inlining and avoiding dynamic method dispatch.

There simply can be no performance benefit to declaring a class final if you then refer to it via an interface rather than the concrete class. And if you refer to it as the concrete class you can’t test it.

Conclusion

There shouldn’t be anything too surprising here – less code is better, simpler architectures work better and never underestimate how clever HotSpot is. Slavishly following the rule of separating the interface from the code doesn’t make code any more testable, it doesn’t reduce coupling between classes (since they still call the same methods) and it does create extra work. So why does everyone keep doing it?

Oh and, nyah, nyah, I’m so smart it hurts…