Living in a state of accord.

Myth busting mythbusted

As a follow-up to the earlier link regarding the performance of animations in CSS vs JavaScript, Christian Heilmann – Myth busting mythbusted:
Jack is doing a great job arguing his point that CSS animations are not always better than JavaScript animations. The issue is that all this does is debunking a blanket statement that was flawed from the very beginning and distilled down to a sound bite. An argument like “CSS animations are better than JavaScript animations for performance” is not a technical argument. It is damage control. You need to know a lot to make a JavaScript animation perform well, and you can do a lot of damage. If you use a CSS animation the only source of error is the browser. Thus, you prevent a lot of people writing even more badly optimised code for the web.
Some good points that provide balance and perspective to the way web standards evolve and how to approach web development.

Myth Busting: CSS Animations vs. JavaScript


Myth Busting: CSS Animations vs. JavaScript:

As someone who’s fascinated (bordering on obsessed, actually) with animation and performance, I eagerly jumped on the CSS bandwagon. I didn’t get far, though, before I started uncovering a bunch of major problems that nobody was talking about. I was shocked.

This article is meant to raise awareness about some of the more significant shortcomings of CSS-based animation so that you can avoid the headaches I encountered, and make a more informed decision about when to use JS and when to use CSS for animation.

Some really good detail on performance of animations in browsers. I hadn’t heard of GSAP previously but it looks like a good option for doing animations, especially if you need something beyond simple transitions.

Chris Hates Writing • Small things add up

Chris Poole – Small things add up:
By migrating to the new domain, end users now save roughly 100 KB upstream per page load, which at 500 million pageviews per month adds up to 46 terabytes per month in savings for our users. I find this unreal.
Just by moving images to a domain that doesn’t have cookies. Impressive, especially given that users upload data significantly slower than they download it and HTTP 1.1 headers are sent uncompressed. Important to note however:
Are these optimizations the first place to start? No, certainly not—the reason we made them is we had exhausted almost every other page performance trick. But at the scale of large sites like 4chan (especially those run on a shoe-string budget!), it’s important to remember: little things do add up.
Very few sites are at the scale of 4chan, but I wonder how fast cookie traffic adds up for sites that use things like long polling.

Background Logging with the Disruptor

Peter Lawrey posted an example of using the Exchanger class from core Java to implement background logging. He briefly compared it to the LMAX Disruptor and, since someone requested it, I thought it might be interesting to show a similar implementation using the Disruptor.

Firstly, let's revisit the very high-level differences between the Exchanger and the Disruptor. Peter notes:

This approach has similar principles to the Disruptor. No GC using recycled, pre-allocated buffers and lock free operations (The Exchanger not completely lock free and doesn't busy wait, but it could)

Two key differences are:

  • there is only one producer/consumer in this case, the disruptor supports multiple consumers.
  • this approach re-uses a much smaller buffer efficiently. If you are using ByteBuffer (as I have in the past) an optimal size might be 32 KB. The disruptor library was designed to exploit large amounts of memory on the assumption it is relative cheap and can use medium sized (MBs) to very large buffers (GBs). e.g. it was design for servers with 144 GB. I am sure it works well on much smaller servers. ;)

Actually, there’s nothing about the Disruptor that requires large amounts of memory. If you know that your producers and consumers are going to keep pace with each other well and you don’t have a requirement to replay old events, you can use quite a small ring buffer with the Disruptor. There are a lot of advantages to having a large ring buffer, but it’s by no means a requirement.

It’s also worth noting that the Disruptor does not require consumers to busy-spin: you can choose a blocking wait strategy, or strategies that combine busy-spinning and blocking to handle both spikes and lulls in event rates efficiently.

There is also an important advantage to the Disruptor that wasn’t mentioned: it will process events immediately if the consumer is keeping up. If the consumer falls behind, however, it can process events in a batch to catch up. This significantly reduces latency while still handling spikes in load efficiently.

The Code

First let’s start with the LogEntry class. This is a simple value object that serves as the entry type on the ring buffer and is passed from the producer thread over to the consumer thread.
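The full code is in the Gist linked below; a minimal sketch of such a value class might look like the following. The EventFactory interface here is a stand-in for the Disruptor’s own, included only so the sketch compiles on its own, and the fields are illustrative:

```java
// Stand-in for com.lmax.disruptor.EventFactory, included so this
// sketch is self-contained.
interface EventFactory<T> {
    T newInstance();
}

class LogEntry {
    // The Disruptor uses this factory to pre-populate every slot in the
    // ring buffer, so entries are recycled rather than allocated per event.
    public static final EventFactory<LogEntry> FACTORY = new EventFactory<LogEntry>() {
        public LogEntry newInstance() {
            return new LogEntry();
        }
    };

    private long timestamp;
    private String level;
    private String message; // a plain String, not a StringBuilder

    public void set(long timestamp, String level, String message) {
        this.timestamp = timestamp;
        this.level = level;
        this.message = message;
    }

    public String toString() {
        return timestamp + " [" + level + "] " + message;
    }
}
```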

A note on Peter’s Exchanger-based implementation: the use of StringBuilder in its LogEntry class is actually a race condition and not thread safe. Both the publishing side and the consumer side attempt to modify it, and depending on how long it takes the publishing side to write the log message to the StringBuilder, it may be processed and then reset by the consumer side before the publisher is complete. In this implementation I’m using a simple String instead, to avoid that problem.

The one Disruptor-specific addition is that we create an EventFactory instance which the Disruptor uses to pre-populate the ring buffer entries.

Next, let’s look at the BackgroundLogger class that sets up the process and acts as the producer.

In the constructor we create an ExecutorService which the Disruptor will use to execute the consumer threads (a single thread in this case), then the disruptor itself. We pass in the LogEntry.FACTORY instance for it to use to create the entries and a size for the ring buffer.

The log method is our producer method. Note the use of a two-phase commit: first claim a slot, then copy our values into that slot’s entry, and finally publish the slot, ready for the consumer to process. We could also have used the Disruptor.publish method, which can simplify this for many use cases by rolling the two-phase commit into a single call.
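In outline, the two-phase commit has the following shape. SimpleRingBuffer here is a drastically simplified, single-threaded stand-in for the Disruptor’s RingBuffer (no sequence barriers, no wrap-around protection), present only to show the claim / write / publish steps; LogEntry is trimmed to its essentials:

```java
// Stand-ins so the sketch is self-contained; the real types live in
// the com.lmax.disruptor package.
interface EventFactory<T> { T newInstance(); }

class LogEntry {
    static final EventFactory<LogEntry> FACTORY = new EventFactory<LogEntry>() {
        public LogEntry newInstance() { return new LogEntry(); }
    };
    long timestamp;
    String message;
    void set(long timestamp, String message) {
        this.timestamp = timestamp;
        this.message = message;
    }
}

// Drastically simplified single-producer ring buffer, purely to
// illustrate the two-phase commit shape.
class SimpleRingBuffer<T> {
    private final Object[] entries;
    private final int mask;
    private long nextSequence = -1;    // last claimed slot
    private volatile long cursor = -1; // last published slot

    SimpleRingBuffer(int size, EventFactory<T> factory) { // size: a power of two
        entries = new Object[size];
        mask = size - 1;
        for (int i = 0; i < size; i++) entries[i] = factory.newInstance();
    }
    long next() { return ++nextSequence; }                // phase one: claim
    @SuppressWarnings("unchecked")
    T get(long sequence) { return (T) entries[(int) (sequence & mask)]; }
    void publish(long sequence) { cursor = sequence; }    // phase two: publish
    long cursor() { return cursor; }
}

class BackgroundLogger {
    private final SimpleRingBuffer<LogEntry> ringBuffer;
    BackgroundLogger(SimpleRingBuffer<LogEntry> ringBuffer) { this.ringBuffer = ringBuffer; }

    void log(String message) {
        long sequence = ringBuffer.next();              // 1. claim a slot
        LogEntry entry = ringBuffer.get(sequence);
        entry.set(System.currentTimeMillis(), message); // 2. fill in the pre-allocated entry
        ringBuffer.publish(sequence);                   // 3. make it visible to the consumer
    }
}
```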

The producer doesn’t need to do any batching, as the Disruptor will do that automatically if the consumer is falling behind, though there are also APIs that allow batching on the producer side, which can improve performance if it fits your design (here it’s probably better to publish each log entry as it comes in).

The stop method uses the new shutdown method on the Disruptor, which takes care of waiting until all consumers have processed all available entries for you, though the code for doing it yourself is quite straightforward. Finally, we shut down the executor.

Note that we don’t need a flush method, since the Disruptor consumes log events as quickly as the consumer can process them.

Last of all, the consumer which is almost entirely implementation logic:

The consumer’s onEvent method is called for each LogEntry put into the Disruptor. The endOfBatch flag can be used as a signal to flush written content to disk, allowing very large buffer sizes to be used so that writes to disk are batched when the consumer is running behind, while still ensuring that our valuable log messages get to disk as quickly as possible.
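A sketch of what such a handler can look like. EventHandler here is a stand-in with the same shape as the Disruptor’s callback interface, a java.io.Writer stands in for the real file output, and LogEntry is trimmed to its essentials:

```java
import java.io.IOException;
import java.io.Writer;

// Stand-in for the Disruptor's EventHandler callback interface,
// included so the sketch is self-contained.
interface EventHandler<T> {
    void onEvent(T event, long sequence, boolean endOfBatch) throws Exception;
}

class LogEntry {
    long timestamp;
    String message;
    public String toString() { return timestamp + " " + message; }
}

class LogEntryHandler implements EventHandler<LogEntry> {
    private final Writer out;
    LogEntryHandler(Writer out) { this.out = out; }

    public void onEvent(LogEntry event, long sequence, boolean endOfBatch) throws IOException {
        out.write(event.toString());
        out.write('\n');
        if (endOfBatch) {
            // One flush per batch: when the consumer is behind, many entries
            // share a flush; when it's keeping up, every entry hits disk promptly.
            out.flush();
        }
    }
}
```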

The full code is available as a Gist.

JavaScript Performance: For vs ForEach

As part of exploring JavaScript performance, I started wondering whether the particular method of looping made a difference to performance. After all, one of the most often repeated bits of advice is that JavaScript is much faster if you loop backwards through an array instead of forwards, and the slow part of my code was a loop. I could set up a simple test case pretty quickly and easily: for vs forEach.

There are a bunch of variants on loops in that test and the results really surprised me:

The variants tested, per browser, were: for loop with cached length; for loop with cached length, no callback; for loop with cached length, using function.call; reverse for loop; simple for loop; and forEach.

Here’s that data in a graph for Chrome and Firefox, just to make it a bit easier to visualise:

Column chart of Chrome test results

Column chart of Firefox test results

Unsurprisingly, the performance characteristics vary quite noticeably between browsers. What was surprising, to me at least, was that the new [].forEach() is one of, if not the, slowest ways to iterate through an array – slower than implementing forEach yourself in JavaScript in most browsers (mobile Safari being the exception).

The other thing that stood out is the green bar in the middle, representing the ‘for loop, cached length, using function.call’ variant, which is the method of looping I’m currently using because it’s what the libraries implement. The advantage of this approach is that it not only supplies each item in the array, it also gives the index and the full array as extra arguments, and it allows me to specify an object to use for ‘this’ in the callback function. That’s all potentially very handy, but I wasn’t actually taking advantage of it. Looks like switching to a custom ‘each’ implementation which uses the for loop with cached length approach should give a performance boost without making any other changes to the code, so no sacrifice in maintainability.

Sounds great, but after making the change there was less than a millisecond difference in the actual performance – well within the margin for error for measuring time in JavaScript performance tests.

Lessons Learnt

  • Use whatever for loop setup you want – it’s unlikely to affect overall performance in any noticeable way.
  • It’s probably not worth adding code to detect if there’s a built-in forEach you can leverage – just use the JavaScript implementation you’d need on older browsers anyway.

The Remaining Question

It’s all very well to learn these lessons, but since I’ve done all of this playing around in my own time at home, there’s a very significant problem with the results: they don’t include IE1. Since my real performance problems are worst on IE 6, 7 and 8, I need to run these tests there – it’s possible that changing the looping system really would yield performance gains in IE. I’ll have to try that out on Monday.

1 – I do have IE in a VM; I’ve just been too lazy to fire it up. Of course, if you happen to have IE ready to hand, you can go and run the tests yourself, and your results will be added into the mix.