Living in a state of accord.

DisplayPort Monitor Wakes Up a Few Seconds After Being Suspended on Linux

I have a dual monitor setup with Linux where one display is connected via DisplayPort and the other via DVI, both from an Nvidia graphics card. When the screen is locked, both displays go black: the DVI monitor says it’s entering power save mode and the DisplayPort monitor says “No input”, at which point both monitors turn back on, displaying the unlock screen.

Playing with xset dpms force standby|suspend|off gave a variety of effects: sometimes the DisplayPort monitor stayed off but the DVI monitor turned back on, sometimes the DisplayPort monitor went black but didn’t turn off, and so on.

Ultimately though, the solution was to disable the “DP Auto Switch” and “HDMI Auto Switch” settings on the DisplayPort monitor. I imagine on other monitors this is equivalent to auto detecting the input source. Now the default Linux settings do the right thing and turn both monitors off after a while; they stay off until I move the mouse and then both turn back on perfectly.

End to End Tests @ LMAX Update

A little while back I said that LMAX ran around 11,000 end to end tests in around 50 minutes. Since then we’ve deployed some new hardware to run our continuous integration on, plus continued building new stuff and are now running about 11,500 tests in under 20 minutes.

A large part of the speed boost is extra VM instances but also the increased RAM allocation available to each VM has allowed us to increase a number of limits in the system and we can now run more tests concurrently against each VM instance.

We’re currently running 61 instances of the exchange using virtual machines hosted by four Dell FX2s chassis three-quarter populated with FC630s. That gives us 480 cores and 4.5TiB RAM. That’s certainly no small investment, but we consider it excellent value for money because of the boost in productivity and confidence it gives our development team (not to mention the boost in confidence and reliability it gives our clients).

Testing@LMAX – Aliases

Even after a magnum opus on the DSL LMAX uses for acceptance tests, there’s one crucial feature that I haven’t mentioned: the use of aliases to allow tests to use simple, meaningful names while ensuring that uniqueness constraints are met.

Creating a user with our DSL looks like:

registrationAPI.createUser("user");


You might expect this to create a user with the username ‘user’, but then we’d get conflicts between every test that wanted to call their user ‘user’ which would prevent tests from running safely against the same deployment of the exchange.

Instead, ‘user’ is just an alias that is only meaningful while this one test is running. The DSL creates a unique username that it uses when talking to the actual system. Typically this is done by adding a postfix so the real username is still reasonably understandable e.g. user-fhoai42lfkf.
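To make the idea concrete, here is a minimal sketch of how an alias lookup could work. It is not LMAX's actual implementation: the AliasRegistry class, its method names and the random hex suffix are all assumptions made for illustration, with one registry instance per running test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper (not the real LMAX code): maps the alias a test uses
// to a unique real name. Each running test would own one instance.
class AliasRegistry {
    private final Map<String, String> realNames = new HashMap<>();

    // Returns the real name for an alias, generating it on first use.
    String create(String alias) {
        return realNames.computeIfAbsent(alias, a -> a + "-" + randomSuffix());
    }

    // Looks up an alias that must already exist, so typos fail fast.
    String resolve(String alias) {
        String realName = realNames.get(alias);
        if (realName == null) {
            throw new IllegalArgumentException("Unknown alias: " + alias);
        }
        return realName;
    }

    // Assumed suffix format: 11 random hex characters.
    private static String randomSuffix() {
        return UUID.randomUUID().toString().replace("-", "").substring(0, 11);
    }
}

class AliasDemo {
    public static void main(String[] args) {
        AliasRegistry testA = new AliasRegistry();
        AliasRegistry testB = new AliasRegistry();
        // Same alias, different real names, so the two tests cannot collide.
        System.out.println(testA.create("user"));
        System.out.println(testB.create("user"));
        // Within one test the alias always resolves to the same real name.
        System.out.println(testA.resolve("user"));
    }
}
```

Keeping the alias as a readable prefix means the real name in logs still points back at the test that created it.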

We do the same thing for instruments, venues, currencies and anything else that needs unique names.

This relatively simple trick gives us a great deal of isolation between tests that may run against the same server instance, even allowing us to run the same test multiple times without interfering with itself.

Testing@LMAX – Abstraction by DSL

At LMAX, all our acceptance tests are written using a high level DSL. This gives us two key advantages:

  • Tests can focus on what they’re testing with the details and mechanics hidden behind the DSL
  • When things change we can update the DSL implementation to match, avoiding the need to change each test that happens to touch on the area.

The DSL that LMAX uses is probably not what most people think of when hearing the term DSL – it doesn’t attempt to read like plain English, just simplifies things down significantly. We’ve actually open sourced the simple little library that is the entrance-way to what we think of as the DSL – creatively named simple-dsl. It’s essentially the glue between what we write in an acceptance test and the plain-java implementation behind that.

As a simple example here’s a test that creates a user, instrument and places an order:

tradingAPI.placeOrder("instrument", "quantity: 5", "type: market",
"expectedStatus: UNMATCHED");

Overall, the aim is to have the acceptance tests written at a very high level – focussing on what should happen but leaving the how to the DSL implementation. The tradingAPI.placeOrder call is a good example of this: it’s testing that when the user places an order on an instrument with no liquidity, it won’t be matched. In the DSL that’s actually a two step process: first place the order and receive a synchronous OK response to say the order was received, then, when the order reaches the matching engine, an asynchronous event will be emitted to say the order was not matched. We could have made that two separate calls in the acceptance test, but that would have exposed too much detail about how the system works when what we really care about is that the order is unfilled. How that’s reported is an implementation detail.
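As a rough illustration of that two step process hidden behind a single call, here is a self-contained sketch. Everything in it is an assumption: FakeExchange stands in for the real system, TradingApiSketch is not our real tradingAPI, and the hand-rolled parse method only mimics the "name: value" style rather than using simple-dsl's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the exchange: acknowledges an order synchronously
// and reports the outcome later as an asynchronous event (one order only).
class FakeExchange {
    final CompletableFuture<String> orderEvent = new CompletableFuture<>();

    String submitOrder(String instrument, int quantity) {
        // This fake has no liquidity, so the event always says UNMATCHED.
        CompletableFuture.runAsync(() -> orderEvent.complete("UNMATCHED"));
        return "OK";
    }
}

class TradingApiSketch {
    private final FakeExchange exchange;

    TradingApiSketch(FakeExchange exchange) {
        this.exchange = exchange;
    }

    // One DSL call that covers both the synchronous and asynchronous steps.
    void placeOrder(String instrument, String... params) {
        Map<String, String> values = parse(params);
        int quantity = Integer.parseInt(values.get("quantity"));

        // Step 1: the synchronous response only says the order was received.
        String response = exchange.submitOrder(instrument, quantity);
        if (!"OK".equals(response)) {
            throw new AssertionError("Order was not accepted: " + response);
        }

        // Step 2: wait for the asynchronous event carrying the real outcome.
        String actual;
        try {
            actual = exchange.orderEvent.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new AssertionError("No order event received", e);
        }
        String expected = values.get("expectedStatus");
        if (!actual.equals(expected)) {
            throw new AssertionError("Expected " + expected + " but was " + actual);
        }
    }

    // Splits each "name: value" string at the first colon.
    static Map<String, String> parse(String... params) {
        Map<String, String> values = new HashMap<>();
        for (String param : params) {
            String[] parts = param.split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'name: value' but got: " + param);
            }
            values.put(parts[0].trim(), parts[1].trim());
        }
        return values;
    }
}

class PlaceOrderDemo {
    public static void main(String[] args) {
        TradingApiSketch tradingAPI = new TradingApiSketch(new FakeExchange());
        tradingAPI.placeOrder("instrument", "quantity: 5", "type: market",
                "expectedStatus: UNMATCHED");
        System.out.println("Order was left unmatched, as expected");
    }
}
```

The test itself never sees the OK response or the event; if the reporting mechanism changes, only placeOrder needs to change.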

However, that does mean that the implementation of the DSL is an important part of the specification of the system. The acceptance tests express the user requirements and the DSL expresses the technical details of those requirements.

Model the Types of Users and Interactions

All our acceptance tests extend a base class, DslTestCase, that exposes a number of public variables that act as the entry points to the system (registrationAPI, adminAPI and tradingAPI in the example above). Each of these roughly represents a way that certain types of users interact with the system. So registrationAPI works with the API exposed by our registration gateway – the same APIs that our sign-up process on the website talks to. adminAPI uses the same APIs our admin console talks to, and tradingAPI is the API that both our UI and many of our clients use directly.

We also have UI variants like adminUI and tradingUI that use Selenium to open a browser and test the UI as well.

Our system tends to have a strong correlation between the type of user and the entry point to the system they use so our DSL mostly maps to the gateways into our system, but in other systems it may be more appropriate to focus more on the type of user regardless of what their entry point into the system is. Again the focus should be on what happens more than how. The way you categorise functions in the DSL should aid you in thinking that way.

That said, our top level DSL concepts aren’t entirely restricted to just the system entry point they model. For example, the registrationAPI.createUser call in the example will initially talk to the system’s registration API, but since a new account isn’t very useful until it deposits funds, it then talks to the admin console to approve the registration and credit some funds into the user’s account. There’s a large dose of pragmatism involved in the DSL implementation, with the goal being to make it easy to read and write the acceptance tests themselves, and we’re willing to sacrifice a little design purity to get that (but only a little).

Top level concepts often further categorise the functionality they provide, for example our admin console that adminAPI drives has a lot of functionality and is used by a range of types of users, so it sub-categorises into things like marketOperations, customerServices, risk, etc.

Add Reusable Components to the DSL

One of the signs that people don’t understand the design of our DSL is when they extract repetitive pieces of tests into a private method within the test itself. On the surface this seems like a reasonable idea, allowing that sequence of actions to be reused by multiple tests in the file. But if the sequence is useful in many test cases within one file and significant enough to be worth the indirection of extracting a method, it’s almost inevitably useful across many files.

Instead of extracting a private method, put reusable pieces into the DSL itself. Then they’ll be available to all your tests. More importantly though, you can make that method fit into the DSL style properly – in our case, using simple-dsl to pass parameters instead of a fixed set of method parameters.

One of our top level concepts in the DSL is ‘workflows’. It bundles together broader sequences of actions that cut across the boundaries of any one entrance point. It’s a handy home for many of the reusable functions we split out. The downside is it’s currently a real grab bag of random stuff and could do with some meaningful sub-categorisation. Naming is hard…

Design to Avoid Intermittency

The way the DSL is designed is a key weapon in the fight against intermittency. The first rule is to design each function to appear synchronous as much as possible. The LMAX Exchange is a highly asynchronous system design but our DSL hides that as much as possible.

The most useful pattern for this is that whenever you provide a setter-type function it should automatically wait and verify that the effect has been fully applied by checking the equivalent getter-type API. So the end of the DSL implementation for registrationAPI.createUser is a waiter that polls our broker service waiting for the account to actually show up there with the initial balance we credited. That way the test can carry on and place an order immediately without intermittently being rejected for a lack of funds.
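Here is a minimal sketch of that setter-then-wait pattern. The names are assumptions for illustration only: Waiter, FakeBroker and RegistrationApiSketch are not our real classes, the fake broker simply applies the credit after a short delay to mimic an asynchronous system, and CompletableFuture.delayedExecutor needs Java 9 or later.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper: blocks until the condition holds or time runs out.
class Waiter {
    static void waitFor(String description, Duration timeout, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0) {
                throw new AssertionError("Timed out waiting for " + description);
            }
            try {
                Thread.sleep(20); // short pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted waiting for " + description, e);
            }
        }
    }
}

// Hypothetical stand-in for an asynchronous service: credits show up later.
class FakeBroker {
    private final Map<String, Long> balances = new ConcurrentHashMap<>();

    void creditLater(String account, long amount) {
        CompletableFuture.runAsync(
                () -> balances.put(account, amount),
                CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS));
    }

    Long balance(String account) {
        return balances.get(account);
    }
}

class RegistrationApiSketch {
    private static final long INITIAL_BALANCE = 1000L;
    private final FakeBroker broker;

    RegistrationApiSketch(FakeBroker broker) {
        this.broker = broker;
    }

    // The "setter" does not return until the equivalent "getter" shows the effect.
    void createUser(String user) {
        broker.creditLater(user, INITIAL_BALANCE);
        Waiter.waitFor("initial balance for " + user, Duration.ofSeconds(5),
                () -> Long.valueOf(INITIAL_BALANCE).equals(broker.balance(user)));
    }
}

class CreateUserDemo {
    public static void main(String[] args) {
        FakeBroker broker = new FakeBroker();
        new RegistrationApiSketch(broker).createUser("user");
        // Safe to place an order now: the funds are visible to the broker.
        System.out.println("Balance: " + broker.balance("user"));
    }
}
```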

The second key pattern applies when verifying values. We produce a lot of reports as CSV files, so we originally had DSL like:

"date: today", "rememberAs: myCsvFile");
"csvFile: myCsvFile", "amount: 10.00");

Apart from being pretty horrible to read, this leads to a lot of intermittency because our system doesn’t guarantee that cash flows will be recorded to the database immediately. It’s done asynchronously, so is only guaranteed to happen within a reasonable time. Instead it’s much better to write:

"date: today", "amount: 10.00");

Then inside the DSL you can use a waiter to poll the cashflow CSV until it does contain the expected value or whatever you define as a reasonable time elapses and the test times out and fails. Again, having the test focus on what and the DSL dealing with how allows us to write better tests.
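A sketch of what that verification might look like inside the DSL follows. The CashflowVerifier class, the two column "date,amount" CSV layout and the Supplier used to fetch the report are all assumptions for illustration, not our real report format or code.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical verifier: fetches the report again on every poll instead of
// checking a single snapshot that the test remembered earlier.
class CashflowVerifier {
    static void verifyCashflow(Supplier<String> downloadCsv, String date, String amount,
                               Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        String lastCsv;
        do {
            lastCsv = downloadCsv.get();
            if (containsRow(lastCsv, date, amount)) {
                return; // the expected cash flow has been recorded
            }
            pause(50);
        } while (System.nanoTime() - deadline < 0);
        // Include the last report seen so the failure is easy to diagnose.
        throw new AssertionError("No cash flow of " + amount + " on " + date
                + ". Last report was:\n" + lastCsv);
    }

    // Assumed layout: one "date,amount" row per line.
    static boolean containsRow(String csv, String date, String amount) {
        return Arrays.stream(csv.split("\n"))
                .map(line -> line.split(","))
                .anyMatch(cols -> cols.length >= 2
                        && cols[0].trim().equals(date)
                        && cols[1].trim().equals(amount));
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("Interrupted while waiting for cash flow", e);
        }
    }
}

class CashflowDemo {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Fake report source: the row only appears after about 200ms.
        Supplier<String> report = () -> System.currentTimeMillis() - start < 200
                ? "date,amount\n"
                : "date,amount\n2015-05-01,10.00\n";
        CashflowVerifier.verifyCashflow(report, "2015-05-01", "10.00", Duration.ofSeconds(2));
        System.out.println("Cash flow found");
    }
}
```

Because the download and the check live in one DSL call, the retry can cover both, which the two call version could not do.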

Don’t Get Too Fancy with the DSL

The first thought most people have when they see our DSL is that it could be so much better if we used static types and chained method calls to get the compiler to validate more stuff and have refactoring tools work well. It sounds like a great idea, and our simple string based DSL seems far too primitive to work in practice, but we’ve actually tried it the other way as well and it’s not as great as it sounds.

Inevitably when you try to make the DSL too much like English or try to get the compiler more involved you add quite a lot of complexity to the DSL implementation which makes it a lot harder to maintain so the cost of your acceptance tests goes up – exactly the opposite of what you were intending.

The trade offs will vary considerably depending on which language you’re using for your tests and the best style of DSL to create will vary significantly. I strongly suspect though that regardless of language the best DSL is a radically simple one, just that different things are radically simple in different languages.

DSLs Matter

This was meant to be a quick article before getting on to what I really wanted to talk about but suddenly I’m 1500 words in and still haven’t discussed anything about the implementation side of the DSL.

It turns out that while our DSL might be simple and something we take for granted, it’s a huge part of what makes our acceptance tests easily maintainable instead of gradually becoming a huge time sink that prevents any change to the system. My intuition is that those people who have tried acceptance tests and found them too expensive to maintain have failed to find the right style of abstraction in the DSL they use, leaving their tests too focused on how instead of what.

Making End-to-End Tests Work

The Google Testing Blog has an article “Just Say No to More End-to-End Tests” which winds up being a rather depressing evaluation of the testing capabilities, culture and infrastructure at Google. For example:

Let’s assume the team already has some fantastic test infrastructure in place. Every night:

  1. The latest version of the service is built. 
  2. This version is then deployed to the team’s testing environment. 
  3. All end-to-end tests then run against this testing environment. 
  4. An email report summarizing the test results is sent to the team.

If your idea of fantastic test infrastructure starts with the words “every night” and ends with an email being sent, you’re doomed. Bryan Pendleton does a good job of analysing and correcting the details so I won’t re-cover that ground. Instead, let me provide a view of what reasonable test infrastructure looks like.

At LMAX we’ve recently reached the milestone of 10,000 end to end acceptance tests. We’ve obviously invested a lot of time in building up all those tests but they’re invaluable in the way they free us to try daring things and make sweeping changes, confident that if anything is broken it will be caught. We’re happy to radically restructure components in ways that require lots of changes to unit tests because of those end-to-end tests.

We also have huge numbers of unit tests, integration tests, performance tests, static analysis and various other forms of tests, but the end-to-end tests are far more than a sanity-check, they’re a primary form of quality control.

Those end-to-end tests, or acceptance tests as we call them:

  • run constantly through the day
  • complete in around 50 minutes, including deploying and starting the servers, running all the tests and shutting down again at the end
  • are all required to pass before we consider a version releasable
  • are included in our information radiators to ensure the team has constant visibility into the test results
  • are owned by the whole team – testers, developers and business analysts together

That’s pretty much entry-level for doing end-to-end testing (or frankly any testing). We’ve also got a few extra niceties that I’ve written about before.

Plus the test results are displayed in real-time, so we don’t even have to wait for the end of the test run to see any failures. Tests that failed on the previous run are run first to give us quick feedback on whether they’ve been fixed or not.
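Running previously failed tests first can be as simple as a stable sort. The sketch below is an assumption about how that might be done, not our actual build tooling; TestOrdering and prioritise are hypothetical names and tests are represented by plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical scheduler step: moves tests that failed last time to the front.
class TestOrdering {
    static List<String> prioritise(List<String> allTests, Set<String> failedLastRun) {
        List<String> ordered = new ArrayList<>(allTests);
        // Boolean false sorts before true, so tests that failed last run
        // (key false) come first. The sort is stable, so everything else
        // keeps its original relative order.
        ordered.sort(Comparator.comparing((String test) -> !failedLastRun.contains(test)));
        return ordered;
    }
}

class TestOrderingDemo {
    public static void main(String[] args) {
        List<String> tests = Arrays.asList("a", "b", "c", "d");
        Set<String> failed = Collections.singleton("c");
        System.out.println(TestOrdering.prioritise(tests, failed)); // prints [c, a, b, d]
    }
}
```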

There’s lots of great stuff in there, but we have more work to do. We have an intermittency problem. When we started out we didn’t believe that intermittency could be avoided and accepted a certain level of breakage on each build – much like the Google post talks about expecting 90% of tests to pass. That attitude is a death-knell for test reliability. If you don’t have all the tests passing consistently, gradually and inevitably more and more tests become intermittent over time.

We’ve been fighting back hard against intermittency and making excellent progress – we’ve recently added the requirement that releases have no failures. Green builds are the norm, and if there are intermittent failures it’s usually only one or two per run. Currently we’re seeing an intermittent failure rate of around 0.00006% of tests run (which actually sounds pretty good, but with 10,000 tests that’s far too many runs with failures that should have been green).

But improvements come in waves with new intermittency creeping in because it can hide in amongst the existing noise. It has taken and will take a lot of dedication and commitment to dig ourselves out of the intermittency hole we’re in but it’s absolutely possible and we will get there.

So next time you hear someone try to tell you that end-to-end tests aren’t worth the effort, point them to LMAX. We do end-to-end testing big time and it is massively, indisputably worth it. And we only expect it to become more worth it as we reduce the intermittency and continue improving our tools over time.