Symphonious

Living in a state of accord.

Testing@LMAX – Compatibility Tests

Once an application goes live, it is absolutely essential that any future changes work with the existing data in production, typically by migrating it as changes are required. That existing data and the migrations applied to it are often the riskiest and least tested parts of the system. Mistakes in a migration will at best cause a multi-hour outage while backups are restored, and more likely will subtly corrupt data, producing incorrect results that may go unnoticed for long periods and making it impossible to roll back. At LMAX we reduce that risk by running compatibility tests.

Sanitised Data

The first requirement for testing data migrations is test data that is as production-like as possible. It’s almost never a good idea to bring actual production data into development, and it’s definitely never going to happen for a finance company like LMAX. Instead, we have a sanitiser that runs in production and generates a very heavily sanitised version of production data that still has the same “shape”. For example, it has the same number of accounts, the same distribution of open positions across instruments, null values in the same places they occur in production and so on. However, it replaces anything that’s even vaguely personally identifiable or otherwise sensitive. If in doubt, sanitise it out.

Despite being heavily sanitised, we still treat it as data that should be secured, but it can be brought into our development, testing and staging environments.

We have multiple production environments so we have sanitised data that matches the shape of each of those environments.

Can Migrations Run?

Once we have sanitised data, the most basic check is to confirm that the migration process will actually complete. This ensures we can release successfully but doesn’t give us any real confidence that the migration is correct. For such a primitive check it’s surprisingly effective: it picks up common errors like changing a column to NOT NULL when the production data actually does have null values, or adding a unique constraint to a table whose content isn’t actually unique.

Did Migrations Work?

The obvious next step is to write tests to confirm that migrations actually worked. Our compatibility test jobs are set up so that we can easily write a JUnit test that runs after the migrations complete, letting us verify the resulting state.

The most direct form of test is an example-based test. We select some examples from the production dataset and write assertions to check that those specific bits of data migrated in the way we expect. The downside is that we’re dealing with live production data which is regularly updated, so it’s possible that our examples will change after we’ve written the test and then, quite correctly, migrate to something different from what we expect. Still, these are often useful to put through for a single run as a sanity check while developing the migration, then delete.

Slightly more generic, we can write tests that assert constraints which must be true immediately after the migration completes. For example, when we made permissions more fine-grained we needed to assign a new type of payment role to every account that used real money, but not to any demo accounts (which use pretend money). We can write a test to verify that the migration worked correctly quite easily; however, once the migration goes out, admin users may add the role to, or remove it from, different accounts and the constraint would no longer hold. For cases like that we simply delete the test once the migration has gone live, at which point it has done its job anyway. We also mark the test with a @ValidUntil annotation that makes it clear the test has a limited lifetime, in case we forget to delete it.
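As a rough illustration, such a migration-specific check might look something like the sketch below. The date-based form of @ValidUntil, and the table and column names, are assumptions made for the example rather than our actual annotation and schema; assertNoRowsMatch is the helper described further down.

@Test
@ValidUntil("2016-06-01") // illustrative: delete once the migration has gone live
public void shouldAssignPaymentRoleToAllRealMoneyAccounts() {
  // Hypothetical schema: fail if any real-money account is missing the new role.
  assertNoRowsMatch(
    "SELECT a.account_id " +
    "FROM account a " +
    "WHERE a.account_type = 'REAL_MONEY' " +
    "AND NOT EXISTS (" +
    "  SELECT 1 FROM account_role r " +
    "  WHERE r.account_id = a.account_id AND r.role = 'PAYMENT')");
}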

Finally, we can often identify constraints that should always be valid and write permanent tests for them. These are extremely powerful, testing not just that our current migration works correctly but that no future migration breaks that expectation.

Did Something Unexpected Happen?

The compatibility tests that should always hold true have an additional benefit – they give us early feedback that the production data has diverged from expectations for some reason, typically because a bug slipped through. We can then investigate, fix the bug to prevent any more data issues and work out how to clean up the problem.

Obviously, finding issues only after production data has gone wrong is not something we ever want to do, but it’s still a useful safety net if something slips through all our pre-release testing and the database schema constraints we use. Typically, when we do find issues they are minor inconsistencies that don’t cause any problems now, but are like little time bombs just waiting for a future release to assume they can’t possibly happen. So even getting feedback that late in the process often allows us to avoid any user-noticeable effects.

Making Them Easy

We have a base class that our compatibility tests extend, which makes it easy to get a connection to the database and provides a few handy utilities for asserting things. By far the most useful, however, is the assertNoRowsMatch method. It does exactly what it says: takes an SQL query and asserts that no rows match. If any do, it prints them out so you get really useful debug information to start investigating the problem. For example:

@Test
public void shouldHaveAPrincipalForEveryAccount() {
  assertNoRowsMatch(
    "SELECT a.account_id, a.name " +
    "FROM account a " +
    "LEFT JOIN principal p ON a.principal_id = p.principal_id " +
    "WHERE p.principal_id IS NULL");
}

If we’ve somehow wound up with an account with no principal that could log into it, the test will print the account ID and name so we can investigate what happened and clean up the data.
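For the curious, a helper like assertNoRowsMatch only takes a few lines of JDBC. This is a minimal sketch rather than our actual code, assuming the base class holds an open java.sql.Connection in a connection field and statically imports JUnit’s fail:

protected void assertNoRowsMatch(final String sql) throws SQLException {
  try (Statement statement = connection.createStatement();
       ResultSet results = statement.executeQuery(sql)) {
    final ResultSetMetaData metaData = results.getMetaData();
    final StringBuilder matchingRows = new StringBuilder();
    while (results.next()) {
      // Print every column of every matching row so failures are easy to debug.
      for (int column = 1; column <= metaData.getColumnCount(); column++) {
        matchingRows.append(metaData.getColumnLabel(column))
                    .append('=')
                    .append(results.getObject(column))
                    .append(' ');
      }
      matchingRows.append('\n');
    }
    if (matchingRows.length() > 0) {
      fail("Expected no rows to match but found:\n" + matchingRows);
    }
  }
}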

 

Testing@LMAX – Making Test Output Useful

Just like production code, you should assume things are going to go wrong in your tests, and when they do you want good logging to help track down what happened and why. So, just like production code, you should use a logging framework within your DSL, use meaningful log levels and think about what information you’d need in the logs if something went wrong (and what you don’t). There are also a few things we’ve found very useful specifically in our test logging.

Log Alias to Real Name Mappings

Since the DSL uses aliases, if we want to poke around in the exchange manually to understand why a test failed, we need to know the real names to use. So whenever we create a real name for an alias we log some information about it. For example when creating an instrument:

21:28:53,916 WARN  …rdersWithSuppliedInstructionIds [AdminAPI] Created instrument 'instrument' (actual name: 'instrument-54369ih64k63', externalId: 180423, internalId: 180422) on tradeReportingGroup 1003

All the key information we need is in that log statement – the alias, real name plus unique identifiers (externalId and internalId).

Name Your Threads

We use a custom JUnit test runner (via the @RunWith annotation) so we can run the tests within a test suite in parallel. With tests running in parallel, all their output gets mixed up and becomes hard to read. Recently we started setting each test thread’s name to the name of the test case.

// Inside our custom runner: name the worker thread after the test so
// parallel log output can be attributed to the right test case.
private void executeTest(
    final FrameworkMethod method,
    final Description description) throws Throwable {
  Thread.currentThread().setName(
      getThreadName(description.getMethodName()));
  methodBlock(method).evaluate();
}

We actually trim the method name to 30 characters (cutting off the start rather than the end, which tends to work better with the way we name tests), so we get output like:

21:28:47,356 INFO  …tityAndPriceAndNoStopLossOffset [TestContext] Created PartyCode XMCS (alias: marketMaker)

21:28:47,356 INFO  …erWithZeroSuppliedInstructionId [TestContext] Created PartyCode U6HK (alias: marketMaker)

21:28:47,356 INFO  …ctionIdIfFirstOrderHasCompleted [TestContext] Created PartyCode 3DS6 (alias: marketMaker)

21:28:47,356 INFO  …StopLossOffsetAndAStopLossPrice [TestContext] Created PartyCode XFY8 (alias: marketMaker)

There are a few cases where we spawn additional threads within a test (typically to pull data from long poll or other push event channels). In those cases we generally pass the thread name down with an additional prefix (e.g. LongPoll-…StopLossOffsetAndAStopLossPrice) so we can still associate that output with the right test.
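The trimming itself is trivial. A minimal sketch of what getThreadName might do (our real implementation may differ slightly):

private String getThreadName(final String methodName) {
  final int maxLength = 30;
  if (methodName.length() <= maxLength) {
    return methodName;
  }
  // Keep the end of the name – with tests named shouldDoSomethingWhenCondition
  // the end is usually the most distinctive part.
  return "…" + methodName.substring(methodName.length() - maxLength);
}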

Time Traveller Names

The way we allow time travelling tests to run in parallel is reasonably complex – only one thread ever actually executes the time travel and there’s a bunch of cross-thread coordination – so our thread names aren’t as useful in that little area of code. As such we give each test we’re currently running a time traveller name so we get log output from the Tardis like:

The Doctor asking to travel to mondayOpen (Mon Mar 14 07:00:00 UTC 2016)
Clara asking to travel to mondayOpen (Mon Mar 14 07:00:00 UTC 2016)
Captain Jack asking to travel to mondayOpen (Mon Mar 14 07:00:00 UTC 2016)
Rory asking to travel to mondayOpen (Mon Mar 14 07:00:00 UTC 2016)
Missy asking to travel to mondayOpen (Mon Mar 14 07:00:00 UTC 2016)
Amy asking to travel to mondayOpen (Mon Mar 14 07:00:00 UTC 2016)
River asking to travel to mondayOpen (Mon Mar 14 07:00:00 UTC 2016)
Rose asking to travel to mondayOpen (Mon Mar 14 07:00:00 UTC 2016)
Time travelling to: mondayOpen (Mon Mar 14 07:00:00 UTC 2016)

We may have gotten a little carried away with the Doctor Who theme but having a name for each time traveller makes it far easier to understand what each test is waiting for.

Testing@LMAX – Introducing ElementSpecification

Today LMAX Exchange has released ElementSpecification, a very small library we built to make working with selectors in Selenium/WebDriver tests easier. It has three main aims:

  • Make it easier to understand selectors by using a very English-like syntax
  • Avoid common pitfalls when writing selectors that lead to either brittle or intermittent tests
  • Strongly discourage writing overly complicated selectors.

Essentially, we use ElementSpecification anywhere that we would have written CSS or XPath selectors by hand. ElementSpecification will automatically select the most appropriate format to use – choosing between a simple ID selector, CSS or XPath.

Making selectors easier to understand doesn’t mean making locators shorter – CSS is already a very terse language. We actually want to use more characters to express our intent so that future developers can read the specification without having to decode CSS. For example, the CSS:

#data-table tr[data-id='78'] .name

becomes:

anElementWithId("data-table")
.thatContainsA("tr").withAttributeValue("data-id", "78")
.thatContainsAnElementWithClass("name")

Much longer, but if you were to read the CSS selector to yourself, it would come out a lot like the ElementSpecification syntax. That allows you to stay focussed on what the test is doing instead of pausing to decode the CSS. It also reduces the likelihood of misreading a vital character and misunderstanding the selector.

With ElementSpecification essentially acting as an adapter layer between the test author and the actual CSS, it’s also able to avoid some common intermittency pitfalls. In fact, the reason ElementSpecification was first built was that really smart people kept attempting to locate an element with a class name using:

//*[contains(@class, 'valid')]

which looks OK but incorrectly also matches an element with the class ‘invalid’. Requiring the class attribute to exactly match ‘valid’ is too brittle because it will fail if an additional class is added to the element. Instead, ElementSpecification would generate:

contains(concat(' ', @class, ' '), ' valid ')

which is decidedly unpleasant to have to write by hand.
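As an illustration of what the library has to generate, here is a sketch of building that predicate in Java (not ElementSpecification’s actual code):

// Match a whole class name within the space-separated @class attribute,
// rather than a substring of it.
static String hasClassPredicate(final String className) {
  return "contains(concat(' ', @class, ' '), ' " + className + " ')";
}

// e.g. "//*[" + hasClassPredicate("valid") + "]" matches class="valid highlighted"
// but not class="invalid".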

The biggest benefit we’ve seen from ElementSpecification, though, is the fact that it has a deliberately limited set of abilities. You can only descend down the DOM tree, never back up and never across to siblings. That makes selectors far easier to understand and avoids a lot of unintended coupling between the tests and incidental properties of the DOM. Sometimes it means augmenting the DOM to make it more semantic – things like adding a “data-id” attribute to rows as in the example above. It’s surprising how rarely we need to do that, and how useful those extra semantics wind up being for a whole variety of other reasons anyway.

Testing@LMAX – Replacements in DSL

Given our DSL makes heavy use of aliases, we often have to provide a way to include the real name or ID as part of some string. For example, an audit record for a new account might be:

Created account 127322 with username someUser123.

But in our acceptance test we’d create the user with:

registrationAPI.createUser("someUser");

someUser is just an alias: the DSL creates a unique username to use, and the system assigns a unique account ID that the DSL keeps track of for us. So to test that the new-account audit record is written correctly, we need a way to build the expected audit string.

Our initial attempt was straightforward enough:

adminAPI.verifyAuditEntry(
    "expression: Created account <accountId> with username <username>.",
    "account: someUser");

The DSL substitutes <accountId> and <username> with the real values for us. Simple, neat, and it worked well for quite a while. Except that over time we kept finding more and more things that needed to be substituted, and encountered situations where an audit log would reference two different accounts or multiple instruments, leading us to have <accountId2> and <accountId3> replacements.

So a little while back some of my clever colleagues changed things around to a slightly more complex but much more flexible syntax:

adminAPI.verifyAuditEntry(
    "expression: Created account <accountId:someUser> with username <username:someUser>.");

Essentially, the replacement now contains both the type of value to insert as well as the alias to look up. This has been a minor revolution for our DSL – it’s now much easier to handle all sorts of complex cases and it’s much clearer which alias is being used for each replacement.

The biggest win, though, is that because all the information required to perform the expansions is inside the one string – not requiring any extra DSL parameters – the replacement algorithm can easily be shared across all the different places in the DSL that need to support replacements. New types of things to be replaced can easily be added in one place and are then available everywhere as well.
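A sketch of what that shared replacement pass can look like, using java.util.regex – the token pattern matches the syntax above, but the method and the alias lookup are illustrative rather than our actual implementation:

private static final Pattern TOKEN = Pattern.compile("<(\\w+):(\\w+)>");

public String expandReplacements(final String template) {
  final Matcher matcher = TOKEN.matcher(template);
  final StringBuffer result = new StringBuffer();
  while (matcher.find()) {
    final String type = matcher.group(1);   // e.g. "accountId"
    final String alias = matcher.group(2);  // e.g. "someUser"
    // Look up the real value that was generated for this alias (assumed registry).
    matcher.appendReplacement(result,
        Matcher.quoteReplacement(aliasRegistry.lookup(type, alias)));
  }
  matcher.appendTail(result);
  return result.toString();
}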

It’s surprising how much benefit you can get from such a simple little change.

Testing@LMAX – Abstraction by DSL

At LMAX, all our acceptance tests are written using a high level DSL. This gives us two key advantages:

  • Tests can focus on what they’re testing with the details and mechanics hidden behind the DSL
  • When things change we can update the DSL implementation to match, avoiding the need to change each test that happens to touch on the area.

The DSL that LMAX uses is probably not what most people think of when hearing the term DSL – it doesn’t attempt to read like plain English, it just simplifies things down significantly. We’ve actually open sourced the simple little library that is the entrance-way to what we think of as the DSL – creatively named simple-dsl. It’s essentially the glue between what we write in an acceptance test and the plain Java implementation behind it.

As a simple example, here’s a test that creates a user and an instrument, then places an order:

registrationAPI.createUser("user");
adminAPI.marketOperations.createInstrument("instrument");
tradingAPI.login("user");
tradingAPI.placeOrder("instrument", "quantity: 5", "type: market",
    "expectedStatus: UNMATCHED");

Overall, the aim is to have the acceptance tests written at a very high level – focussing on what should happen but leaving the how to the DSL implementation. The tradingAPI.placeOrder call is a good example of this: it’s testing that when the user places an order on an instrument with no liquidity, it won’t be matched. In the DSL that’s actually a two-step process: first place the order and receive a synchronous OK response to say the order was received, then, when the order reaches the matching engine, an asynchronous event is emitted to say the order was not matched. We could have made that two separate calls in the acceptance test, but that would have exposed too much detail about how the system works when what we really care about is that the order is unfilled – how that’s reported is an implementation detail.

However, that does mean that the implementation of the DSL is an important part of the specification of the system. The acceptance tests express the user requirements and the DSL expresses the technical details of those requirements.

Model the Types of Users and Interactions

All our acceptance tests extend a base class, DslTestCase, that exposes a number of public variables acting as the entry points to the system (registrationAPI, adminAPI and tradingAPI in the example above). Each of these roughly represents a way that certain types of users interact with the system. So registrationAPI works with the API exposed by our registration gateway – the same APIs that our sign-up process on the website talks to. adminAPI uses the same APIs our admin console talks to, and tradingAPI is the API that both our UI uses and that many of our clients interact with directly.

We also have UI variants like adminUI and tradingUI that use selenium to open a browser and test the UI as well.

Our system tends to have a strong correlation between the type of user and the entry point to the system they use so our DSL mostly maps to the gateways into our system, but in other systems it may be more appropriate to focus more on the type of user regardless of what their entry point into the system is. Again the focus should be on what happens more than how. The way you categorise functions in the DSL should aid you in thinking that way.

That said, our top level DSL concepts aren’t entirely restricted to just the system entry point they model. For example, the registrationAPI.createUser call in the example will initially talk to the system’s registration API, but since a new account isn’t very useful until it deposits funds, it then talks to the admin console to approve the registration and credit some funds into the user’s account. There’s a large dose of pragmatism involved in the DSL implementation, the goal being to make it easy to read and write the acceptance tests themselves, and we’re willing to sacrifice a little design purity to get that (but only a little).

Top level concepts often further categorise the functionality they provide, for example our admin console that adminAPI drives has a lot of functionality and is used by a range of types of users, so it sub-categorises into things like marketOperations, customerServices, risk, etc.

Add Reusable Components to the DSL

One of the signs that people don’t understand the design of our DSL is when they extract repetitive pieces of tests into a private method within the test itself. On the surface this seems like a reasonable idea, allowing that sequence of actions to be reused by multiple tests in the file. But if a sequence is useful in many test cases within one file, and significant enough to be worth the indirection of extracting a method, it’s almost inevitably useful across many files.

Instead of extracting a private method, put reusable pieces into the DSL itself. Then they’ll be available to all your tests. More importantly though, you can make that method fit into the DSL style properly – in our case, using simple-dsl to pass parameters instead of a fixed set of method parameters.
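For example, a reusable step pulled into the DSL might look something like this, built from the calls shown earlier; the inline ‘name: value’ parsing is a simplified stand-in for what simple-dsl actually provides:

public void placeUnmatchedMarketOrder(final String... args) {
  final Map<String, String> params = parse(args);
  tradingAPI.login(params.get("user"));
  tradingAPI.placeOrder(params.get("instrument"),
      "quantity: " + params.getOrDefault("quantity", "5"),
      "type: market",
      "expectedStatus: UNMATCHED");
}

// Simplified "name: value" parsing – simple-dsl does this (and much more) for real.
private static Map<String, String> parse(final String... args) {
  final Map<String, String> params = new HashMap<>();
  for (final String arg : args) {
    final String[] parts = arg.split(":", 2);
    params.put(parts[0].trim(), parts[1].trim());
  }
  return params;
}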

One of our top level concepts in the DSL is ‘workflows’. It bundles together broader sequences of actions that cut across the boundaries of any one entrance point. It’s a handy home for many of the reusable functions we split out. The downside is that it’s currently a real grab bag of random stuff and could do with some meaningful sub-categorisation. Naming is hard…

Design to Avoid Intermittency

The way the DSL is designed is a key weapon in the fight against intermittency. The first rule is to design each function to appear synchronous as much as possible. The LMAX Exchange is a highly asynchronous system by design, but our DSL hides that wherever it can.

The most useful pattern for this is that whenever you provide a setter-type function it should automatically wait and verify that the effect has been fully applied by checking the equivalent getter-type API. So the end of the DSL implementation for registrationAPI.createUser is a waiter that polls our broker service waiting for the account to actually show up there with the initial balance we credited. That way the test can carry on and place an order immediately without intermittently being rejected for a lack of funds.
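A minimal sketch of that waiter pattern – the timeout, poll interval and the brokerService lookup are illustrative, not our actual code:

private void waitForAccountToBeFunded(final String alias, final BigDecimal expectedBalance) {
  final long deadline = System.currentTimeMillis() + TimeUnit.SECONDS.toMillis(30);
  while (System.currentTimeMillis() < deadline) {
    // Poll the read side until the effect of the asynchronous write is visible.
    final BigDecimal balance = brokerService.balanceFor(alias); // assumed lookup
    if (balance != null && balance.compareTo(expectedBalance) == 0) {
      return;
    }
    try {
      Thread.sleep(100);
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new AssertionError("Interrupted waiting for account funding", e);
    }
  }
  throw new AssertionError("Account " + alias + " was not funded within the timeout");
}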

The second key pattern applies when verifying values. We produce a lot of reports as CSV files, so we originally had DSL like:

adminAPI.finance.downloadCashflowReport("date: today", "rememberAs: myCsvFile");
adminAPI.finance.verifyCashflowReportContains("csvFile: myCsvFile", "amount: 10.00");

Apart from being pretty horrible to read, this leads to a lot of intermittency because our system doesn’t guarantee that cash flows will be recorded to the database immediately; it’s done asynchronously, so it is only guaranteed to happen within a reasonable time. Instead it’s much better to write:

adminAPI.finance.verifyCashflowReportContains("date: today", "amount: 10.00");

Then inside the DSL you can use a waiter to poll the cashflow CSV until it contains the expected value, or until whatever you define as a reasonable time elapses and the test times out and fails. Again, having the test focus on the what and the DSL deal with the how allows us to write better tests.

Don’t Get Too Fancy with the DSL

The first thought most people have when they see our DSL is that it could be so much better if we used static types and chained method calls to get the compiler to validate more things and to make refactoring tools work well. It sounds like a great idea – our simple string-based DSL seems far too primitive to work in practice – but we’ve actually tried it the other way as well and it’s not as great as it sounds.

Inevitably, when you try to make the DSL too much like English or try to get the compiler more involved, you add quite a lot of complexity to the DSL implementation, which makes it a lot harder to maintain, so the cost of your acceptance tests goes up – exactly the opposite of what you were intending.

The trade-offs will vary considerably depending on which language you’re using for your tests, and so will the best style of DSL to create. I strongly suspect, though, that regardless of language the best DSL is a radically simple one – it’s just that different things are radically simple in different languages.

DSLs Matter

This was meant to be a quick article before getting on to what I really wanted to talk about but suddenly I’m 1500 words in and still haven’t discussed anything about the implementation side of the DSL.

It turns out that while our DSL might be simple and something we take for granted, it’s a huge part of what makes our acceptance tests easily maintainable instead of gradually becoming a huge time sink that prevents any change to the system. My intuition is that those people who have tried acceptance tests and found them too expensive to maintain have failed to find the right style of abstraction in the DSL they use, leaving their tests too focused on how instead of what.