Testing@LMAX – Testing in Live

By Adrian Sutton

April 14, 2014

Previously in the Testing@LMAX series I’ve mentioned the way we’ve provided isolation between tests, allowing us to run them in parallel. That isolation extends all the way up to supporting a multi-tenancy module called venues which allows us to essentially run multiple, functionally separate exchanges on a single deployment of the LMAX Exchange.

We use the isolation of venues to reduce the amount of hardware we need to run our three separate liquidity pools (LMAX Professional, LMAX Institutional and LMAX Interbank), but that’s not all. We actually use the isolation venues provide to extend our testing all the way into production.

We have a subset of our acceptance tests which, using venues, are run against the exchange as it is deployed in production, using the same APIs our clients, MTF members and internal staff would use to test that the exchange is fully functional. We have an additional venue on each deployment of the exchange that is used to run these tests. The tests connect to the exchange via the same gateways as our clients (FIX, web, etc) and place real trades that match using the exact same systems and code paths as in the “real” venues. Code-wise there’s nothing special about the test venue, it just so happens that the only external parties that ever connect to it are our testing framework.

We don’t run our full suite of acceptance tests against the live exchange due to the time that would take and to ensure that we don’t affect the performance or latency of the exchange. Plus, we already know the code works correctly because it’s already run through continuous integration. Testing in live is focussed on verifying that the various components of the exchange are hooked up correctly and that the deployment process worked correctly. As such we’ve selected a subset of our tests that exercise the key functions of each of the services that make up the exchange. This includes things like testing that an MTF member can connect and provide prices, that clients can connect via either FIX or web and place orders that match against those prices and that the activity in the exchange is reported out correctly via trade reporting and market data feeds.

We run testing in live as an automated step at the start of our release, prior to making any changes, and again at the end of the release to ensure the release worked properly. If testing in live fails we roll back the release. We also run it automatically throughout the day as one part of our monitoring system, and it is also run manually whenever manual work is done or whenever there is any concern for how the exchange is functioning.

While we have quite a lot of other monitoring systems, the ability to run active monitoring like this against the production exchange, and go as far as actions that change state gives us a significant boost in confidence that everything is working as it should, and helps isolate problems more quickly when things aren’t.