Optimising Playwright tests in Ghost

I work at Ghost as a DevOps engineer. It's a great role because I get to work on many projects, diving deep into issues that other engineers don't have bandwidth for on the day-to-day.

Last year, I set up a basic framework for browser-based tests within the Ghost repo. Once it was all looking good, I created a couple of example tests, and then the entire engineering department spent a week creating a comprehensive test suite in a concentrated effort to improve product quality and reduce the number of regressions. Overall the strategy has worked: we're testing the most critical paths within the product, the most complex use-cases, and every regression we could find that required a patch release.

The Problem

For all the good that browser-based tests do, the cost in runtime for CI has been high. The rest of our test suites have been through rounds of optimisation, and execute in around 7 minutes. By comparison, browser-based tests took around 20 minutes to execute. This high cost meant we would only run those tests in CI when a commit was merged to the main branch, instead of running them in pull requests, and it also meant that running the test suite locally was slow even on high-end laptops. On my M1 Pro MacBook Pro I could run the browser test suite in around 10-12 minutes, far too long to expect every engineer to use it in their development cycle. The crux of the issue was that we couldn't test in parallel, since any test could remove or change data used in other tests.

Playwright is designed to have the base URL configured globally, and their guides on parallelisation use a single shared instance of a web app using multiple accounts within the app to ensure parallelism can work without overwriting data. This approach works for apps where accounts are entirely separate, but in Ghost the admin accounts are working with the same data & configuration. Complicating things further, the test suite includes cases which talk to Stripe, and rely on webhooks to be received by the local instance for the test to pass.

There are some optimisations that could be applied to the suite as-is, but they are bandages that could shave off seconds, when the real time savings come from running more than one test at once.

Parallelising the test suite

The only approach that would enable parallelism is running an instance of Ghost for each Playwright worker. When I discovered Playwright fixtures, they seemed like a good fit: I could create a fixture scoped to the worker, which runs automatically (instead of only when referenced by the test) and starts the instance of Ghost.

Getting Ghost running

Since we already had a way to start Ghost in the global setup for Playwright, I stripped that out as a first step to see if I could get a single Playwright worker using the fixture instead of the global setup file. By itself, this wouldn't enable parallelisation, but it would pave the way to make it possible.

The first issue was that the setup signed into Ghost, but Playwright expects to load the browser state from the same place, as defined in the global (or project-level) config. The solution was another fixture. In the worker-scoped fixture that I'd just set up, I saved the state into the fixture object:

base.test.extend({
  ghost: [async ({browser, port}, use, workerInfo) => {
    // setup omitted - creating a page object, setting up Ghost...
    const state = await page.context().storageState();
    await use({
      server,
      state
    });
    // teardown omitted...
  }, {scope: 'worker', auto: true}]
});

Then I overrode the built-in storageState fixture to rely on our custom one:

base.test.extend({
  storageState: async ({ghost}, use) => {
    await use(ghost.state);
  },
  // cont'd ...
});

As long as the URL used by the setup matches the one passed to the test suite, any cookies from logging in will be used in the test cases.

After getting parity to the original global setup method, there were still 3 hurdles to getting multiple instances of Ghost running:

  • Using a separate port for each instance
  • Using a separate database for each instance
  • Setting up Stripe webhooks to point at each instance

Port-per-instance

Getting each instance to use a different port was the easiest part. In Ghost, we have a global configuration package called @tryghost/config. We also have a nifty way to override config at runtime for tests. As long as it's called before the config values are read, updating the port is as simple as:

configUtils.set('server:port', port);
configUtils.set('url', `http://127.0.0.1:${port}`);

This changes the port that Ghost listens on. I used some more fixtures to set up the port using the parallelIndex of the worker. This is an integer, starting from 0, which is unique among the workers running at any one time. Unlike the workerIndex, it stays the same if a worker is torn down and rebuilt (as happens if the worker fails a test). That makes it ideal for the port number, since the port will have been freed during the previous worker's shutdown.

base.test.extend({
  port: [async ({}, use, workerInfo) => {
    await use(2369 + workerInfo.parallelIndex);
  }, {scope: 'worker'}],

  baseURL: async ({port, baseURL}, use) => {
    const url = new URL(baseURL);
    url.port = port.toString();
    await use(url.toString());
  },
  
  // cont'd ...
});
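To make the reasoning about parallelIndex concrete, here is a minimal sketch of the port calculation as a plain function. The helper name portForWorker and the example index values are hypothetical and only for illustration; the base port 2369 comes from the fixture above.

```javascript
// Hypothetical helper mirroring the arithmetic in the `port` fixture.
const BASE_PORT = 2369;

function portForWorker(parallelIndex, basePort = BASE_PORT) {
  if (!Number.isInteger(parallelIndex) || parallelIndex < 0) {
    throw new RangeError(`parallelIndex must be a non-negative integer, got ${parallelIndex}`);
  }
  return basePort + parallelIndex;
}

// Illustrative values only: the worker in slot 1 fails a test and is replaced.
// The replacement gets a new workerIndex but keeps the same parallelIndex,
// so it reuses the port that the old worker released.
const failedWorker = {workerIndex: 1, parallelIndex: 1};
const replacementWorker = {workerIndex: 4, parallelIndex: 1};

console.log(portForWorker(failedWorker.parallelIndex)); // 2370
console.log(portForWorker(replacementWorker.parallelIndex)); // 2370
```

Because the port depends only on the slot, the set of ports in use never grows beyond the number of workers, however many times a worker is restarted.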

Using multiple databases

Once the ports were working, setting up multiple databases should have been a cake-walk – a little more configuration overriding would set it up perfectly:

configUtils.set(
  'database:connection:filename',
  `/tmp/ghost-playwright.${Date.now()}.${workerInfo.workerIndex}.db`);

Setting this to something that was unique for each worker, and guaranteed not to be used by future test runs, meant that we could leave the database in any state we wanted at the end of the test run. Ideally we want the tests to run in any order without issues, but this saves us from having to tidy up excess lists of posts, pages, tiers and offers – which could get confusing or collide with subsequent test runs if we re-used the database.

Unfortunately, this wasn't enough. The fixture file had grown significantly, requiring parts of Ghost's internals to patch out functionality. Part of this setup code caused the database to load before the overrides were applied, so the config changes were not being picked up. Whilst it was simple to patch, it was hard to debug exactly how this was happening. There's no great technical solution here, just changing where files were required within the fixture, but it might serve as a cautionary tale to anyone else trying to accomplish a similar goal. Once the order was changed, each instance of Ghost used its own database just fine.
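The ordering problem is easier to see in isolation. The sketch below is not Ghost's code: the Map standing in for the config store and the loadDatabaseModule function are hypothetical stand-ins for a module that reads its config once, at the moment it is first required.

```javascript
// Hypothetical stand-in for a global config store.
const config = new Map([['database:connection:filename', '/tmp/default.db']]);

// Hypothetical stand-in for a module that reads config when it is loaded.
// The filename is captured once, here, and never re-read.
function loadDatabaseModule() {
  const filename = config.get('database:connection:filename');
  return {getFilename: () => filename};
}

// Wrong order: the module is loaded first, the override is applied afterwards.
const loadedTooEarly = loadDatabaseModule();
config.set('database:connection:filename', '/tmp/worker-1.db');
console.log(loadedTooEarly.getFilename()); // /tmp/default.db

// Right order: the override is already in place when the module loads.
const loadedAfterOverride = loadDatabaseModule();
console.log(loadedAfterOverride.getFilename()); // /tmp/worker-1.db
```

With real modules the effect is stickier than in this sketch, because Node caches a module after its first require, so requiring it again later does not re-read the config. That is why moving the require calls, rather than adding more overrides, was the fix.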

Fixing Stripe webhooks

By this point, many tests ran without issue. I had already achieved a simple method of pointing the Stripe webhooks at each instance of Ghost, by moving the code that started the Stripe CLI into the new fixture. But when the Stripe tests ran in parallel they seemed to trip each other up, despite being different instances of Ghost, on different ports, with separate databases – the Stripe webhooks were received by all instances, which was enough to interfere with the test runs.

Instead of requiring each developer to create ~5 separate Stripe accounts, and pass 5 sets of API keys to the browser tests, I decided to try using Stripe Connect. It's the feature of Stripe that allows an account to have many sub-accounts, each operating independently. Once it's set up in the web UI of Stripe, the test API key can be used to create Connect accounts. Each worker looks for an account with its own fake email address, deletes it if it already exists to start with a clean slate, and creates a new one.
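The find, delete and create steps might look roughly like the sketch below. It is an illustration under assumptions rather than the code in the Ghost fixture: the function name resetConnectAccount, the email pattern and the 'standard' account type are all hypothetical, and the stripe argument is expected to be a client from the official stripe Node package, initialised with a test-mode key.

```javascript
// Sketch only. `stripe` is assumed to be a client created with the official
// `stripe` package, e.g. require('stripe')(testApiKey).
async function resetConnectAccount(stripe, parallelIndex) {
  // Hypothetical email pattern: one fake address per worker slot.
  const email = `test${parallelIndex}@example.com`;

  // The accounts list endpoint has no email filter, so scan the page returned.
  // Assumes fewer than 100 connected accounts exist on the test account.
  const {data: accounts} = await stripe.accounts.list({limit: 100});
  for (const account of accounts) {
    if (account.email === email) {
      // Start from a clean slate by removing the previous run's account.
      await stripe.accounts.del(account.id);
    }
  }

  // The account type is an assumption; pick whichever suits the tests.
  return stripe.accounts.create({type: 'standard', email});
}

// Usage inside a worker-scoped fixture (not executed here):
// const account = await resetConnectAccount(stripe, workerInfo.parallelIndex);
// const stripeAccountId = account.id;
```

Passing the client in as an argument keeps the function easy to exercise against an in-memory fake, so the lookup logic can be checked without calling Stripe.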

As a final hurdle, you cannot use the Stripe account-level API key to get the API key for the Stripe Connect account, which meant that the accounts were set up separately, but I didn't have any way to limit which webhooks were received by each instance of Ghost. However, since the webhooks were sent from Stripe Connect, they had the account id attached to them, so one last patch made it so that each instance of Ghost would only process the webhooks that were destined for its Stripe Connect account:

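const sinon = require('sinon');
const WebhookManager = require('../../../../stripe/lib/WebhookManager');
// Keep a reference to the real method so the stub can delegate to it.
const originalParseWebhook = WebhookManager.prototype.parseWebhook;
const sandbox = sinon.createSandbox();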
sandbox.stub(WebhookManager.prototype, 'parseWebhook').callsFake(function (body, signature) {
  const parsedBody = JSON.parse(body);
  if (!('account' in parsedBody)) {
    throw new Error('Webhook without account');
  } else if (parsedBody.account !== stripeAccountId) {
    throw new Error('Webhook for wrong account');
  } else {
    return originalParseWebhook.call(this, body, signature);
  }
});

I've glossed over a lot of the complexities, and the less interesting issues that I encountered, but I think this summarises the difficulties in running multiple instances of a complex application like Ghost within a framework like Playwright. The effort has been worthwhile, since the CI time for browser tests has dropped from 18-20 minutes to 7 minutes. If we need to run them faster, we can simply scale to a GitHub runner with more cores; for now, browser tests are no longer the blocker when CI runs. When I run the tests on my local machine, they take barely more than 2 minutes.

If you're interested in all the details, the new test fixture is here.


Does this sound like the sort of thing you'd like to work on? Ghost is hiring for DevOps engineers! The full job description & application form can be found here:

DevOps Engineer - Ghost
We’re looking for talented DevOps engineers to improve our ability to deliver new features and a great service to our customers via the Ghost(Pro) platform 🚀

👋