"Our tests are flaky. We're shopping for a better test runner."

No, you're not looking at a runner problem. You're looking at a mirror.

In 20 years of being handed "flaky" suites, I've almost never found the flakiness living in the tooling. It lived in the system under test — and the team had quietly agreed not to look at it. A test that passes on Tuesday and fails on Wednesday isn't broken. It's often the most honest thing in the building.

Here's the heresy: flakiness is non-determinism, and you built it in. Shared state nobody resets. A sleep() standing in for a real wait. Tests that depend on the order they happen to run in, on the wall clock, or on a network call to something held together with hope. Martin Fowler made the point years ago — a non-deterministic test is worse than no test, because it trains the whole team to shrug at a red build.

Swapping the runner to fix that is like rotating the tyres because the engine knocks. New kit, same noise, lighter wallet.

What actually fixes it:

  • Treat each flaky test as a bug report about your design, not a nuisance to retry until green.
  • Kill shared mutable state between tests — fresh fixtures, nothing left over from the last run.
  • Replace every sleep() with an explicit wait on the real condition.
  • Make each test own its data, so they stop fighting each other over the same row.
  • Quarantine the offender loudly — with an owner and a ticket, not the "muted temporarily since 2023" graveyard.

A "retry up to 3 times" flag isn't a fix. It's a way to launder a known defect into a green tick and call it a pipeline.

Flaky tests aren't lying to you. They're the one part of the system still telling the truth.

What's the flaky test your team keeps re-running instead of reading?


Talk to a senior QA consultant
Ready to build quality
into your process?

A QA Health Check audit finds the gaps in 1–2 weeks. From £1,200.

Book a free call