
Optimising your Experimentation program


There’s a natural tendency to focus on the outputs of optimisation. Metrics like conversion rate, AOV, CLV & customer satisfaction help marketers to understand the extent to which their efforts have created value for the business. Oftentimes these same metrics form the basis of our own KPIs. But these outputs are of limited value in understanding why a program of work has or has not delivered to expectations – and that is the understanding you need to gain to maximise the results delivered.

To really understand what’s working and why, we need to look at the inputs.

Consider a typical eCommerce website with ‘Category’ pages, ‘Product’ pages, a cart and a checkout. Knowing the rate at which users convert from category pages to transaction does not help us to understand where within the funnel resources should be invested to improve the effectiveness of the overall system. To make good decisions, we would need to look at the performance of the component parts – at what rate do users move from category pages to product pages? From cart to checkout? And so on.

This kind of thinking translates to your optimisation program itself. Sure, the end result is what matters. But there are multiple somewhat independent factors that should be examined to determine why we got the result we did.

Measure how many experiments you’re running: “Experiment velocity”

Simplistically, experiment velocity is the number of experiments you’ve launched per unit of time. This period of time is not set in stone and should be selected based on your organisational context. If there is a lot of red tape to get experiments live, quarterly might make sense. If you have a lot of flexibility, monthly might be a better time period.

Whatever time period you select, you should trend this over time. Make it part of your scheduled retrospectives to look at this and try to unpick why it has or has not improved.
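As a minimal sketch of how you might calculate this, assuming a hypothetical list of launch dates exported from your testing tool (the dates and the `velocity_by_month` name are illustrative, not from any particular product):

```python
from collections import Counter
from datetime import date

# Hypothetical experiment launch dates exported from your testing tool
launch_dates = [
    date(2023, 1, 12), date(2023, 1, 28),
    date(2023, 2, 3),
    date(2023, 3, 7), date(2023, 3, 21),
]

def velocity_by_month(dates):
    """Count experiments launched per calendar month."""
    return Counter(d.strftime("%Y-%m") for d in dates)

print(dict(velocity_by_month(launch_dates)))
# {'2023-01': 2, '2023-02': 1, '2023-03': 2}
```

Plotting these counts period over period gives you the trend line to review in your retrospectives.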

Measure how many variations you’re testing per Experiment: “Avg Variations per experiment”

As well as maximising the number of experiments you’re running, if your traffic allows you should always try to test more than just one variation in each experiment. Try to make one or more of your variations bold or risky changes rather than small tweaks. Keep in mind that most of the time, only significant changes to a user’s experience will drive significant changes in their behaviour.

Optimizely have conducted some very interesting research here, and found that the win rate for experiments with 2 variations is just 25%, but the win rate for experiments with 5 variations is almost twice that at 44%! They also found that experiments with more variations are more likely to produce a losing variation. While that stands to reason, losses can provide invaluable opportunities to learn and are not inherently negative.
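The metric itself is straightforward to compute. A minimal sketch, assuming a hypothetical record of how many variations (excluding the control) each experiment tested:

```python
# Hypothetical variation counts per experiment, excluding the control
variations_per_experiment = [1, 3, 2, 4, 1]

avg_variations = sum(variations_per_experiment) / len(variations_per_experiment)
print(f"Avg variations per experiment: {avg_variations:.1f}")
# Avg variations per experiment: 2.2
```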

Measure how many experiments you could be running: “Experiment capacity”

The reality is that even with unlimited resources at your disposal, there’s a limit to the number of experiments you can run in a given period of time – this being a function of the number of ‘testable journeys’ on your website and the amount of time it takes to get a winning result from one of these journeys.

The calculation here is: experiment capacity over a time period = ((weeks in time period) / (average experiment duration in weeks)) * (number of simultaneously testable journeys)

While of course simply seeing a trend in your experiment velocity is helpful, experiment capacity gives you the context to know how much headroom you still have left. If your website can support 3 tests a month and you’re running 2, that will lead to an entirely different set of decisions than finding that your website could support 30 tests a month and you’ve only run 1.
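The capacity formula above can be sketched in a few lines. The numbers here are illustrative only – a quarter of 13 weeks, experiments averaging 4 weeks, and 3 journeys that can be tested simultaneously:

```python
def experiment_capacity(weeks_in_period, avg_duration_weeks, testable_journeys):
    """Capacity = (weeks in period / avg experiment duration) * simultaneously testable journeys."""
    return (weeks_in_period / avg_duration_weeks) * testable_journeys

# A 13-week quarter, 4-week average experiment duration, 3 testable journeys
print(experiment_capacity(13, 4, 3))  # 9.75 – roughly 9 full experiments per quarter
```

Comparing this figure with your actual velocity over the same period tells you how much headroom remains.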

Measure the % of experiments that produced a valuable outcome: “Experiment effectiveness”

The wording is important here – it’s short-sighted to look only at the percentage of your experiments that win, although wins are obviously great. As the old adage goes, you might learn even more from a failure – so wins are not the only way an experiment result can be valuable.

This is particularly the case in organisations that are just starting on their journey to develop a test-and-learn culture. Imagine you’ve pushed to test a homepage redesign initiated by your CEO, when historically this sort of change would just have been launched ‘cold’. In this scenario, even a losing experiment is actually very valuable to your organisation.

There are a few metrics you could choose to look at dependent on your organisational context, for example:

  • What % of tests win, lose or are inconclusive?
  • What % of losing tests produce a meaningful insight?
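Both percentages fall out of a simple classification of results. A sketch, assuming a hypothetical experiment log where each entry records the result and whether the team logged a meaningful insight (the field names are illustrative):

```python
# Hypothetical experiment log: result plus an "insight logged" flag
experiment_log = [
    {"result": "win", "insight": True},
    {"result": "loss", "insight": True},
    {"result": "loss", "insight": False},
    {"result": "inconclusive", "insight": False},
]

def share(rows, predicate):
    """Fraction of rows matching the predicate."""
    return sum(predicate(r) for r in rows) / len(rows)

win_rate = share(experiment_log, lambda r: r["result"] == "win")
losses = [r for r in experiment_log if r["result"] == "loss"]
insightful_loss_rate = share(losses, lambda r: r["insight"])

print(f"Win rate: {win_rate:.0%}")                     # Win rate: 25%
print(f"Losses with an insight: {insightful_loss_rate:.0%}")  # Losses with an insight: 50%
```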


Measure how many days you weren’t running anything: “Zero test days”

While your ultimate aspiration should be for your velocity to constantly match your experiment capacity, the inverse of this – what you should try to avoid at all costs – is days where you are not running any experiments at all. Any visitor not exposed to an experiment should be viewed as a missed opportunity that you will never get back!

My suggestion is to have a tracker, visible to everyone on the experimentation project team, showing the number of days in a given time period that had no experiments running. You might also consider displaying the current number of consecutive days without an experiment.
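Counting those days is easy if you have start and end dates for each experiment. A minimal sketch, assuming hypothetical run windows exported from your testing tool:

```python
from datetime import date, timedelta

# Hypothetical experiment run windows (start, end), inclusive
run_windows = [
    (date(2023, 3, 1), date(2023, 3, 10)),
    (date(2023, 3, 14), date(2023, 3, 25)),
]

def zero_test_days(windows, period_start, period_end):
    """Count days in the period with no experiment live."""
    count = 0
    day = period_start
    while day <= period_end:
        if not any(start <= day <= end for start, end in windows):
            count += 1
        day += timedelta(days=1)
    return count

print(zero_test_days(run_windows, date(2023, 3, 1), date(2023, 3, 31)))  # 9
```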

Measure the process itself and find the pain points: “Avg time per process step”

When you begin your experimentation program you should develop an experimentation charter. As part of this, you will document the process an experiment goes through, all the way from ideation to post-test analysis. For each of these steps your charter will nominate a responsible owner and the ‘SLA’ within which they are committing to completing their piece of the work.

Ideally too, you’ll be using some sort of project management tool like Jira or even Trello to track the progress of each experiment as it moves through the process. This will allow you to track how long each step actually took vs expectations.

As part of your regular retrospective, use this comparison to have a dialogue around the parts of your process that have or have not performed to expectations. Try to uncover why this is the case and decide how you can mitigate that risk in the future – maybe additional resources are required, or you need to better anticipate and communicate upcoming work to that stakeholder.
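The actual-vs-SLA comparison can be automated once per-step durations are exported from your project management tool. A sketch under illustrative assumptions – the step names, SLAs and durations below are hypothetical:

```python
# Hypothetical SLAs (days) from your charter, and actual per-step durations
# (days) for recent experiments, exported from a tool like Jira
sla_days = {"ideation": 5, "design": 10, "build": 10, "analysis": 5}
actual_days = {
    "ideation": [4, 6, 5],
    "design": [12, 15, 11],
    "build": [9, 10, 8],
    "analysis": [6, 7, 5],
}

def step_report(slas, actuals):
    """Return {step: (average duration, within SLA?)} for each process step."""
    report = {}
    for step, durations in actuals.items():
        avg = sum(durations) / len(durations)
        report[step] = (avg, avg <= slas[step])
    return report

for step, (avg, ok) in step_report(sla_days, actual_days).items():
    print(f"{step:>9}: avg {avg:.1f}d vs SLA {sla_days[step]}d "
          f"[{'OK' if ok else 'OVER SLA'}]")
```

Steps that repeatedly show up over SLA are the pain points to dig into at the retrospective.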