Where Benchmarks Go Wrong
There is no better vehicle than benchmarking for grabbing headlines, standing out in a crowded content landscape, or opening boardroom doors. A great benchmark can provide an unbiased view into the state of the industry, which casts a halo of credibility and authority onto the insight’s publisher.
And, with the value of data trading at an all-time high, more and more technology providers - especially those populating the eCommerce tech stack - are turning to benchmarks to flex their authority.
But, so many of those well-intentioned and promising benchmark studies, reports, social posts, and other content vehicles fail to reach their potential. After all that work to collect, analyze, interpret, and hype the launch, they fall flat.
So, what separates the good from the bad benchmarks?
I’ve been building benchmarks for the last ten years, and over the past 50+ benchmark report releases and hundreds of benchmark briefings, I’ve seen (and repaired) a lot of broken benchmarks. And the issue is not what you might expect. It’s typically not bad data analysis, or even confusing or bad data visualizations (I’m looking at you, pie chart). And it’s not just one thing.
Here are five common, and rather egregious, mistakes that break benchmarks, along with how to fix them and make the most of your benchmark data.
Mistake #1: One for All
Mistake #2: Comparing Apples to Appleseeds
Mistake #3: Unlawful Averages
Mistake #4: Prescriptive Guidance
Mistake #5: Data without a Problem
These mistakes compound: focus on the wrong story, insight, or benchmark, and you lose credibility. Read on to see how to spot and fix these problems.
Mistake #1: One for All
The central aim of benchmark data is to answer an important, two-part question: How am I doing, and what should I do next?
Directional guidance is a benchmark’s raison d’être. But to provide guidance, benchmarks must be relevant, and to be relevant, they require focus and specificity. The mistake many make is casting too wide a comparison set. In short, when you can’t see yourself in the analysis set, the benchmark is not relevant to you. Imagine comparing a newborn’s height and weight to a teenager’s. It is an irrelevant comparison and would be quickly dismissed. The same holds for digital metrics. Worse, providers that make an irrelevant comparison like that lose credibility. If the comparison set is too diverse, it is irrelevant to all.
One key distinction that renders many benchmarks confusing is the mixing of industries or verticals. For instance, customer acquisition cost for a quick-service restaurant is going to be very different from customer acquisition cost for a B2B software firm.
In its Q2 2022 Influencer Marketing Benchmark Report, which highlights social cost per engagement, Mavrck shows this dynamic. With a diverse set of industries present, it is imperative to show how expectations vary by vertical. Service-based industries (professional services and healthcare) see the highest cost per engagement, as the report’s chart makes clear. Without the industry distinction, a simple average would be $0.38. That provides little to no direction for ANY of the verticals: the service-based industries would see that as way too low, while the retail and consumer goods brands would find it too high.
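To make the segmentation point concrete, here is a minimal Python sketch comparing a single pooled average with per-vertical averages. The vertical names mirror those discussed above, but every number is hypothetical and is not taken from the report; `pooled_average` and `segmented_averages` are illustrative helper names.

```python
import statistics

# Hypothetical cost-per-engagement samples in USD, grouped by vertical.
# These are illustrative values, not figures from any published report.
samples = {
    "professional services": [0.80, 0.95, 0.70],
    "healthcare": [0.60, 0.75, 0.65],
    "retail": [0.15, 0.20, 0.18],
    "consumer goods": [0.12, 0.10, 0.14],
}


def pooled_average(groups: dict[str, list[float]]) -> float:
    """The 'one for all' benchmark: one simple average across every sample."""
    return statistics.mean(v for values in groups.values() for v in values)


def segmented_averages(groups: dict[str, list[float]]) -> dict[str, float]:
    """One simple average per vertical, so each reader can find their peer set."""
    return {name: statistics.mean(values) for name, values in groups.items()}


pooled = pooled_average(samples)
for vertical, avg in segmented_averages(samples).items():
    # Show how far each vertical sits from the pooled figure.
    print(f"{vertical:<22} ${avg:.2f}  (pooled: ${pooled:.2f}, gap: {avg - pooled:+.2f})")
```

With these made-up inputs, the pooled figure lands between the service-based and product-based verticals: too low for the former and too high for the latter, which is the same dynamic described above.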
Beware of the ‘one for all’ approach to benchmarking and find ways to segment your data appropriately.
Mistake #2: Comparing Apples to Appleseeds
Introducing bias into benchmarks is easy. Take growth, which can be tricky to benchmark. Since growth is measured over a time period, it requires a consistent analysis set. That can be a challenge because the number of samples can change over the period. In short, you have a moving target. If your analysis includes all of the samples, including those added during your analysis period, you’re likely to introduce a sampling bias due to lifecycle factors. This is one of the most common biases you’ll find in SaaS. You’ll often see it in ‘momentum’ press releases, or even in investor updates. While these content pieces may not claim to be industry benchmarks, readers may treat them as such.
For example, Shopify’s Q2 2022 investor update shares this:
Gross Merchandise Volume ("GMV") for the second quarter was $46.9 billion, which represents … 11% (growth) over the second quarter of 2021.
The words are right here: Shopify GMV was up 11% YoY. A sampling bias is introduced IF the reader takes this to be an industry figure of ‘11% eCommerce growth.’ Why isn’t it one? Shopify, like other providers, regularly onboards new merchants and churns others. Growing providers generally onboard more merchants than they churn, and the contributions from those new merchants inflate the growth figure. What the reported figures don’t tell us is the growth rate of the consistent merchants, those that were transacting for the entire analysis period.
To overcome this sampling bias and lifecycle skew, develop a ‘same-set activity’ measure that provides the foundation for credible and representative claims. Retail has long provided this guidance by reporting ‘same-store sales.’ In digital, we’ve adapted this to ‘same-site sales.’ Whether the measure is sales or some other behavioral activity, the mandate is consistency. Make sure your analysis set is matched to the time period, so you aren’t introducing a sampling bias from lifecycle factors.
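A minimal Python sketch of the difference, assuming you have total sales per merchant for a prior and a current period. The merchant IDs, the figures, and the helper names `total_growth` and `same_set_growth` are hypothetical; the point is only that restricting the set to merchants active in both periods removes the lift from new sign-ups (and the drag from churn).

```python
# Hypothetical sales per merchant; IDs and amounts are illustrative only.
prior = {"a": 100.0, "b": 80.0, "c": 20.0}               # "c" churns
current = {"a": 102.0, "b": 82.0, "d": 30.0, "e": 26.0}  # "d" and "e" are new


def total_growth(prior: dict[str, float], current: dict[str, float]) -> float:
    """Growth of total volume across every merchant present in either period."""
    return sum(current.values()) / sum(prior.values()) - 1


def same_set_growth(prior: dict[str, float], current: dict[str, float]) -> float:
    """Growth restricted to merchants active in BOTH periods (the consistent set)."""
    consistent = prior.keys() & current.keys()
    prior_total = sum(prior[m] for m in consistent)
    current_total = sum(current[m] for m in consistent)
    return current_total / prior_total - 1


print(f"All merchants: {total_growth(prior, current):.1%}")
print(f"Same-set:      {same_set_growth(prior, current):.1%}")
```

With these made-up inputs, total volume grows 20%, while the consistent set grows only about 2.2%; the gap comes entirely from merchants that joined or left during the period.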
Looking to make sure a benchmark draws an appropriate analysis set? Analysis set details are found, often buried, in the methodology. Good benchmarks establish credibility by placing some of those details early, alongside the ‘mass metrics’ (the large numbers that make the data set look impressive). Here’s a quick example methodology from the Salesforce Shopping Index:
There is far more to analysis sets than same-site consistency, of course. Sets that are too concentrated can skew the data severely and create credibility and even privacy issues. More on concentration risk below.
Mistake #3: Unlawful Averages
Benchmarks are often presented as a series of averages. But averages carry a lot of baggage and can be confusing, misleading, and even alienating.
What do you Mean (ha!)? An average can be confusing: is it a simple average of a metric across a set of samples (arithmetic mean) or an average of the shopping activity across a set of websites (weighted average)? Both can be right, but readers may be unclear about what you’re averaging. And weighted averages carry concentration risk. Case in point: back in 2021, Affirm revealed that Peloton represented 31% of its revenue. If Affirm were to benchmark its customers’ performance using a weighted average, Peloton’s performance would make up about a third of the set.*
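The Python sketch below contrasts the two kinds of average for a conversion-rate benchmark. All merchants and numbers are hypothetical; one merchant (“heavy”) is given 31% of all sessions to mirror the kind of concentration described above. Weighting by sessions is an assumption chosen for illustration; it makes the weighted figure equal to total orders divided by total sessions.

```python
# Hypothetical (conversion rate, sessions) per merchant.
# "heavy" accounts for 31,000 of 100,000 sessions, i.e. 31% of the set.
merchants = {
    "heavy": (0.050, 31_000),
    "m1": (0.020, 18_000),
    "m2": (0.025, 17_000),
    "m3": (0.030, 19_000),
    "m4": (0.022, 15_000),
}


def simple_mean(data: dict[str, tuple[float, int]]) -> float:
    """Arithmetic mean: every merchant counts once, regardless of size."""
    rates = [rate for rate, _ in data.values()]
    return sum(rates) / len(rates)


def weighted_mean(data: dict[str, tuple[float, int]]) -> float:
    """Session-weighted mean: merchants count in proportion to their traffic."""
    total_sessions = sum(sessions for _, sessions in data.values())
    return sum(rate * sessions for rate, sessions in data.values()) / total_sessions


heavy_share = merchants["heavy"][1] / sum(s for _, s in merchants.values())
print(f"Simple mean:   {simple_mean(merchants):.1%}")
print(f"Weighted mean: {weighted_mean(merchants):.1%}")
print(f"Share of sessions held by 'heavy': {heavy_share:.0%}")
```

With these inputs, the simple mean is about 2.9% and the weighted mean about 3.2%: a single merchant holding 31% of the traffic pulls the weighted figure up by roughly 0.3 percentage points. Stating which average you used, and how concentrated the set is, helps readers interpret either number.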
Sometimes, averages can be misleading: when the samples in the analysis set are disproportionately distributed toward the high or low end of a metric (skewed left or skewed right), the average no longer describes a typical sample. Analyses that include highly concentrated or ‘heavy’ samples at either extreme also carry sustainability risk. If those heavy samples are removed from a set, the values can change drastically. This issue shows up when a report is compared against a prior version of itself and the metrics look wildly different.
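A small Python sketch of both effects, using the standard-library `statistics` module. The values are hypothetical average order values (in dollars) for eight sites, with one heavy site far above the rest, so the distribution is skewed right.

```python
import statistics

# Hypothetical average order value (USD) per site; the last site is a heavy outlier.
aov = [42, 45, 47, 50, 52, 55, 60, 310]
without_heavy = aov[:-1]  # the same set after the heavy site leaves

for label, values in [("with heavy site", aov), ("without heavy site", without_heavy)]:
    # The mean is pulled toward the long tail; the median barely moves.
    print(f"{label:<20} mean={statistics.mean(values):7.2f}  "
          f"median={statistics.median(values):6.2f}")
```

Here the mean falls from about $82.6 to about $50.1 when the heavy site drops out, while the median moves only from $51 to $50. Reporting the median, or the mean with and without the heaviest samples, makes that fragility visible before a future report exposes it.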
Worse, though, averages can alienate your audience. Just ask sales. Any good salesperson knows that if you share an average that is (or can be) taken to represent your solution, prospects will naturally compare their own performance to it. This happens frequently with established metrics like conversion rate: sharing an average conversion rate that is below a prospect’s current rate can create a negative reaction.
If that audience already converts above that rate, you may have inadvertently turned them off by implying that sites on your platform perform worse than they do today.
Publishing ONLY the average risks alienating your prospects and, sometimes, misdirecting them.
A good benchmark shares more than just the average. How are the leaders doing? How wide is the range between the best and the rest?
Bold Commerce strikes this balance quite well in the Checkout Benchmark, specifically when sharing benchmarks for checkout completion rate:
Benchmarking average performance alongside the range of performance for the best v. the rest, Bold provides a more comprehensive view of checkout performance.
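One way to produce this kind of view is sketched below in Python: rank the samples, split off a top share as “the best,” and report the overall average alongside the best and the rest. The 25% cutoff, the `best_vs_rest` helper, and the checkout completion rates are all hypothetical; this is not a description of Bold’s methodology.

```python
import statistics


def best_vs_rest(values: list[float], top_share: float = 0.25) -> dict[str, float]:
    """Overall mean, mean of the top `top_share` of samples, and mean of the rest."""
    if len(values) < 2:
        raise ValueError("need at least two samples to split best from rest")
    ranked = sorted(values, reverse=True)
    # Size of the "best" group: at least one sample, leaving at least one for "the rest".
    k = min(max(1, round(len(ranked) * top_share)), len(ranked) - 1)
    best, rest = ranked[:k], ranked[k:]
    return {
        "average": statistics.mean(ranked),
        "best": statistics.mean(best),
        "rest": statistics.mean(rest),
        # Gap between the two groups, in percentage points.
        "gap": statistics.mean(best) - statistics.mean(rest),
    }


# Hypothetical checkout completion rates for eight merchants.
rates = [0.62, 0.55, 0.48, 0.45, 0.44, 0.41, 0.38, 0.35]
for name, value in best_vs_rest(rates).items():
    print(f"{name:<8} {value:.1%}")
```

With these inputs, the overall average is 46%, but the top quarter averages 58.5% and the rest about 41.8%. A prospect completing 50% of checkouts sits above the average yet well below the leaders, a picture the average alone would hide.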
To enhance your credibility and authority, provide a more complete view of your performance.
Mistake #4: Prescriptive Guidance
‘I want to show that <our solution> improved our customers' results.’ It’s nirvana for some firms - the claim they desperately want to make. If we can prove that ‘just adding us’ yields incredible ROI, then we’ve reached the peak, right? Wrong.
Why are these aggregate claims a mistake to chase?
First, comparing ‘before you’ to ‘after you’ in aggregate is very difficult. Sure, there are ways to collect performance data from before a customer adopted your platform. However, your customers are constantly adding technology, developing campaigns, and changing how they do business. Taking full credit for a customer’s gains is a massive overreach; with all the complexities and nuances of their business, it’s simply not feasible to lay claim to their gains that way. Second, when YOU make these claims about yourself, it’s very difficult for prospects and the industry at large to accept them at face value. You are a subjective source.
Looking to flex how your solution is winning for your customers? Case studies are a fantastic proof point: they show that customers benefit from your solution, and using customers’ own words and stamp of approval oozes credibility.
So, if benchmarking shouldn’t be prescriptive, why do it? Remember that critical question: ‘How am I doing?’ When you answer it, you provide directional guidance. A good benchmark, as we saw above, is relevant AND directional. Consider: ‘Compared to omnichannel beauty brands operating in the US, your product pages include far less user-generated content.’ Let’s dissect that briefly:
[[ Compared to ]] == establishes the intent. How do you compare?
[[ omnichannel beauty brands operating in the US ]] == a relevant comparison using industry, business model, and geography
[[ (your) product pages ]] == a focused, clearly defined attribute
[[ include far less user-generated content ]] == a clear result
This sums it all up - relevance, a clear attribute, and a clear result. Sure, even this example can go deeper:
More relevance: What class of beauty brands? Luxury? Value?
Where on the product page? Above the fold? Overall?
What type of user-generated content? Reviews? Social posts?
How ‘far less’? Is it close?
Benchmarks are a compass, not a GPS. Directional guidance can help quantify how an aggregate set of customers has benefited. YOTTAA provides this guidance very well.
Analyzing shopping activity across 16 billion page views, YOTTAA observes how fast pages load and, among other things, the ‘weight’ of a page, including the number of third-party technologies and other resources used to load it. Not surprisingly, the heavier the page, the longer it takes to load.
Directional guidance acknowledges the reality and nuances of enterprise software: there are far too many complexities to claim that one solution rules them all. If you come across aggregate claims like these, step back and think like a prospect: is this really believable? And, of course, there’s the finicky little matter of attribution. Benchmarks cannot deliver prescriptive guidance.
Mistake #5: Data without a Problem
The last, and by far the worst, mistake across benchmark reports is to sell the data and not the problem. A bad benchmark is a hollow one.
You know hollow benchmarks. They look great on a slide or social card, but there is nothing inside. No further insight, no deeper story. These are the empty calories of content marketing and are contributing to toxic levels of content pollution.
How do you know if a benchmark is hollow? Here’s a trick - does the headline above the data visualization claim something OR does it simply describe the chart below it?
More broadly, benchmarks are not merely a neat chart or fancy visual. Benchmarks provide proof. Benchmarks turn fantasy into fact. Just as a great drum track serves the song, great benchmarks need to serve a story. Stories resonate with readers. Stories with proof help readers consider, and re-prioritize, the importance of the story and how it relates to their needs. When you weave benchmarks into your stories, you stir a reader to take action.
Why does this metric matter? Why is it more meaningful today? What else do you need to look at now that you know this? Benchmarks should be the evidence in a documentary about the industry.
Botify’s Organic Search Standard shows how to sell the problem AND what to do next. Organic search is a top marketing channel, responsible for driving droves of traffic to sites. But Google can’t crawl every single page of a site.
This means retailers must prioritize which pages they want crawled. Understanding which pages deliver traffic helps define that priority. And comparing two of the most important page types, category and product detail, shows which pages contribute the most:
The problem is clear: crawls are limited, and you need crawls to drive traffic. Following the problem with a solution (focus on the right pages) provides direction and lends credibility to the publisher, Botify.
Similarly, YOTTAA’s ‘Wait of the World’ spells this out nicely. When your solution focuses on improving your site’s operational performance, you had better be able to tell a story that resonates with both business and technical teams. Why does speed matter? In the story, YOTTAA paints ‘slow’ as the villain and ties slow page loads directly to negative outcomes for shoppers: a slow experience and, worse, a bounce. The benchmarks are merely the proof that makes the story relevant to the reader and reinforces the importance of YOTTAA’s solution.
Great benchmarks don’t just lob data into a PDF; they sell a problem and share a path to improvement.
The fact that you are turning to the data is a smart step forward. But with great data comes great responsibility. By avoiding the mistakes above, you can turn that data into benchmarks that are relevant, credible, and worth acting on.
Some of these mistakes are easy to avoid. Others take a softer touch or smarter analysis. Share the benchmarks, but make them compelling.