Comment by awkward 6 days ago
Pure, disinterested A/B testing, where the goal is simply to find the best way to do something and there is enough leverage and traffic to make funding the test worthwhile, is rare.
More frequently, A/B testing is a political technology that allows teams to move forward with changes to core, vital services of a site or app. By putting a new change behind an A/B test, the team technically derisks the change by allowing it to be undone rapidly, and politically derisks the change by tying its deployment to rigorous testing that proves it at least does no harm to the existing process before applying it to all users. The change was already judged to be valuable when development effort went into it, whether for technical, branding, or other reasons.
In short, not many people want to funnel users through N code paths with slightly different behaviors, because not many people have a ton of users, a ton of engineering capacity, and a ton of potential upside from marginal improvements. Two-path tests solve the more common problem of wanting to make major changes to critical workflows without killing the platform.
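To make the mechanism concrete, here's a minimal sketch of that kind of gate (my own illustration, with made-up flow names, not anyone's production code): the risky new path sits behind a deterministic flag, so "undoing" the change is a config flip rather than an emergency redeploy.

```python
# Minimal feature-flag style gate: illustrative only, names are hypothetical.
import hashlib

ROLLOUT_PERCENT = 50  # set to 0 to pull everyone back to the old path instantly

def in_treatment(user_id: str, rollout_percent: int = ROLLOUT_PERCENT) -> bool:
    """Deterministically bucket a user so they always see the same variant."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent

def checkout(user_id: str) -> str:
    if in_treatment(user_id):
        return new_checkout_flow(user_id)  # the change the team wants to ship
    return old_checkout_flow(user_id)      # the battle-tested existing path

# Placeholder implementations so the sketch runs on its own:
def new_checkout_flow(user_id: str) -> str: return f"new flow for {user_id}"
def old_checkout_flow(user_id: str) -> str: return f"old flow for {user_id}"

if __name__ == "__main__":
    print(checkout("user-123"))
```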
> politically derisks the change by tying its deployment to rigorous testing that proves it at least does no harm to the existing process before applying it to all users.
I just want to drop here the anecdata that I've worked for a total of about 10 years in startups that proudly called themselves "data-driven" and worshipped "A/B testing." One of them hired a data science team that actually did some decently rigorous analysis on our tests and advised on things like when we had reached statistical significance, how many impressions we needed, and so on. The other did not, and just had someone looking at very simple comparisons in Optimizely.
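For reference, the "how many impressions do we need" arithmetic that team ran looks roughly like this; this is just a sketch of the standard two-proportion sample-size approximation with made-up numbers, not their actual tooling.

```python
# Rough sketch of the standard two-proportion sample-size approximation.
# Rates and lifts here are illustrative, not from any real test.
import math
from statistics import NormalDist

def required_n_per_arm(baseline_rate: float, min_detectable_lift: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per arm to detect an absolute lift of
    `min_detectable_lift` over `baseline_rate` at the given alpha and power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # desired power
    p_avg = baseline_rate + min_detectable_lift / 2
    variance = 2 * p_avg * (1 - p_avg)
    return math.ceil(variance * (z_alpha + z_beta) ** 2
                     / min_detectable_lift ** 2)

# Detecting a 4% -> 5% conversion lift needs roughly 6,700 users per arm:
print(required_n_per_arm(0.04, 0.01))
```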
In both cases, the influential management people who ultimately owned the decisions would simply rig every "test" to fit the story they already believed, by doing things like running the test just until the results looked "positive," rather than until they were statistically significant. Or by measuring several metrics and deciding after the fact to base the call on whichever one happened to be positive at the time. Or by skipping testing entirely and saying we'd just "used a pre/post comparison" to prove it out. Or even by just dismissing a 'failure,' saying we would do it anyway because it's foundational to X, Y, and Z, which really will improve (insert metric). The funny part is that none of these people thought they were playing dirty; they believed they were making their decisions scientifically!
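To show concretely why "run it until it looks positive" is rigging, here's a rough simulation (mine, with arbitrary numbers): both variants have identical conversion rates, so every declared winner is a false positive, yet checking a z-test after every batch and stopping at the first p < 0.05 "finds" a winner far more often than the nominal 5%.

```python
# Simulation of "peeking": both arms have the SAME conversion rate, so any
# declared winner is a false positive. All numbers here are arbitrary.
import math
import random

def p_value(conv_a: int, conv_b: int, n: int) -> float:
    """Two-sided p-value of a two-proportion z-test with n users per arm."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(p_pool * (1 - p_pool) * 2 / n)
    if se == 0:
        return 1.0
    z = abs(conv_a - conv_b) / n / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def experiment(rate: float = 0.05, batch: int = 200, batches: int = 20,
               peek: bool = True) -> bool:
    """Return True if the test is ever (peeking) or finally (fixed N) 'significant'."""
    conv_a = conv_b = n = 0
    for _ in range(batches):
        conv_a += sum(random.random() < rate for _ in range(batch))
        conv_b += sum(random.random() < rate for _ in range(batch))
        n += batch
        if peek and p_value(conv_a, conv_b, n) < 0.05:
            return True  # stop early and ship the "winner"
    return p_value(conv_a, conv_b, n) < 0.05

if __name__ == "__main__":
    random.seed(1)
    trials = 500
    peeked = sum(experiment(peek=True) for _ in range(trials)) / trials
    fixed = sum(experiment(peek=False) for _ in range(trials)) / trials
    print(f"false-positive rate, stopping when it 'looks good': {peeked:.0%}")
    print(f"false-positive rate, fixed sample size: {fixed:.0%}")
```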
Basically, I suspect a lot of small and medium companies say they do "A/B testing" and are "data-driven" when really they're just using slightly fancy feature flags and relying on some director's gut feelings.