Disaster-proof A/B Testing, Pt. 2: Ensuring a consistent experience (on 750+ arms)
Brian Campbell
function getArm(userId, testName) {
  var arm = getArmFromServer(userId, testName);
  // Wait for a network request that includes a database lookup every time
  // On a bad day this could take a while
  return arm;
}
Instead, we need a way to quickly decide what arm of a test a user should be on based on information we already have. In this case, we’ll use the user’s id, which we would have already loaded into our client.
To decide what arm of a test the user should be on, we take their user id, concatenate it with the name of the A/B test, generate the MD5 hash of the resulting string, and convert the hash into a number. We then mod that number by 100 and compare it to the percentages in each arm of the test. For example, if we are running an evenly split two-armed test, the percentages would be 50-50. If the result of the mod is less than 50, the user is assigned to the A arm, and if it is greater than or equal to 50, the user is assigned to the B arm.
The code below shows how to do this generally, using arbitrary weights instead of percentages. Because the assignment depends only on the user id and the test name, the user is kept on the same arm for the duration of the test, and no additional storage is required. The assignment may not look very random, but we've done some empirical work to show that it behaves as if it were.
function getArm(userId, testConfig) {
  // testConfig contains the name of the test and a list of arms with their weights,
  // for example {name: 'amazingTest', arms: {'T-A': 1, 'T-B': 1}}
  var keys = Object.keys(testConfig.arms);
  var totalWeight = 0;
  keys.forEach(function(arm) {
    totalWeight += testConfig.arms[arm];
  });
  var idForTesting = userId + '|' + testConfig.name;
  // There are many methods and libraries for hashing a string
  // and then turning it into a JavaScript number,
  // so I won't go into the details here
  var hashSum = getLongFromHash(idForTesting);
  var assignedValue = hashSum % totalWeight;
  // Walk the arms in order, subtracting each arm's weight
  // until the remaining value drops below zero
  var weightLeft = assignedValue;
  for (var i = 0; i < keys.length; i++) {
    var arm = keys[i];
    var weight = testConfig.arms[arm];
    weightLeft -= weight;
    if (weightLeft < 0) {
      return arm;
    }
  }
}
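The article leaves getLongFromHash unspecified, so the following is only one possible sketch of it, assuming a Node.js environment with the built-in crypto module. A full MD5 digest is 128 bits, which does not fit exactly in a JavaScript number, so this version keeps only the first 52 bits. That truncation is an assumption made here for illustration, not something the original describes.

```javascript
const crypto = require('crypto');

// Hypothetical implementation of getLongFromHash (not from the original post).
// Hashes the string with MD5 and turns part of the digest into a number.
function getLongFromHash(str) {
  const hex = crypto.createHash('md5').update(String(str), 'utf8').digest('hex');
  // Keep the first 13 hex digits (52 bits) so the result stays below
  // Number.MAX_SAFE_INTEGER and the later % operation is exact.
  return parseInt(hex.slice(0, 13), 16);
}

// The same input always produces the same number, which is what keeps
// a user on the same arm without storing the assignment anywhere.
const first = getLongFromHash('12345|amazingTest');
const second = getLongFromHash('12345|amazingTest');
console.log(first === second);
```

In a browser, the same idea would need a different MD5 source, since the Node crypto module is not available there.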
To show that our test assignment was random, we ran a test with 750 different arms. The hypothesis was that if assignments were random, there would be a roughly equal number of people in each arm and each arm would behave roughly the same (no significant difference between arms in how often users registered, created documents, paid, etc.).
We've let the test run for over six months, occasionally checking whether the behavior has changed, and over 2.4 million users have been assigned an arm. The arm with the most users and the arm with the fewest users differ by fewer than 300 people, and there is no significant difference in behavior between arms. The assignment is random enough to form the backbone of an A/B test system.
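A similar balance check can be tried on synthetic data. The sketch below is not the experiment described above: the sequential user ids, the test name 'syntheticTest', the user count, and the truncated-MD5 hash helper are all assumptions chosen for illustration. It assigns made-up users to 750 equally weighted arms and reports the smallest arm, the largest arm, and a chi-square statistic against a perfectly even split.

```javascript
const crypto = require('crypto');

// Assumed hash helper: the first 52 bits of the MD5 digest as a safe integer.
function getLongFromHash(str) {
  const hex = crypto.createHash('md5').update(str, 'utf8').digest('hex');
  return parseInt(hex.slice(0, 13), 16);
}

// With every arm weighted 1, the weighted walk in getArm reduces to
// taking the hash modulo the number of arms.
function getArmIndex(userId, testName, armCount) {
  return getLongFromHash(userId + '|' + testName) % armCount;
}

// Assign synthetic users 1..userCount and tally how many land in each arm.
function simulate(userCount, armCount, testName) {
  const counts = new Array(armCount).fill(0);
  for (let userId = 1; userId <= userCount; userId++) {
    counts[getArmIndex(userId, testName, armCount)]++;
  }
  const expected = userCount / armCount;
  // Chi-square statistic: sum of squared deviations from the even split.
  const chiSquare = counts.reduce(
    (sum, c) => sum + ((c - expected) * (c - expected)) / expected, 0);
  return {
    counts: counts,
    min: Math.min(...counts),
    max: Math.max(...counts),
    chiSquare: chiSquare
  };
}

const result = simulate(75000, 750, 'syntheticTest');
console.log('smallest arm:', result.min, 'largest arm:', result.max);
console.log('chi-square:', result.chiSquare.toFixed(1),
  'on', 750 - 1, 'degrees of freedom');
```

For a uniform assignment, the chi-square statistic should land near its degrees of freedom. A value far above that would suggest the hash is favoring some arms.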
With this framework in place, there’s virtually no limit to the number of arms our tests can have nor to the number of tests we can run at a time.