Randomizing key steps during experimentation for unbiased results

Aside from controlling variables, randomization is likely the most forgotten, most important, and easiest-to-implement concept in experimentation.
People have a hard time predicting or emulating random events. This is part of the reason that games such as poker, dice or the ancient game of throwing astragali¹ have been so popular throughout human history.

These games are appealing in part because of their reliance on unfolding random occurrences. The games are not entirely random but are randomized in key ways to make gameplay unpredictable. Similarly, experimenters may find themselves running randomized experiments that are not entirely random – for better or worse.
Simple precautions can ensure an experiment is effectively randomized.
In scenarios where you know your participants ahead of time, such as in an email messaging experiment, you can enter your participants – by name, ID or email address – into an Excel data table and assign a random number between 0 and 1 to each participant.
If you’re interested in testing a new version of an email message, you can compare the new and old message variants against each other using two groups – participants with randomly generated numbers below the median and those at or above the median. By randomly choosing which variant, new or old, each group receives, you are able to counterbalance most confounding and covarying factors across experimental groups.
This random assignment of participants to experimental (and control) groups is useful in controlling for unknown, unobservable and random variables.
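The median-split procedure described above can be sketched in a few lines of Python instead of a spreadsheet. This is a minimal illustration, not a prescribed implementation: the function name `assign_by_median_split`, the group labels and the `seed` parameter are assumptions made for the example, and participant IDs are assumed to be unique.

```python
import random
import statistics

def assign_by_median_split(participants, seed=None):
    """Randomly split participants into two groups and assign email variants."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    # Step 1: give every participant a random number between 0 and 1
    draws = {p: rng.random() for p in participants}
    # Step 2: split at the median of those random numbers
    median = statistics.median(draws.values())
    below = [p for p, x in draws.items() if x < median]
    at_or_above = [p for p, x in draws.items() if x >= median]
    # Step 3: randomly decide which group receives which variant
    variants = ["new", "old"]
    rng.shuffle(variants)
    return {variants[0]: below, variants[1]: at_or_above}

participants = [f"user{i}@example.com" for i in range(10)]
groups = assign_by_median_split(participants, seed=42)
for variant, members in groups.items():
    print(variant, len(members))
```

With an even number of participants the two groups are the same size; with an odd number, the group at or above the median has one extra member.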
For example, relating back to our bartending scenario, imagine your friends come over for a second night of drinking.
How do you serve them to best learn which one of four drinks reigns supreme?
Your first instinct may be to serve all four drinks to each of your friends. In some instances, such as throwing a party, this may be appropriate. But there are costs associated with this approach. Besides being monetarily costly, order of exposure may affect response data. Some drinks are better by the second round, and who knows how your response data will look after four rounds of drinks.

The kind of experimental design where stimuli are evaluated in succession by the same respondent is referred to as a sequential monadic design. These kinds of experiments or surveys generally require fewer participants and can reduce costs when adding participants is more costly than adding stimuli for each participant. However, as mentioned before, this design may introduce order bias and generally increases the time requirement for each participant, which may affect the quality of response data, as in our party scenario.
To address order bias resulting from participants’ exposure to previous stimuli during the course of the experiment, it’s generally a good idea to randomize sequences, including questions in a survey or stimuli in experimental blocks. The order of blocks of questions or stimuli should also be randomized if possible. Sometimes there is a logical flow, as is often the case in surveys, and complete randomization would impair comprehension of the survey. When designing an experiment or survey, consider whether certain questions, stimuli or blocks can be randomized without detracting from the user experience.
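One way to randomize both block order and question order while preserving a logical flow is to mark some blocks as fixed. The sketch below is illustrative only: `randomize_survey`, the block names and the `fixed_blocks` parameter are hypothetical, and it assumes a survey can be represented as a mapping from block name to a list of questions.

```python
import random

def randomize_survey(blocks, fixed_blocks=(), seed=None):
    """Shuffle movable blocks and their questions; leave fixed blocks untouched.

    blocks: dict mapping block name -> list of questions, in authored order.
    fixed_blocks: names of blocks that keep their position and question order.
    """
    rng = random.Random(seed)
    names = list(blocks)
    # Shuffle only the movable blocks, then slot them back into the
    # positions that were not reserved by fixed blocks.
    movable = [n for n in names if n not in fixed_blocks]
    rng.shuffle(movable)
    movable_iter = iter(movable)
    ordered = [n if n in fixed_blocks else next(movable_iter) for n in names]

    survey = []
    for name in ordered:
        questions = list(blocks[name])  # copy so the input is not mutated
        if name not in fixed_blocks:
            rng.shuffle(questions)
        survey.append((name, questions))
    return survey

blocks = {
    "intro": ["consent", "age check"],
    "taste": ["sweetness", "bitterness", "aftertaste"],
    "presentation": ["color", "garnish"],
    "wrap_up": ["overall rating", "comments"],
}
# Each participant would get their own call, and so their own order.
for name, questions in randomize_survey(blocks, fixed_blocks=("intro", "wrap_up")):
    print(name, questions)
```

Keeping the introduction and wrap-up fixed reflects the point above that complete randomization can impair comprehension; everything between them is free to vary per participant.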
By randomizing stimuli or block sequence, not only is order bias mitigated, but experimenters are also able to randomly assign participants to experience a subset of stimuli in a random order. This approach is also referred to as a sequential monadic design. For our bartending example, this could mean providing each houseguest with two randomly chosen drinks, which is likely to yield higher quality data and may also contribute to a less rowdy party atmosphere.
However, to produce a comparable amount of data related to each drink, this method of providing houseguests with two out of four drinks at random will require about twice as many guests.²
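The trade-off between drinks per guest and number of guests can be checked with a short sketch. The names `assign_drink_subsets` and `expected_ratings_per_drink`, the drink labels and the guest counts are all hypothetical, and the arithmetic ignores the control-group caveat in the footnote.

```python
import random
from collections import Counter

def assign_drink_subsets(guests, drinks, per_guest, seed=None):
    """Give each guest a random subset of drinks, served in a random order."""
    rng = random.Random(seed)
    # rng.sample draws without replacement and returns the picks in random order
    return {guest: rng.sample(drinks, per_guest) for guest in guests}

def expected_ratings_per_drink(n_guests, n_drinks, per_guest):
    """Average number of ratings each drink receives under random assignment."""
    return n_guests * per_guest / n_drinks

drinks = ["A", "B", "C", "D"]
guests = [f"guest_{i}" for i in range(20)]
plan = assign_drink_subsets(guests, drinks, per_guest=2, seed=7)

# How many times each drink was actually served in this particular draw
served = Counter(drink for order in plan.values() for drink in order)
print(served)

# 10 guests tasting all 4 drinks and 20 guests tasting 2 of 4 both
# average 10 ratings per drink.
print(expected_ratings_per_drink(10, 4, 4), expected_ratings_per_drink(20, 4, 2))
```

Purely random subsets give each drink the same number of ratings only on average; a balanced design that rotates through every pair of drinks would even out the counts, at the cost of a little more bookkeeping.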
A third approach would be to provide your houseguests one drink each. This is referred to as a monadic design. Here, all participants or a subset of participants are exposed to a single concept or experimental stimulus of a certain type. Notably, although each houseguest may provide data on only a single drink, multiple questions regarding that drink may be asked. Similarly, although participants may be exposed to only a single concept in a monadic survey design, surveyors may ask multiple questions regarding that concept. Further, you are able to make comparisons between responses across concepts if respondents are representative of the same target population.
So your friends are coming over for another night of drinks, and you can comfortably serve them however many drinks they prefer.
What’s the problem?
Well, although this setup is probably about right for hosting guests generally, for the purposes of your experiment to uncover the Drink of the Gods, you’re going to have to do better.
The remaining problem relates to participant sampling. Briefly, the goal of sampling is to select a representative group or sample from the target population under study. If the target population is your drinking friends – since this is the relevant reference network containing people whose opinions matter most to you, at least regarding your mystery cocktail – then your sample should be representative of your drinking friends.
Experimenting in this context may entail 1) messaging all your drinking friends and inviting them over or 2) representatively sampling by randomly choosing a subset of your drinking friends to invite over. If you have too many friends to make drinks for everyone, you could break out the astragali or dice and randomize participant sampling that way. More easily, you could rely on randomly generated numbers in Excel, much like the process for randomizing assignment in an email experiment.
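For readers who would rather script the draw than roll dice, simple random sampling of invitees can be sketched as follows. The function name `sample_invitees`, the friend names and the `seed` parameter are assumptions for illustration.

```python
import random

def sample_invitees(friends, n_invites, seed=None):
    """Draw a simple random sample of friends, without replacement."""
    rng = random.Random(seed)
    return rng.sample(friends, n_invites)

friends = ["Ana", "Ben", "Cy", "Dee", "Eli", "Flo", "Gus", "Hana"]
invitees = sample_invitees(friends, n_invites=4)
print(invitees)
```

Every friend has the same chance of being invited, which is what makes the sample representative in expectation. The list is drawn before anyone is messaged, so who answers fastest plays no part in who is selected.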
Still, it is important not to conduct your drinking experiment with only the friends who respond first to your text message. This is convenience sampling, where your sample is drawn from the individuals who are easiest to reach and is thus inherently not random. For example, if your household is randomly selected for a U.S. Census Bureau survey but fails to respond, the Census Bureau follows up in order to collect the maximum number of responses from its intended sample and keep that sample representative of the target population, the population of the United States.
At the risk of breaking the cardinal rule, the rules here are slightly reminiscent of those in Fight Club – everyone fights.³
Anybody included in your sample or target population of drinking friends should be able to drink if assigned to an experimental drinking condition.⁴ Conversely, any of your drinking friends must also be willing to forgo drinking in the event they are randomly assigned to a baseline (sober) control condition. Otherwise, uneven participation may introduce selection bias. For example, imagine if only your friends who prefer lemon agreed to drink lemon cocktails – you could wrongly conclude that lemon was the highest rated, when the result merely reflects the biased selection of participants recruited for and completing the experiment.
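A toy calculation shows how self-selection can inflate a rating. Every number below is invented purely for illustration: the friends, their ratings and the set of lemon fans are hypothetical, not data from any experiment.

```python
from statistics import mean

# Hypothetical ratings (1-10) each friend would give the lemon cocktail
lemon_rating = {"Ana": 9, "Ben": 8, "Cy": 4, "Dee": 3, "Eli": 5, "Flo": 9}
# Hypothetical set of friends who already prefer lemon
likes_lemon = {"Ana", "Ben", "Flo"}

# Self-selection: only lemon fans agree to drink the lemon cocktail
self_selected = [r for friend, r in lemon_rating.items() if friend in likes_lemon]
# Full participation: everyone assigned to lemon actually drinks it
everyone = list(lemon_rating.values())

biased_mean = mean(self_selected)
full_mean = mean(everyone)
print(round(biased_mean, 2), round(full_mean, 2))
```

With these made-up numbers the self-selected average is about 8.7, against about 6.3 when everyone participates, so lemon looks better than it is simply because of who agreed to drink it.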
So, equipped with knowledge around randomization, you’re prepared to conduct experiments on your email contacts and friends or at least host a very random party.
[1]: Mlodinow, L. (2009). The drunkard’s walk: how randomness rules our lives. People have been interested in randomization and decision making under uncertainty since before the theory of probability. For thousands of years the Ancient Greeks both made decisions and gambled by throwing astragalus bones or astragali, similar to modern dice, and largely believed the outcome to be a result of divine intervention. The Roman statesman Cicero attributed the outcome of astragali throws to luck rather than divine intervention and thus propelled the examination of randomness through his coining of the term probabilis, the forerunner to modern probability.
[2]: Not quite twice as many houseguest participants are required. The control group – never exposed to any drinks – is not required to increase in size to maintain equivalent statistical power.
[3]: Palahniuk, C. (2005). Fight club. The final rule of Fight Club is actually, "if this is your first night at fight club, you have to fight." In our party example, it’s everyone’s second night, but it’s still important that participants are capable of participating in the conditions to which they are randomly assigned.
[4]: Technically that’s not true. In medical research, such as epidemiological studies, statistical analyses are often performed on an intention-to-treat basis, meaning patients are analyzed according to the treatment they were assigned, regardless of their adherence to the treatment regimen. This approach works, although it decreases observable effect sizes, meaning larger participant sample sizes may be required to detect meaningful differences across conditions.