How many samples do you need?

Simulation in R

Dr. Marc Jacobs
Towards Data Science
4 min read · Sep 30, 2021

In aquaculture, experiments are conducted in tanks containing fish, and the fish are measured individually, which is a tedious and stressful endeavor. I was asked to calculate how many fish one would need to sample in order to maintain a power equivalent to that of an experiment in which every fish is measured.

To detect an overall effect, I used the F-value from an ANOVA. The p-value obtained from such a model shows whether, in an experiment containing multiple treatments, there is a ‘significant’ difference between any of the treatments. The limit for ‘significance’ is often set at 0.05, which means we accept a 5% Type 1 Error, or a False Positive Rate of 1 out of every 20 replicated trials (not diving too deep into Frequentist theory here).

To try and answer the original question, I received a dataset containing observations on individual fish per tank across treatments. From Frequentist theory we know that the power to obtain a significant p-value (if a true effect exists) depends on the mean, the standard deviation, and the sample size. The mean and the standard deviation are used to calculate the coefficient of variation within a treatment and the effect size between treatments. In this example, the effect size is the difference between treatments, scaled by the standard deviation.

For this exercise I used the R programming language, but the analysis can easily be replicated in SAS, Python, or any other language of choice.

Let's load the libraries I think I will need.
Then, let's import the data and take a look at it.
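
A minimal sketch of those two steps; the file name ARCdata.csv and the use of the tidyverse are my assumptions, as the original code is not shown:

library(tidyverse)  # dplyr, ggplot2, readr, tibble

# Hypothetical file name; the original dataset is not public
ARCdata <- read_csv("ARCdata.csv") %>%
  mutate(Tank = factor(Tank),
         Diet = factor(Diet))

glimpse(ARCdata)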
Here, you can see the spread of data by treatment and by tank. The code below shows that each tank contains 35 observations.
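
Something along these lines produces that count, assuming the tank identifier column is called Tank as in the output:

ARCdata %>%
  group_by(Tank) %>%
  summarise(no_rows = n())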
# A tibble: 12 x 2
   Tank  no_rows
   <fct>   <int>
 1 120        35
 2 121        35
 3 122        35
 4 123        35
 5 125        35
 6 126        35
 7 127        35
 8 128        35
 9 129        35
10 132        35
11 133        35
12 137        35
The model and metric used to test the current power (post hoc) were the ANOVA and the F-test, respectively. The F-test looks for differences between all treatments in order to state that there is a difference, somewhere between treatments, given the chosen False Positive Rate.

The results, significance, and F-value are shown in the output below.
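A sketch of the call that produces this output, plus one way to extract the p-value and F-value from the fitted model:

# Fit the one-way ANOVA as a linear model
fit <- lm(gain ~ Diet, data = ARCdata)
summary(fit)

# Overall F-test: extract the F-statistic and its p-value
f <- summary(fit)$fstatistic
unname(pf(f[1], f[2], f[3], lower.tail = FALSE))  # p-value
unname(f[1])                                      # F-value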

Call:
lm(formula = gain ~ Diet, data = ARCdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-204.067  -48.865   -4.894   45.291  213.933 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 290.7048     6.9934  41.569   <2e-16 ***
DietB       -12.6381     9.8901  -1.278    0.202    
DietC       -15.9836     9.9139  -1.612    0.108    
DietD         0.2087     9.9139   0.021    0.983    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 71.66 on 414 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.01037,    Adjusted R-squared:  0.003198
F-statistic: 1.446 on 3 and 414 DF,  p-value: 0.2289

[1] 0.2288678
[1] 1.445967
To perform a post-hoc power analysis, I need to look at the probability of obtaining a significant p-value (p <= 0.05). I applied bootstrapping (resampling with replacement) to estimate the population mean and standard deviation (sd). Once I have the population mean and sd, I can simulate 1,000 trials and look at the achieved ‘power’ in a population.

In the output below, you see the differences between the population mean and standard deviation and the sample mean and sd. This difference is often called ‘bias’, and it gives a good indication of how useful the sample data are for simulation.
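
A sketch of the bootstrap under the same assumptions as before; the helper boot_stats and the objects popmean and popsd are my names:

# Bootstrap mean and sd per tank (resampling with replacement)
boot_stats <- function(x, B = 1000) {
  x <- na.omit(x)
  c(mean = mean(replicate(B, mean(sample(x, replace = TRUE)))),
    sd   = mean(replicate(B, sd(sample(x, replace = TRUE)))))
}

est     <- sapply(split(ARCdata$gain, ARCdata$Tank), boot_stats)
popmean <- est["mean", ]
popsd   <- est["sd", ]

# Bias: bootstrap (population) estimate minus the raw sample statistic
popmean - tapply(ARCdata$gain, ARCdata$Tank, mean, na.rm = TRUE)
popsd   - tapply(ARCdata$gain, ARCdata$Tank, sd,   na.rm = TRUE)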

         120          121          122          123          125          126 
 0.020574286 -0.086813862  0.085062857 -0.243245714  0.185080000 -0.052434286 
         127          128          129          132          133          137 
 0.195677143 -0.005534286 -0.153240000  0.114102857  0.044703253  0.031260000 

      120       121       122       123       125       126 
-1.318715 -1.472052 -1.720122 -1.763247 -1.601380 -1.356375 
      127       128       129       132       133       137 
-1.600663 -1.624573 -1.583112 -1.250887 -1.307703 -1.889067 

Then, I started simulating 10,000 new trials, using the population mean and standard deviation, looking at the power achieved when sampling 35 fish per tank.
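
A sketch of that simulation; the power helper and the tank-to-diet mapping are my reconstructions of what the output implies:

# Power: percentage of simulated trials with p <= 0.05
power <- function(p) round(mean(p <= 0.05) * 100, 2)

tank_diet <- ARCdata %>% distinct(Tank, Diet) %>% arrange(Tank)
n <- 35

# Draw new fish per tank from Normal(popmean, popsd) and re-run the F-test
# (assumes popmean/popsd are ordered the same way as tank_diet$Tank)
pvals <- replicate(10000, {
  sim <- data.frame(
    Diet = rep(tank_diet$Diet, each = n),
    gain = rnorm(nrow(tank_diet) * n,
                 mean = rep(popmean, each = n),
                 sd   = rep(popsd,   each = n)))
  f <- summary(lm(gain ~ Diet, data = sim))$fstatistic
  pf(f[1], f[2], f[3], lower.tail = FALSE)
})

x <- data.frame(V1 = pvals)
apply(x, 2, FUN = power)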

apply(x, 2, FUN = power)
   V1 
39.21 

As you can see, the results are not that great, showing a post-hoc power of 39% based on population values. This means that the effect size found in the current experiment does not allow for sampling fewer than 35 fish per tank, not unless we can increase the effect size of the differences.

To simulate such a scenario, we need to take the population mean, standard deviation, and coefficient of variation, and start expanding the effect sizes by increasing the mean value of one single treatment. Here, I chose treatment D (and its tanks 121, 126, 129).

print(popcv <- popsd / popmean)
      120       121       122       123       125       126 
0.2036404 0.2273521 0.2912939 0.2887722 0.2576816 0.2100662 
      127       128       129       132       133       137 
0.2421316 0.2734029 0.2557708 0.2290366 0.2546334 0.2381327 
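
The simulation itself then runs over a grid of effect sizes and sample sizes. A sketch, in which the grid values, object names, and loop structure are my assumptions (for brevity I only keep the p-values):

ES_grid <- seq(1.00, 1.30, by = 0.05)  # multipliers on treatment D's mean
N_grid  <- seq(5, 35, by = 5)          # fish sampled per tank
nsim    <- 1000

tanksD  <- c("121", "126", "129")      # the tanks on treatment D
pvalues <- array(NA, dim = c(length(ES_grid), length(N_grid), nsim),
                 dimnames = list(ES = ES_grid, N = N_grid, sim = NULL))

for (i in seq_along(ES_grid)) {
  mu <- popmean
  mu[tanksD] <- mu[tanksD] * ES_grid[i]  # inflate treatment D's mean
  sdv <- popcv * mu                      # keep the CV per tank constant
  for (j in seq_along(N_grid)) {
    n <- N_grid[j]
    for (k in seq_len(nsim)) {
      sim <- data.frame(
        Diet = rep(tank_diet$Diet, each = n),
        gain = rnorm(length(mu) * n, rep(mu, each = n), rep(sdv, each = n)))
      f <- summary(lm(gain ~ Diet, data = sim))$fstatistic
      pvalues[i, j, k] <- pf(f[1], f[2], f[3], lower.tail = FALSE)
    }
  }
}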

The code above is a lot to digest, but what I basically do is ask R to fill an array with the test results while varying a grid of effect sizes and numbers of fish sampled per tank.

Next comes some data wrangling to get the variables ‘power’, ‘N’, and ‘ES’ into shape, followed by the code to produce the graph, which should be self-explanatory once you see the graph.
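
A sketch of both steps, assuming the pvalues array from the previous block:

# Average power per (ES, N) cell, then reshape for ggplot
powermat <- apply(pvalues, c(1, 2), function(p) mean(p <= 0.05) * 100)
powerdf  <- as.data.frame(as.table(powermat), responseName = "power")
powerdf$N <- as.numeric(as.character(powerdf$N))

ggplot(powerdf, aes(x = N, y = power, colour = ES, group = ES)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 80, linetype = "dashed") +  # target power
  labs(x = "Number of fish sampled per tank", y = "Power (%)",
       colour = "Effect size (ES)") +
  theme_bw()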

The graph showing the results of the simulation. Changing variables are the Effect Size (ES) and Number of Fish. The curves are not smooth because only 1000 samples were used, which took about 60 minutes on an i7.

The graph above clearly shows that power is a function of both effect size and the number of fish sampled. To reach a power of 80%, you need a certain minimum effect size, and no number of fish can compensate for its absence. Hence, when designing an experiment, you need to do whatever you can to create the effect size necessary to separate the signal from the noise.
