
Why you never really validate your analytical method unless you use the total error approach (part…


Part II: Producer and Consumer risks

By Thomas de Marchin (Senior Manager Statistics and Data Sciences at Pharmalex), Milana Filatenkova (Manager Statistics and Data Sciences at Pharmalex) and Eric Rozet (Director Statistics and Data Sciences at Pharmalex)

This is the second article in a series on the total error approach in the context of analytical method validation. If you missed the first article, I encourage you to read it to get familiar with the total error concept and with what we are going to discuss below: https://towardsdatascience.com/why-you-never-really-validate-your-analytical-method-unless-you-use-the-total-error-approach-part-cb2247874cd

The status quo: validation results are often over-trusted. A method that has passed a validation assessment is usually considered sufficient to claim reliable performance. However, it is important to remember that in statistics, nothing can be guaranteed with 100% confidence. A probability must be attached to any statement in order to assign it a degree of belief. 100% confidence may be attributed only to a statement made after studying the entire population of possible samples, an ideal situation which is not plausible in reality. In the context of analytical method validation, testing a method on the entire population would mean making an infinite number of measurements. In reality, method performance is estimated from a few measurements of one sample, one of an infinite number of possible samples (i.e. the population), and is hence uncertain.

Let us explain this concept with Figure 1. Suppose we have a method with a perfectly known trueness, measured by a relative bias of +1.5%, and a perfectly known precision, measured by a CV of 2.5%. In practice you will never know these true values, because establishing them would require an infinite number of measurements. Instead, you will try to estimate the performance of your analytical method by making only a few measurements (say 3 series of 3 replicates per concentration level, a limited design often used in the industry). As illustrated in Figure 1, every time you repeat this exercise, you get different estimates of trueness and precision. In practice, the exercise is performed only once, in the hope that the estimates are close to the true values.
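This exercise is easy to reproduce numerically. The sketch below is a minimal simulation under a simplified model (independent normal measurements, ignoring the between-/within-series structure; names like `run_validation_experiment` are illustrative): it draws 3 series of 3 replicates from a method with a true relative bias of +1.5% and a true CV of 2.5%, and shows how the estimates fluctuate from one validation experiment to the next.

```python
import numpy as np

rng = np.random.default_rng(42)

TRUE_BIAS = 1.5   # true relative bias, %
TRUE_CV = 2.5     # true coefficient of variation, %
TARGET = 100.0    # nominal concentration

def run_validation_experiment(n_series=3, n_replicates=3):
    """Simulate one validation experiment; return (bias_hat, cv_hat) in %."""
    mean = TARGET * (1 + TRUE_BIAS / 100)
    sd = mean * TRUE_CV / 100
    x = rng.normal(mean, sd, size=n_series * n_replicates)
    bias_hat = (x.mean() - TARGET) / TARGET * 100
    cv_hat = x.std(ddof=1) / x.mean() * 100
    return bias_hat, cv_hat

# Repeating the experiment gives a different answer every time
for i in range(3):
    b, cv = run_validation_experiment()
    print(f"experiment {i + 1}: estimated bias = {b:+.2f}%, estimated CV = {cv:.2f}%")
```

Each run of the loop mimics one full validation campaign; only averaging over many such campaigns would recover the true +1.5% and 2.5%.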

Now let's suppose your validation criteria are ±2% relative bias for trueness and 3% CV for precision (values commonly encountered in the pharmaceutical industry). A method whose true bias is +1.5% and true CV is 2.5%, as in the example above, should be declared valid, since the true values of these parameters fall within their respective acceptance criteria. However, since we estimate these parameters from a sample, we might accidentally obtain estimates above the acceptance criteria, as in the second example in Figure 1. In that case, you would declare your method non-valid while it is actually valid. The opposite situation may also occur: a method with a true bias and/or precision beyond the validation criteria may accidentally yield estimates within the acceptance limits. In that situation, you would declare the method valid while it is actually not. The first situation is the producer risk (the risk of rejecting a valid method); the second is the consumer risk (the risk of accepting an invalid method).

Bouabidi et al. (2010) investigated the risks associated with the classical descriptive approach (assessing precision and trueness separately) and the total error approach (assessing the combination of precision and trueness simultaneously). Figure 2 illustrates these risks with simulated validation experiments for in-silico methods with different combinations of known precision and known bias. Known biases ranged from −5% to +5% and known intermediate precision CVs (RSD) ranged from 0% to 5%. The number of replicates was fixed at J = 3, and the number of series I was set to 3 and 10. The acceptance criteria were set at ±2% for the relative bias and 3% for the CV for the descriptive approach, and at ±5% for the total error with a 5% risk. These values are typical limits used in evaluating the conformity of active ingredients in pharmaceutical formulations.

The simulation process is illustrated in Figure 1. Each iteration consists of drawing samples from a distribution with a given precision and trueness. The precision and trueness (for the descriptive approach) or the total error (for the total error approach) are then estimated from these samples and compared to the predefined criteria to assess the validity of the method. This process is repeated thousands of times, each time establishing method validity from the estimated trueness and precision (descriptive approach) or from the estimated total error (total error approach). The probability of accepting the simulated analytical method with a given precision and bias as valid is calculated as the ratio of the number of experiments in which the method passed validation to the total number of experiments performed.
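The Monte Carlo loop described above can be sketched as follows. This is a simplified stand-in for the study's procedure: measurements are independent normals, and the total error rule uses a plain z-based interval rather than the β-expectation tolerance interval of Bouabidi et al., so the probabilities are illustrative only (function names are also illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_estimates(true_bias, true_cv, n_series=3, n_replicates=3,
                       n_sim=20_000, target=100.0):
    """Draw n_sim validation experiments; return estimated biases and CVs (%)."""
    mean = target * (1 + true_bias / 100)
    x = rng.normal(mean, mean * true_cv / 100,
                   size=(n_sim, n_series * n_replicates))
    bias_hat = (x.mean(axis=1) - target) / target * 100
    cv_hat = x.std(axis=1, ddof=1) / x.mean(axis=1) * 100
    return bias_hat, cv_hat

def p_valid_descriptive(bias_hat, cv_hat, bias_limit=2.0, cv_limit=3.0):
    """Fraction of experiments passing the separate bias and CV criteria."""
    return float(np.mean((np.abs(bias_hat) <= bias_limit) & (cv_hat <= cv_limit)))

def p_valid_total_error(bias_hat, cv_hat, te_limit=5.0, z=1.96):
    """Simplified total error rule: the estimated interval
    bias_hat ± z * cv_hat must lie within ±te_limit."""
    return float(np.mean(np.abs(bias_hat) + z * cv_hat <= te_limit))

# Example: the method of Figure 1 (true bias +1.5%, true CV 2.5%)
b, cv = simulate_estimates(true_bias=1.5, true_cv=2.5)
print("P(declared valid, descriptive):", p_valid_descriptive(b, cv))
print("P(declared valid, total error):", p_valid_total_error(b, cv))
```

The printed acceptance probabilities are estimates of a single point on an iso-probability curve; sweeping `true_bias` and `true_cv` over a grid traces out the surfaces behind Figure 2.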

The curves plotted in Figure 2 are iso-probability curves: along each curve, method precision and trueness vary but the probability of passing validation remains the same. For example, in Figure 2a, the iso-probability curve labeled 75% shows all the combinations of precision and trueness that give the associated method a 75% chance of being declared valid under the descriptive approach.

To assess these iso-probability curves we need a reference for comparison: the 100% iso-probability curve (shown in red in the plot). For each combination of bias and precision along this curve, a perfect validation process is performed, which requires that at least 95% of an infinite number of sample measurements fall within the total error limits of ±5%. In this ideal set-up, instead of estimating bias and precision from a limited sample, the true values are used to generate the measurements.
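The ideal 100% curve itself needs no simulation: for a known bias and CV, the fraction of measurements falling inside ±5% is a simple normal probability. A minimal sketch, assuming normally distributed measurements (the function name is illustrative):

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def proportion_within_limits(bias, cv, limit=5.0):
    """Fraction of measurements within ±limit (%), for a method with
    known relative bias and CV (both in %)."""
    return phi((limit - bias) / cv) - phi((-limit - bias) / cv)

# The method lies on the valid side of the red curve iff the proportion >= 95%
print(proportion_within_limits(bias=1.5, cv=2.5))  # ≈ 0.915 → fails the 95% requirement
print(proportion_within_limits(bias=0.0, cv=2.0))  # ≈ 0.988 → passes
```

Note that the method of Figure 1 (bias +1.5%, CV 2.5%) passes the descriptive criteria but sits on the wrong side of the ideal total error curve: less than 95% of its measurements fall within ±5%.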

Let’s now see what we can learn from these figures.

Under the descriptive approach, any method whose combination of known bias and precision lies outside the rectangle defined by the acceptance limits should be declared non-valid. However, with the commonly used experimental design of 3 series of 3 replicates (Figure 2a), the uncertainty in the estimates of bias and precision means that this approach also accepts, with a substantial probability of 35–55%, methods whose trueness or precision is somewhat beyond the respective acceptance limits. With this design, up to one method in two may be wrongly declared valid. This demonstrates that the descriptive approach can carry a considerable consumer risk. On the other hand, any method whose validation parameters fall inside the acceptance rectangle should be declared valid. Yet Figure 2 shows a significant probability (25–45%) of rejecting methods whose known trueness and precision are within the respective acceptance limits. A substantial number of perfectly acceptable methods may thus be discarded under this approach, a scenario of significant producer risk.

For the same sample size, the total error approach gives a smaller consumer risk but a higher producer risk than the descriptive approach. Under the total error approach, the consumer risk is much better controlled: the risk of wrongly declaring a method valid is below 35% for a small sample size (Figure 2c), whereas it reaches 55% with the descriptive approach. On the other hand, the producer risk is higher with the total error approach, which rejects valid methods with a probability that can exceed 65%, compared with a maximum producer risk of 45% for the descriptive approach. When the sample size is increased (Figure 2d), both consumer and producer risks of the total error approach decrease, as the parameters of the method are better estimated.

By increasing the sample size, it is possible to reduce the uncertainty of the trueness and precision estimates, which makes the iso-probability curves converge towards the ideal 100% iso-probability curve, where producer and consumer risks are minimal. Indeed, with a small sample size, you might accidentally pick a few extreme samples and obtain estimates far from the true values; increasing the sample size reduces the risk of such extreme draws.
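This shrinking uncertainty is easy to verify numerically. The short sketch below, using the same simplified normal model as before, shows the spread of the bias estimates decreasing roughly as 1/√n as the total number of measurements n grows:

```python
import numpy as np

rng = np.random.default_rng(3)
target, true_bias, true_cv = 100.0, 1.5, 2.5  # nominal value, bias (%), CV (%)

for n in (9, 30, 90):  # total measurements per validation experiment
    mean = target * (1 + true_bias / 100)
    x = rng.normal(mean, mean * true_cv / 100, size=(50_000, n))
    bias_hat = (x.mean(axis=1) - target) / target * 100
    # theory: SD of the bias estimate ≈ true_cv / sqrt(n)
    print(f"n = {n:2d}: SD of estimated bias = {bias_hat.std():.2f}%")
```

With n = 9 (the 3×3 design), the bias estimate wanders by almost ±1% around its true value, which is why a method can easily cross a ±2% acceptance limit by chance alone.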

It is worth noting that increasing the sample size, and thereby reducing the consumer and producer risks, does little to help the descriptive approach fulfill the objective of validation, which is to guarantee that future measurements will be close to the true value. While the iso-probability curves generated under the total error approach converge towards the ideal validation curve as the number of measurements goes to infinity (the curves in Figures 2c and 2d already have a shape very close to the red hyperbolic curve), those generated under the descriptive approach converge towards the rectangle defined by the acceptance limits. At the values corresponding to those limits, the descriptive approach with an infinite sample size still has a 50% risk of wrongly accepting or rejecting an analytical method. In other words, the acceptance region of the descriptive approach is incompatible with the objective of validation, namely guaranteeing that at least 95% of the results generated by a method declared valid fall within the ±5% total error acceptance limits, no matter the sample size used for making measurements.

Before starting the validation phase, it is possible, and recommended, to determine the optimal number of experiments to perform (the number of series and the number of replicates per series) to ensure correct decisions. These optimal numbers are based on guesses about the precision and trueness of the method, which can be obtained from the results of the prevalidation, development, or optimization phases. As we saw above, performing too few experiments could lead to the rejection of an acceptable analytical method. Conversely, too many experiments, leading to excessive power, will make the validation phase longer and more costly than necessary. A balance must be found between these two extremes, i.e. an optimal number of experiments. Table 1 shows an example of the minimal recommended sample size required to pass validation in 95% of cases using the total error approach, with less than a 5% risk that future measurements fall outside the acceptance limits (set here at 10%). The recommended number of series I and of replicates J per series is given as a function of the expected between-series standard deviation σb and within-series standard deviation σw (the repeatability), both expressed in %CV. If the analytical method is expected to have a known σw of 1% and a known σb of 1%, the minimum numbers of series and of replicates per series needed to succeed in 95% of cases are 3 and 4, respectively. If it is expected to have a known σw of 3% and a known σb of 0.5%, the minimum number of series is 6 and the minimum number of replicates per series is 5.
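As an illustration of this kind of pre-validation computation, the sketch below searches for a small design by simulation. It assumes a hierarchical normal model with the stated σw and σb, an unbiased method, and the simplified z-based total error rule used earlier, so its output will not match Table 1, which relies on proper β-expectation tolerance intervals; all names are illustrative.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(7)

def pass_rate(sigma_w, sigma_b, I, J, te_limit=10.0, z=1.96, n_sim=5_000):
    """Probability that a design with I series of J replicates passes the
    simplified total error rule |bias_hat| + z * cv_hat <= te_limit, for an
    unbiased method with within-/between-series %CVs sigma_w and sigma_b."""
    target = 100.0
    series_effect = rng.normal(0, target * sigma_b / 100, size=(n_sim, I, 1))
    noise = rng.normal(0, target * sigma_w / 100, size=(n_sim, I, J))
    x = (target + series_effect + noise).reshape(n_sim, I * J)
    bias_hat = (x.mean(axis=1) - target) / target * 100
    cv_hat = x.std(axis=1, ddof=1) / x.mean(axis=1) * 100
    return float(np.mean(np.abs(bias_hat) + z * cv_hat <= te_limit))

# Smallest design (scanning I first, then J) with >= 95% chance of passing
for I, J in product(range(2, 7), range(2, 7)):
    if pass_rate(sigma_w=1.0, sigma_b=1.0, I=I, J=J) >= 0.95:
        print(f"I = {I} series, J = {J} replicates per series")
        break
```

The same loop, rerun over a grid of (σw, σb) guesses, produces a table of recommended designs in the spirit of Table 1.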

In conclusion, this article has shown that two risks are associated with validation: the risk of declaring a non-valid method valid (the consumer risk) and the risk of declaring a valid method non-valid (the producer risk). We have seen that with a sample size of 3 series and 3 replicates per series, a design commonly used in the industry, these risks can be very high. It is recommended to determine the optimal number of experiments before proceeding with the validation exercise in order to control the producer risk. Comparing the descriptive and total error approaches, the consumer risk is clearly lower for the latter, whatever the sample size. The total error approach also controls the probability of obtaining results within the acceptance limits, i.e. it quantifies the reliability of your analytical method's results, crucial information that is not available with the descriptive approach. Its producer risk is reduced by increasing the sample size. Finally, of the two approaches considered, only the total error approach can fulfill the objective of validation: knowing whether you can trust your results to make sound critical decisions throughout the product life cycle.

Bibliography

Bouabidi, A., E. Rozet, M. Fillet, E. Ziemons, E. Chapuzet, B. Mertens, R. Klinkenberg, A. Ceccato, M. Talbi, B. Streel, A. Bouklouze, B. Boulanger, and Ph. Hubert. 2010. "Critical Analysis of Several Analytical Method Validation Strategies in the Framework of the Fit for Purpose Concept." Journal of Chromatography A 1217(19):3180–92. doi: 10.1016/j.chroma.2009.08.051.

