Sunday, May 21, 2023

Comparing assumptions for sample size estimation with the interim and final results

Sample size estimation is one of the critical aspects of clinical trial design. The sample size estimation is usually based on the primary efficacy endpoint. If the primary efficacy endpoint is a continuous variable, the sample size estimation will need to be based on assumptions about the effect size (for example, the difference in means) and the common standard deviation. If the primary efficacy endpoint is a rate or proportion, the sample size estimation will need to be based on the effect size (for example, the difference in responder rates) and the rate/proportion in the control group. 
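As a rough illustration of how these assumptions drive the calculation, the sketch below uses the standard normal-approximation formulas for a two-sample comparison of means and a two-sample comparison of proportions. The alpha, power, effect sizes, and standard deviation are illustrative placeholders, not values from any particular trial.

from scipy.stats import norm
import math

def n_per_arm_means(delta, sd, alpha=0.05, power=0.80):
    # Two-sample comparison of means, equal allocation, normal approximation.
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(2 * (z * sd / delta) ** 2)

def n_per_arm_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    # Two-sample comparison of proportions, equal allocation, normal approximation.
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil(z ** 2 * var / (p_control - p_treatment) ** 2)

print(n_per_arm_means(delta=5, sd=12))        # about 91 per arm for a 5-point difference, SD 12
print(n_per_arm_proportions(0.30, 0.45))      # about 160 per arm for responder rates 30% vs 45%

If the assumed standard deviation or the control-group rate turns out to be wrong, these numbers can change substantially, which is exactly why the assumptions deserve to be revisited during and after the trial.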

Sometimes, the sample size estimations can be grossly inaccurate primarily because the assumptions used for the sample size calculation deviate from the observed data. This is especially true in planning pivotal studies with no or insufficient early-phase clinical trial data. 

It is important to check the assumptions for sample size estimation during the study and to adjust the sample size when the observed data suggest that these assumptions are inaccurate. This process is essentially the "Adaptations to the Sample Size" described in FDA's guidance "Adaptive Designs for Clinical Trials":

"Accumulating outcome data can provide a useful basis for trial adaptations. The analysis of outcome data without using treatment assignment is sometimes called pooled analysis. The most widely used category of adaptive design based on pooled outcome data involves sample size adaptations (sometimes called blinded sample size re-estimation). Sample size calculations in clinical trials depend on several factors: the desired significance level, the desired power, the assumed or targeted difference in outcome due to treatment assignment, and additional nuisance parameters—values that are not of primary interest but may affect the statistical comparisons. In trials with binary outcomes such as a response or an undesirable event, the probability of response or event in the control group is commonly considered a nuisance parameter. In trials with continuous outcomes such as symptom scores, the variance of the scores is a nuisance parameter. By using accumulating information about nuisance parameters, sample sizes can be adjusted according to prespecified algorithms to ensure the desired power is maintained. In some cases, these techniques involve statistical modeling to estimate the value of the nuisance parameter, because the parameter itself depends on knowledge of treatment assignment. These adaptations generally do not inflate the Type I error probability. However, there is the potential for limited Type I error probability inflation in trials incorporating hypothesis tests of non-inferiority or equivalence. Sponsors should evaluate the extent of inflation in these scenarios." 

 "One adaptive approach is to prospectively plan modifications to the sample size based on interim estimates of nuisance parameters from analyses that utilize treatment assignment information. For example, there are techniques that estimate the variance of a continuous outcome incorporating estimates of the variances on the individual treatment arms, or that estimate the probability of a binary outcome on the control arm based on only data from that arm. These approaches generally have no effect, or a limited effect, on the Type I error probability. However, unlike adaptations based on non-comparative pooled interim estimates of nuisance parameters, these adaptations involve treatment assignment information and, therefore, require additional steps to maintain trial integrity.
Another adaptive approach is to prospectively plan modifications to the sample size based on comparative interim results (i.e., interim estimates of the treatment effect). This is often called unblinded sample size adaptation or unblinded sample size re-estimation. Sample size determination depends on many factors, such as the event rate in the control arm or the variability of the primary outcome, the Type I error probability, the hypothesized treatment effect size, and the desired power to detect this effect size. In section IV., we described potential adaptations based on non-comparative interim results to address uncertainty at the design stage in the variability of the outcome or the event rate on the control arm. In contrast, designs with sample size adaptations based on comparative interim results might be used when there is considerable uncertainty about the true treatment effect size. Similar to a group sequential trial, a design with sample size adaptations based on comparative interim results can provide adequate power under a range of plausible effect sizes, and therefore, can help ensure that a trial maintains adequate power if the true magnitude of treatment effect is less than what was hypothesized, but still clinically meaningful. Furthermore, the addition of prespecified rules for modifying the sample size can provide efficiency advantages with respect to certain operating characteristics in some settings."
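The first approach in the quote above (re-estimating a nuisance parameter using treatment assignment information) can be sketched as follows: the within-arm variances are pooled so that the treatment effect does not inflate the estimate, and the sample size is recomputed with the same planning formula. All names and numbers here are illustrative assumptions, not taken from the guidance.

import math
import numpy as np
from scipy.stats import norm

def pooled_sd_from_arms(arm_a, arm_b):
    # Within-arm variances are pooled, so the between-arm difference does not inflate the SD.
    a, b = np.asarray(arm_a, dtype=float), np.asarray(arm_b, dtype=float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return math.sqrt(pooled_var)

def reestimated_n_per_arm(pooled_sd, planned_delta, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(2 * (z * pooled_sd / planned_delta) ** 2)

rng = np.random.default_rng(7)
treatment = rng.normal(5.0, 14.0, size=45)   # illustrative interim data by arm
control = rng.normal(0.0, 14.0, size=45)
sd_hat = pooled_sd_from_arms(treatment, control)
print(f"within-arm pooled SD = {sd_hat:.1f}, revised n per arm = {reestimated_n_per_arm(sd_hat, planned_delta=5)}")

Because this calculation uses treatment assignment, the guidance emphasizes the additional steps needed to maintain trial integrity (for example, firewalls around who sees the unblinded interim data).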

One thing that is often neglected is to compare the final results with the assumptions. When a clinical trial is concluded, it is always good to check how different the final results are from the assumptions. If the final results are positive (indicating the success of the trial), people tend to ignore the assumptions made during the trial planning stage. Only if the final results are negative (indicating the failure of the trial) do people tend to go back to the assumptions and claim that the trial failed due to inaccurate assumptions leading to a lack of statistical power. 

Biogen's Tofersen for SOD1-ALS

Biogen designed the VALOR study as the pivotal study to investigate the effect of tofersen for the treatment of patients with amyotrophic lateral sclerosis (ALS) associated with mutations in the superoxide dismutase 1 (SOD1) gene (SOD1-ALS) - a subset of the general ALS population. The primary efficacy endpoint was the change from baseline in the ALSFRS-R score, and the sample size for the study was based on assumptions about this change. 

"We calculated that a sample size of 60 participants (2:1 randomization ratio) in the faster-progression primary analysis subgroup would provide 84% power to detect a between-group difference on the basis of the joint rank test (described below), assuming a change in the ALSFRS-R score from baseline to week 28 of −4.8 in the tofersen group and −24.7 in the placebo group, with a standard deviation of 20.39 and survival of 90% in the tofersen group and 82% in the placebo group, at a two-sided alpha level of 0.05."

The final results indicated that the assumptions were grossly inaccurate. In the placebo group, the observed change from baseline to week 28 was -8.14 (versus the assumed -24.7).

Usually, it is the sponsor's responsibility to ensure that the assumptions for sample size calculation are as accurate as possible. If inaccurate assumptions lead to the failure of the trial, the regulatory agency may request that the sponsor conduct additional trials (with more accurate assumptions). However, in Biogen's tofersen VALOR trial, FDA came to Biogen's defense in explaining why the trial failed on the primary efficacy endpoint (the ALSFRS-R score), so that they could potentially approve the drug based on the positive biomarker results and downplay the fact that the study failed on the clinical endpoint. In FDA's briefing book for the advisory committee meeting to discuss tofersen in SOD1-ALS, the following was mentioned:

Comparing the assumptions for sample size estimation with the analysis results can be complicated by the fact that different statistical methods are used. Sample size estimation may be based on a two-sample t-test, while the actual data will be analyzed using more complicated methods (analysis of covariance, mixed model for repeated measures, random coefficient model, non-parametric methods, etc.). For studies with a time-to-event primary efficacy endpoint, the sample size calculation may be based on the log-rank test, while the statistical analyses may be based on Cox regression adjusted for multiple explanatory variables. 
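For example, a time-to-event design might be sized with a Schoenfeld-type log-rank calculation like the sketch below, even though the analysis will later use a covariate-adjusted Cox model. The hazard ratio and allocation ratio here are illustrative assumptions, not from any particular study.

import math
from scipy.stats import norm

def required_events_logrank(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    # Schoenfeld approximation: number of events needed for the log-rank test.
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(z ** 2 / (allocation * (1 - allocation) * math.log(hazard_ratio) ** 2))

print(required_events_logrank(hazard_ratio=0.70))   # about 247 events for HR 0.70, 1:1 allocation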

However, it is always good to compare the assumptions for the sample size estimation with the observed data (during the study or at the conclusion of the study). 
