Sunday, December 10, 2023

Significance level versus p-value

The significance level and the p-value sometimes get mixed up and can be confusing to non-statisticians. It is not surprising for a statistician to receive a question or a request to design a study to obtain a p-value of 0.05 or 0.01. While the significance level and the p-value are closely related, they are used at different stages of a trial - the significance level is used at the study design stage, and the p-value is used at the analysis stage.


A significance level is usually set at 0.05 at the study design stage. After the study, the data are analyzed and a p-value is calculated. The p-value is then compared to the pre-specified significance level to determine whether the study results are statistically significant.

If the significance level is instead set at 0.01 at the study design stage - which can be tempting as a way to avoid conducting two pivotal studies - it sets an unnecessarily high bar for declaring the trial successful at the analysis stage.

"The significance level," "alpha" (α), and "Type I error rate" are essentially referring to the same concept in the context of hypothesis testing. These terms are often used interchangeably and are closely related. Here's a brief explanation of each:

Significance Level (Alpha, α): The significance level is a pre-defined threshold (usually denoted as α) set by the researcher before conducting a statistical test. It represents the maximum acceptable probability of making a Type I error. Common choices for alpha include 0.05 (5%), 0.01 (1%), and others. It determines the level of stringency for the test, where a smaller alpha indicates a more stringent test.

The significance level is just one of the parameters used to calculate the sample size during the study design stage. Other parameters include the effect size (the assumed treatment difference), the standard deviation, the statistical power (one minus the Type II error rate), and any alpha adjustment due to multiplicity issues, interim analyses, etc.
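To illustrate how the significance level enters the sample size calculation, here is a minimal sketch using the standard normal-approximation formula for comparing two means. The effect size, standard deviation, alpha, and power below are all hypothetical assumptions, not values from any particular trial.

```python
# Sketch: sample size per arm for a two-arm trial comparing means,
# using the normal-approximation formula
#   n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2
# All design inputs here are hypothetical.
import math
from statistics import NormalDist

def sample_size_per_arm(delta, sd, alpha=0.05, power=0.90):
    """n per arm for a two-sided test of two independent means."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the two-sided alpha
    z_beta = z(power)            # quantile corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)

# Assumed design: treatment difference of 5 units, common SD of 12
print(sample_size_per_arm(delta=5, sd=12))          # alpha 0.05, power 0.90
print(sample_size_per_arm(delta=5, sd=12, power=0.80))
```

Note how tightening alpha from 0.05 to 0.01, or raising the power, increases the required sample size - which is exactly why the significance level must be fixed at the design stage.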

Alpha (α): Alpha is the symbol used to represent the significance level in statistical notation. When you see α, it's referring to the predetermined threshold for statistical significance.

Type I Error Rate: The Type I error rate is the probability of making a Type I error, which occurs when you reject the null hypothesis when it is actually true. The significance level (alpha) directly relates to the Type I error rate because the significance level sets the limit for how often you are willing to accept such an error. The Type I error rate is typically equivalent to the significance level (alpha), assuming the test is properly conducted.
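The equivalence between the significance level and the Type I error rate can be seen in a small simulation: if we repeatedly run trials in which the null hypothesis is true and reject whenever p < 0.05, we should reject in roughly 5% of them. This is only an illustrative sketch with made-up simulation settings.

```python
# Sketch: simulating the Type I error rate. Under the null hypothesis
# (true mean 0, no effect), rejecting whenever p < alpha should occur
# in roughly alpha (5%) of trials. All settings are illustrative.
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(42)
alpha = 0.05
n, n_trials = 50, 2000
rejections = 0
for _ in range(n_trials):
    # One null trial: n observations drawn with true mean 0
    data = [random.gauss(0, 1) for _ in range(n)]
    z = mean(data) / (stdev(data) / sqrt(n))      # one-sample z statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p-value
    if p < alpha:
        rejections += 1

print(rejections / n_trials)   # should be close to 0.05
```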

P-value: The p-value is calculated as part of the statistical analysis after the data has been collected. It measures the strength of the evidence against the null hypothesis based on the collected data. A smaller p-value indicates stronger evidence against the null hypothesis, and a larger p-value suggests weaker evidence.

The p-value measures the strength of evidence against the null hypothesis. The p-value is the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed. The 'p' stands for probability, and a p-value always lies between 0 and 1. Values close to 0 indicate that the observed difference is unlikely to be due to chance, whereas a p-value close to 1 suggests that the observed difference is highly likely to be due to chance. If the p-value is low, it provides evidence against the null hypothesis, and the null hypothesis is rejected in favor of the alternative hypothesis (the assumption of an effect or difference).

The p-value indicates how incompatible the data are with a specified statistical model constructed under a set of assumptions, together with a null hypothesis. The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis. When we get a p-value that is greater than the pre-specified significance level, we fail to reject the null hypothesis - it means there is insufficient evidence to reject it.
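The analysis-stage step described above - calculating a p-value from the observed data and comparing it to the pre-specified significance level - can be sketched as follows. The group summaries here are hypothetical numbers chosen for illustration, not data from any real trial.

```python
# Sketch: analysis-stage comparison of an observed p-value against the
# pre-specified significance level. The trial summaries are hypothetical.
from math import sqrt
from statistics import NormalDist

alpha = 0.05                       # pre-specified at the design stage

# Hypothetical summary statistics from a two-arm trial
mean_trt, mean_ctl = 7.1, 4.8      # group means
sd, n = 12.0, 122                  # common SD and subjects per arm

# Two-sample z statistic and two-sided p-value
z = (mean_trt - mean_ctl) / (sd * sqrt(2 / n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"p = {p_value:.4f}")
if p_value < alpha:
    print("statistically significant - reject the null hypothesis")
else:
    print("fail to reject the null hypothesis - insufficient evidence")
```

In this made-up example the p-value exceeds 0.05, so the trial would fail to reject the null hypothesis even though the observed treatment difference is positive.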

In a STAT video, national biotech reporter Damian Garde explains what a p-value is.

Even though hypothesis testing and the p-value have been criticized (see a previous post "Retire Statistical Significance and p-value?"), the p-value is still the primary indicator used by the sponsor, the regulator, the medical community, and pretty much everybody else to judge whether a clinical trial is successful.
Regulatory approval of a medicinal product depends on more than just a p-value. Approval depends on the totality of the evidence: the magnitude of the treatment difference, the clinical significance or clinical meaningfulness, the confidence interval of the estimate, the safety profile, and whether the benefit outweighs the risk.

We have seen cases where a drug was approved even though the p-value was not statistically significant (i.e., did not reach the pre-specified significance level). See the previous post "Drugs Approved by FDA Despite Failed Trials or Minimal/Insufficient Data". We have also seen cases where a drug was not approved even though the p-value was statistically significant. See the article "FDA blocks Alnylam's bid to expand Onpattro label", even though the study results were statistically significant and published in the NEJM as "Patisiran Treatment in Patients with Transthyretin Cardiac Amyloidosis".

In the end, we can't retire the p-value. We rely on the p-value to measure how strong the evidence is. However, we should not be slaves to the p-value.
