## Friday, August 29, 2014

### Subgroup Analysis in Clinical Trials - Revisited

I previously wrote an article about subgroup analysis in clinical trials, and I would like to revisit the topic. Subgroup analysis has been a regular discussion topic at statistical conferences recently. The pitfalls of subgroup analyses are well understood in the statistical community. However, subgroup analyses in the regulatory setting for product approval, in multi-regional clinical trials, and in confirmatory trials are quite complicated.

EMA is again ahead of FDA in issuing regulatory guidance on this topic. Following an expert workshop on subgroup analysis, EMA issued a draft guideline titled “Guideline on the investigation of subgroups in confirmatory clinical trials”. In addition to general considerations, the draft covers the issues to be addressed during the study planning stage and the issues to be addressed during the assessment stage.

In practice, subgroup analysis is almost always conducted. For a study with negative results, the purpose of the subgroup analysis is usually to see whether there is a subgroup in which statistically significant results can be found. For a study with positive results, the purpose is usually to see whether the result is robust across different subgroups. Subgroup analysis is not only performed in industry-sponsored trials; it may be performed even more often in academic clinical studies for publication purposes.

Sometimes it is not so easy to explain the caveats of subgroup analysis (especially unplanned subgroup analysis) to non-statisticians. Explaining the issues requires a good understanding of multiplicity adjustment and statistical power. I recently saw some presentation slides in which the pitfalls of subgroup analysis were well explained, as summarized in the table below. Either way (an inflated type I error when H0 is true, or reduced power when H1 is true), the subgroup analysis results can be unreliable.

Dr George (2004), “Subgroup analyses in clinical trials”

| When H0 is true | When H1 is true |
|---|---|
| Increased probability of type I error | Decreased power (increased type II error) in individual subgroup |
| Too many “differences” | Not enough “differences” |
| Because the probability of each “statistically significant difference” not being real is 5% | The more data we have, the higher the probability of detecting a real effect (“power”) |
| So lots of 5% all add together | But sub-group analyses “cut the data” |
| Some of the apparent effects (somewhere) will not be real | Trials are expensive and we usually fix the size of the trial to give high “power” to detect important differences overall (primary efficacy endpoint) |
| We have no way of knowing which ones are and which ones aren’t | When we start splitting the data (only look at men, or only look at women, or only look at renally impaired; or only look at the elderly; etc., etc.), the sample size is smaller … the power is much reduced |
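To put rough numbers on both pitfalls, here is a minimal Python sketch. It is only an illustration under simplifying assumptions that are mine, not the slides': the subgroup tests are treated as independent, each is tested at a two-sided 5% level, and power is approximated with a two-sample z-test on a standardized effect size. The function names, the total of 200 subjects per arm, and the effect size of 0.3 are hypothetical choices for the example. Note that under independence the "5%s" do not literally add; the chance of at least one false positive is 1 - (1 - 0.05)^k, which is close to k × 5% only when k is small.

```python
from statistics import NormalDist


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests independent
    tests when H0 is true in every subgroup (assumption: independence)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def two_arm_power(n_per_arm: float, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (difference / SD);
    equal arm sizes and a known common SD are assumed.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * (n_per_arm / 2.0) ** 0.5
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # When H0 is true: more subgroups looked at, more spurious "differences".
    for k in (1, 5, 10, 20):
        print(f"{k:>2} subgroup tests -> P(at least one false positive) = "
              f"{familywise_error_rate(k):.2f}")

    # When H1 is true: cutting the data shrinks n and therefore power.
    total_per_arm = 200   # hypothetical trial size per arm
    effect = 0.3          # hypothetical standardized effect
    for fraction in (1.0, 0.5, 0.25, 0.1):
        n = total_per_arm * fraction
        print(f"subgroup with {fraction:.0%} of subjects (n={n:.0f}/arm) -> "
              f"power = {two_arm_power(n, effect):.2f}")
```

With these assumed inputs, the overall trial has roughly 85% power, while a subgroup containing a quarter of the subjects has only about one chance in three of detecting the same effect; at the same time, ten independent subgroup looks under H0 give about a 40% chance of at least one spurious "significant" finding.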

In clinical trials for licensure, regulatory agencies such as FDA may require subgroup analyses (planned or unplanned) to see whether the results are consistent across different subgroups or whether the risk-benefit profile differs across subgroups. The reviewers may also perform their own subgroup analyses. However, they are aware of the pitfalls of these analyses. Zontivity, recently approved by FDA, is a great example of this exact issue. See the Pink Sheet article "FDA Changed Course On Zontivity Because Of Skepticism Of Subgroups At High Levels". Initially, FDA reviewers performed subgroup analyses and found that subjects weighing less than 60 kg had a different risk-benefit profile compared with subjects weighing more than 60 kg. An advisory committee meeting was organized to discuss whether the approved indication should be limited to a specific subgroup. However, FDA eventually changed course and did not impose a label restriction for a specific subgroup. They commented that “The point is that one has to be careful not to over-interpret these subgroup findings.”