On Biostatistics and Clinical Trials: Analysis Problems with Subgroup Analyses

Sub-grouping damages the balance obtained by randomization

If randomization is stratified by one factor (for example, disease severity), it ensures that the treatment arms are balanced within the subgroups defined by that factor, but not necessarily balanced for other prognostic factors within those subgroups (unless the subgroups are very large).
When minimization is used, each factor is balanced only marginally across the whole trial, so balance on the other stratification factors (e.g., age category) within a given subgroup is not guaranteed.
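The sketch below simulates stratified permuted-block randomization within a single severity stratum. The block size of 4, the 40% prevalence of an unstratified binary factor (labelled "old" here), and all function names are hypothetical choices for illustration only; minimization is not simulated. By construction the arm sizes within the stratum never differ by more than 2, while the between-arm imbalance in the unstratified factor is large in small strata and shrinks as the stratum grows.

```python
import random

def permuted_block_assign(n, rng, block_size=4):
    """Assign n patients to arms 'A'/'B' using randomly permuted blocks."""
    arms = []
    while len(arms) < n:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)            # random order within each block
        arms.extend(block)
    return arms[:n]                   # the last block may be incomplete

def one_stratum(n, rng, p_old=0.4):
    """Randomize one severity stratum (hypothetical model).

    Returns (gap in arm sizes, gap between arms in the proportion of 'old'
    patients); 'old' is a prognostic factor NOT used in the randomization.
    """
    arms = permuted_block_assign(n, rng)
    old = [rng.random() < p_old for _ in range(n)]
    n_a = arms.count("A")
    n_b = n - n_a
    old_a = sum(o for arm, o in zip(arms, old) if arm == "A")
    old_b = sum(o for arm, o in zip(arms, old) if arm == "B")
    return abs(n_a - n_b), abs(old_a / n_a - old_b / n_b)

def mean_age_gap(n, reps=300, seed=1):
    """Average between-arm imbalance in the unstratified factor."""
    rng = random.Random(seed)
    return sum(one_stratum(n, rng)[1] for _ in range(reps)) / reps

if __name__ == "__main__":
    rng = random.Random(0)
    print("largest arm-size gap (n=30):",
          max(one_stratum(30, rng)[0] for _ in range(200)))
    for n in (20, 200, 2000):
        print(f"stratum size {n:5d}: mean imbalance in 'old' {mean_age_gap(n):.3f}")
```

Under these assumptions the average imbalance in the unstratified factor falls roughly in proportion to 1/sqrt(n), which is why balance on non-stratified factors is only reliable in large subgroups.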

Treatment comparisons within subgroups lack power

The planned sample size N is chosen to be large enough to detect a specified difference in the WHOLE group
Sub-grouping -> smaller sample size for each comparison -> lower power
The statistical power to detect a treatment-by-subgroup interaction (i.e., a difference in treatment effect between subgroups) is usually very low, even lower than for a treatment comparison within a single subgroup
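A rough power calculation makes the three bullets above concrete. This is a sketch under stated assumptions: a continuous outcome with known standard deviation 1, a two-sided z-test at alpha = 0.05, two equal-sized subgroups, and a hypothetical standardized effect of 0.3 with 176 patients per arm (about 80% power for the whole group). The helper names are illustrative, not from any library.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power(effect, se, alpha=0.05):
    """Approximate power of a two-sided z-test (the far rejection tail is ignored)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(effect) / se - z_crit)

def se_diff(n_per_arm, sd=1.0):
    """Standard error of a difference in means with n_per_arm patients per arm."""
    return sd * (2 / n_per_arm) ** 0.5

n = 176        # hypothetical per-arm sample size for the WHOLE group
delta = 0.3    # hypothetical standardized treatment effect (in SD units)

se_overall = se_diff(n)
se_subgroup = se_diff(n // 2)                    # each of two equal subgroups
se_interaction = (2 * se_subgroup ** 2) ** 0.5   # difference of two subgroup effects

p_overall = power(delta, se_overall)
p_subgroup = power(delta, se_subgroup)
p_interaction = power(delta, se_interaction)     # interaction as large as delta itself

print(f"power  overall: {p_overall:.2f}  one subgroup: {p_subgroup:.2f}  "
      f"interaction: {p_interaction:.2f}")
print("SE ratio, interaction / overall:", round(se_interaction / se_overall, 3))
```

With two equal subgroups the standard error of the interaction estimate is twice that of the overall effect, so detecting an interaction as large as the overall effect with the same power needs roughly four times the sample size.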

It is always possible to find subgroups in which the treatment effect is more extreme than the overall effect (data dredging)

It is always possible to find a grouping of the sample such that the treatment effect is more pronounced in one subgroup and less pronounced in the other
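Indeed, the overall treatment effect is a (weighted) average of the subgroup treatment effects, so unless all subgroup effects are equal, some lie above the overall effect and others below it.
With enough subgroups examined, a statistically significant difference in at least one of them is likely to appear by chance alone!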
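The following simulation sketches both points for a trial in which the treatment has NO effect. Assumptions (all hypothetical): outcomes are N(0, 1) in both arms, patients fall into K disjoint subgroups of equal size (for example, K categories of a baseline factor), each subgroup has equal numbers per arm, and each subgroup is tested with a z-test using the known SD. The overall difference is then exactly the average of the subgroup differences, and the share of null trials with at least one "significant" subgroup should be close to 1 - 0.95^K.

```python
import random
from statistics import NormalDist, fmean

def null_trial(k_subgroups, m_per_arm, rng):
    """Simulate a trial with no treatment effect, split into k disjoint subgroups.

    Returns the overall difference in means and a list of
    (subgroup difference, two-sided p-value) pairs.
    """
    z = NormalDist()
    results, all_t, all_c = [], [], []
    for _ in range(k_subgroups):
        t = [rng.gauss(0, 1) for _ in range(m_per_arm)]
        c = [rng.gauss(0, 1) for _ in range(m_per_arm)]
        diff = fmean(t) - fmean(c)
        se = (2 / m_per_arm) ** 0.5          # known SD = 1, so the z-test is exact
        p = 2 * (1 - z.cdf(abs(diff) / se))
        results.append((diff, p))
        all_t += t
        all_c += c
    overall = fmean(all_t) - fmean(all_c)
    return overall, results

def chance_finding_rate(k=10, m=20, n_trials=2000, seed=7):
    """Fraction of null trials with at least one subgroup at p < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        _, res = null_trial(k, m, rng)
        hits += any(p < 0.05 for _, p in res)
    return hits / n_trials

if __name__ == "__main__":
    overall, res = null_trial(10, 20, random.Random(0))
    print(f"overall difference: {overall:+.3f}")
    for i, (d, p) in enumerate(res, 1):
        flag = "  <- 'significant' by chance" if p < 0.05 else ""
        print(f"subgroup {i:2d}: diff {d:+.3f}, p = {p:.3f}{flag}")
    print("null trials with >= 1 significant subgroup:", chance_finding_rate())
```

With K = 10 subgroups the expected rate is about 0.40, even though no subgroup has any true treatment effect.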

Subgroup analyses induce multiple testing problems

Suppose you perform K independent tests, each at the alpha = 0.05 significance level. The overall type I error rate (the risk of finding at least one spurious statistically significant result among the K tests) is alpha(overall) = 1 - (1 - alpha)^K
The Bonferroni adjustment keeps the overall alpha at or below 0.05: use alpha/K as the significance level for each test (this is conservative, especially when the tests are correlated)
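The two formulas above can be evaluated directly. This minimal sketch assumes independent tests, as the formula does; the function names are illustrative only.

```python
def familywise_error(alpha, k):
    """P(at least one false positive) among k INDEPENDENT tests, each at level alpha."""
    return 1 - (1 - alpha) ** k

def bonferroni_level(alpha, k):
    """Per-test significance level under the Bonferroni adjustment."""
    return alpha / k

for k in (1, 5, 10, 20):
    per_test = bonferroni_level(0.05, k)
    print(f"K={k:2d}  unadjusted: {familywise_error(0.05, k):.3f}   "
          f"Bonferroni ({per_test:.4f} per test): {familywise_error(per_test, k):.3f}")
```

For K = 10 unadjusted tests the overall type I error rate is about 0.40, while testing each at 0.005 brings it back just under 0.05.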

Improper subgroups

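Improper subgroups: subgroups of patients classified by an event measured after randomization and potentially affected by treatment, e.g., comparing means or survival by response to therapy, by compliance, by severity of side effects, or by any other such factor (which, being measured after randomization, could not have been stratified for)
Inherent prognostic features influence both the endpoint and the classifying event, so such subgroups can differ in prognosis regardless of treatment
Lead time bias (also called guarantee-time bias): patients who have the endpoint event (e.g., death) early necessarily fall into the "poor" classification (e.g., non-responders), because they did not live long enough to be classified otherwise
Hence no causal relationship can be demonstrated between the classifying factor and the outcome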
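The bias in the "response vs. survival" comparison can be reproduced with a small simulation in which response has no effect at all on survival. The model is a hypothetical assumption chosen for convenience: time to death T and the time R at which a response would be observed are independent Exp(1) variables, and a patient counts as a responder only if R < T. Under this model the theoretical mean survival is 1.5 for responders and 0.5 for non-responders.

```python
import random
from statistics import fmean

def response_survival_sim(n=20000, seed=3):
    """Survival is independent of 'response', yet responders appear to live longer.

    Hypothetical model: T ~ Exp(1) (time to death) and R ~ Exp(1) (time at
    which a response would be observed), independent. A patient is classified
    as a responder only if the response is observed before death (R < T).
    """
    rng = random.Random(seed)
    responders, non_responders = [], []
    for _ in range(n):
        t_death = rng.expovariate(1.0)
        t_response = rng.expovariate(1.0)
        if t_response < t_death:
            responders.append(t_death)
        else:
            non_responders.append(t_death)   # early deaths always land here
    return fmean(responders), fmean(non_responders)

mean_r, mean_nr = response_survival_sim()
print(f"mean survival  responders: {mean_r:.2f}   non-responders: {mean_nr:.2f}")
```

Responders appear to survive about three times longer even though, by construction, response has no causal effect on survival; the difference comes entirely from how the subgroups were formed.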
