## Wednesday, November 12, 2008

### Analysis Problems with Subgroup Analyses

Sub-grouping damages the balance obtained by randomization

• If the randomization is stratified for one factor (for example, disease severity), it will ensure the balance of the treatments inside the subgroups defined by that factor but not necessarily the balance of other prognostic factors (unless the subgroups are very large)
• When minimization is used, the balance for other stratification factors (e.g., age category) inside the subgroups is not guaranteed.
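The first point can be illustrated with a small simulation. This is only a sketch under assumed settings: the cohort of 40 patients, the two factors (`severity`, `age`), the block size of 4, and the function names are all hypothetical and chosen for illustration. Randomization is stratified by severity with permuted blocks, so the arms are balanced inside each severity subgroup, while the split of older patients between arms inside a subgroup is left to chance.

```python
import random

def stratified_block_randomize(patients, rng, block=("A", "A", "B", "B")):
    """Assign arms using permuted blocks inside each severity stratum."""
    queues = {}  # one queue of pending assignments per stratum
    for p in patients:
        queue = queues.setdefault(p["severity"], [])
        if not queue:
            # Start a new block for this stratum, in random order.
            queue.extend(rng.sample(block, len(block)))
        p["arm"] = queue.pop()

def arm_counts(patients, severity):
    """Return (n in arm A, n in arm B) inside one severity subgroup."""
    sub = [p for p in patients if p["severity"] == severity]
    n_a = sum(p["arm"] == "A" for p in sub)
    return n_a, len(sub) - n_a

def old_counts(patients, severity):
    """Return the number of old patients in arm A and arm B inside one subgroup."""
    sub = [p for p in patients if p["severity"] == severity and p["age"] == "old"]
    n_a = sum(p["arm"] == "A" for p in sub)
    return n_a, len(sub) - n_a

rng = random.Random(12)  # fixed seed so the run is reproducible
patients = [
    {"severity": rng.choice(["mild", "severe"]), "age": rng.choice(["young", "old"])}
    for _ in range(40)
]
stratified_block_randomize(patients, rng)

for severity in ("mild", "severe"):
    print(severity, "arms (A, B):", arm_counts(patients, severity),
          "old patients (A, B):", old_counts(patients, severity))
```

With blocks of 4, the arm counts inside each severity subgroup can differ by at most 2 (from an incomplete last block). Nothing constrains the age counts in the same way.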

Treatment comparisons within subgroups lack power

• The planned sample size N is large enough to detect a specified difference in the WHOLE group
• Sub-grouping -> smaller sample size for each comparison -> lower power
• The statistical power to detect a treatment-by-subgroup interaction (i.e., different treatment effects between subgroups) is usually very low
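A rough calculation makes the loss of power concrete. The sketch below uses a normal approximation for a two-sided comparison of two means with known standard deviation. The trial size (64 patients per arm), the effect size (0.5 SD), the assumption of two equal-sized subgroups, and the function names are illustrative assumptions, not values from any particular trial.

```python
from statistics import NormalDist

def _power(effect, standard_error, alpha=0.05):
    """Approximate power of a two-sided z-test (wrong-direction rejections ignored)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(effect) / standard_error - z_crit)

def power_two_sample(n_per_arm, delta, sigma, alpha=0.05):
    """Power to detect a mean difference delta with n_per_arm patients per arm."""
    se = sigma * (2 / n_per_arm) ** 0.5
    return _power(delta, se, alpha)

def power_interaction(n_per_arm_per_subgroup, interaction, sigma, alpha=0.05):
    """Power to detect a difference in treatment effect between two equal subgroups."""
    se_subgroup = sigma * (2 / n_per_arm_per_subgroup) ** 0.5
    # The interaction is a difference of two independent subgroup effects.
    se_interaction = (2 ** 0.5) * se_subgroup
    return _power(interaction, se_interaction, alpha)

whole = power_two_sample(64, delta=0.5, sigma=1.0)
half = power_two_sample(32, delta=0.5, sigma=1.0)
inter = power_interaction(32, interaction=0.5, sigma=1.0)
print(f"whole trial: {whole:.2f}")
print(f"one half-size subgroup: {half:.2f}")
print(f"interaction of the same size: {inter:.2f}")
```

Under these assumptions the power is about 0.8 for the whole trial, about 0.5 inside one half-size subgroup, and about 0.3 for an interaction as large as the overall effect.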

It is always possible to find subgroups in which the treatment effect is more extreme than the overall effect (data dredging)

• It is always possible to find a grouping of the sample such that the treatment effect is more pronounced in one subgroup and less pronounced in the other
• Indeed, the overall treatment effect is roughly a weighted average of the subgroup treatment effects, so some subgroups will lie above it and others below
• It is always possible to find a subgroup with a significant difference just by chance!
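The "just by chance" point can be checked by simulation. The sketch below is illustrative only: it assumes a trial with no treatment effect at all, 10 disjoint subgroups of 20 patients per arm, outcomes with a known standard deviation of 1, and a two-sided z-test at the 0.05 level in each subgroup. All names and settings are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def any_subgroup_significant(rng, n_subgroups=10, n_per_arm=20, alpha=0.05):
    """Simulate one null trial; report whether any subgroup test is 'significant'."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_arm) ** 0.5  # standard error with known SD of 1 in both arms
    for _ in range(n_subgroups):
        # Both arms are drawn from the same distribution: no true effect.
        treated = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        if abs(fmean(treated) - fmean(control)) / se > z_crit:
            return True
    return False

rng = random.Random(2008)  # fixed seed so the run is reproducible
n_trials = 1000
hits = sum(any_subgroup_significant(rng) for _ in range(n_trials))
fraction = hits / n_trials
print(f"null trials with at least one 'significant' subgroup: {fraction:.0%}")
```

Although the treatment does nothing in this setup, a sizeable share of simulated trials (on the order of 40% with 10 independent subgroups) still shows at least one subgroup with a nominally significant difference.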

Subgroup analysis induces multiple testing problems

• Suppose you perform K independent tests, each at the alpha = 0.05 significance level. The overall type I error rate (the risk of finding at least one spurious statistically significant result among the K tests) is alpha(overall) = 1 - (1 - alpha)^K
• The Bonferroni adjustment can be used to keep the overall alpha close to 0.05: use alpha/K for each test
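The formula and the adjustment are easy to tabulate. The short sketch below assumes independent tests, as the formula does; the function names are hypothetical.

```python
def familywise_error(k, alpha=0.05):
    """Chance of at least one false positive among k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(k, alpha=0.05):
    """Per-test significance level under the Bonferroni adjustment."""
    return alpha / k

for k in (1, 5, 10, 20):
    unadjusted = familywise_error(k)
    adjusted = familywise_error(k, alpha=bonferroni_alpha(k))
    print(f"K={k:2d}  unadjusted={unadjusted:.3f}  "
          f"per-test alpha={bonferroni_alpha(k):.4f}  adjusted={adjusted:.3f}")
```

With K = 10 the unadjusted overall error rate is already about 0.40, while testing each subgroup at 0.005 keeps it just under 0.05.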

Improper subgroups

• Improper subgroups: subgroups of patients classified by an event measured after randomization and potentially affected by treatment, for example comparisons of response, means, or survival by response to therapy, by compliance, by severity of side effects, or by any factor not stratified for
• Inherent prognostic features influence both the endpoint and the event
• Lead time bias: those who have the event early necessarily fall in the "poor" classification
• No causal relationship can be demonstrated
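The lead time bias can be reproduced with a toy simulation. In the sketch below, response has no effect on survival by construction: survival time and time to response are drawn independently, and a patient is classified as a responder only if still alive when the response would occur. The exponential distributions, the means of 12 and 6 months, and all names are illustrative assumptions.

```python
import random
from statistics import fmean

def simulate_patient(rng, mean_survival=12.0, mean_time_to_response=6.0):
    """Return (survival in months, responder flag) for one hypothetical patient."""
    survival = rng.expovariate(1 / mean_survival)
    response_time = rng.expovariate(1 / mean_time_to_response)
    # Patients who die before their response time can never be classified
    # as responders, so early deaths all land in the "poor" group.
    responder = survival > response_time
    return survival, responder

rng = random.Random(32)  # fixed seed so the run is reproducible
cohort = [simulate_patient(rng) for _ in range(5000)]
responders = [s for s, is_responder in cohort if is_responder]
non_responders = [s for s, is_responder in cohort if not is_responder]
mean_resp = fmean(responders)
mean_non = fmean(non_responders)
print(f"mean survival, responders: {mean_resp:.1f} months")
print(f"mean survival, non-responders: {mean_non:.1f} months")
```

Responders appear to live much longer even though response changes nothing here, which is why a survival comparison by response cannot demonstrate a causal benefit.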