Sometimes we need to calculate p-values from summary (aggregate) data when the individual subject-level data are not available. A previous post discussed the two-group t-test and the chi-square test based on summary data; both apply to parallel-group comparisons.
In single-arm clinical trials there is no concurrent control group, and the statistical test is usually based on a pre-post comparison. For continuous measures, the pre-post comparison can be tested with a paired t-test on the change-from-baseline values (i.e., post-baseline measure minus baseline measure). For discrete outcomes, it may be tested with McNemar's test.
Paired t-test:
A paired t-test is used when we are interested in the difference between two measurements on the same subject. Suppose we have the following descriptive statistics for the change from baseline: 83 subjects had the outcome measured at both baseline and Week 12 (83 pairs), with a mean (SD) change of 10.7 (70.7); 68 subjects had the outcome measured at both baseline and Week 24 (68 pairs), with a mean (SD) change of 20.2 (80.9).
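With the mean difference, the standard deviation of the differences, and the sample size (number of pairs), we have everything needed to calculate the t statistic, and therefore the p-value from a t distribution with n - 1 degrees of freedom:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
where $\bar{d}$ is the mean change from baseline, $s_d$ is the standard deviation of the changes, and $n$ is the number of pairs.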
This can be implemented in SAS, with the t statistic and p-value calculated for each of Weeks 12 and 24 from the aggregate data.
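The same calculation can also be sketched outside SAS. The Python below is a minimal illustration using only the standard library, not the post's SAS program. The function names (`t_pdf`, `two_sided_t_pvalue`, `paired_t_from_summary`) are hypothetical, a two-sided test is assumed, and Simpson's-rule integration of the t density stands in for a library t-distribution routine.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_t_pvalue(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # approximates P(0 < T < |t|)
    return 2 * (0.5 - central_area)

def paired_t_from_summary(mean_diff, sd_diff, n_pairs):
    """Paired t-test from the mean and SD of the differences and the pair count."""
    se = sd_diff / math.sqrt(n_pairs)  # standard error of the mean difference
    t = mean_diff / se
    df = n_pairs - 1
    return t, df, two_sided_t_pvalue(t, df)

# Summary statistics from the text: (mean change, SD of change, number of pairs)
summary = {"Week 12": (10.7, 70.7, 83), "Week 24": (20.2, 80.9, 68)}
results = {visit: paired_t_from_summary(*stats) for visit, stats in summary.items()}

for visit, (t, df, p) in results.items():
    print(f"{visit}: t = {t:.3f}, df = {df}, two-sided p = {p:.4f}")
```

Only three numbers per visit go in, so the same function works for any visit that reports the mean change, its standard deviation, and the number of pairs.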
McNemar's Test:
McNemar's test is a statistical test used on paired nominal data. It is applied to 2 × 2 contingency tables with a dichotomous trait, with matched pairs of subjects, to determine whether the row and column marginal frequencies are equal (that is, whether there is "marginal homogeneity"). In clinical trials, the aggregate data may not be presented as a 2 × 2 contingency table, but they can often be converted into one.
Suppose we have the following summary data for post-baseline week 12: the number and percentage of subjects with improvement, stable (no change), and deterioration categories.
| Week 12 | All subjects (n=300) |
|---|---|
| Improved | 54 (18%) |
| No Change | 228 (76%) |
| Deteriorated | 18 (6%) |
At Week 12, there are more subjects in the 'Improved' category than in the 'Deteriorated' category, even though the majority of subjects are in the 'No Change' category. Is the number of subjects with improvement significantly greater than the number with deterioration?
Assuming that change from category 1 to 0 is 'Improved' and change from category 0 to 1 is 'Deteriorated', the table above can be converted into a 2 × 2 table:
| Week 12 \ Baseline | 0 | 1 |
|---|---|---|
| 0 | 228 | 54 |
| 1 | 18 | 0 |
or
| Week 12 \ Baseline | 0 | 1 |
|---|---|---|
| 0 | 0 | 54 |
| 1 | 18 | 228 |
For McNemar’s test, only the numbers in the off-diagonal (discordant) cells, in our case the number improved and the number deteriorated, are relevant. The concordant cells (in our case, the number with no change) contribute only to the total sample size; they affect neither the chi-square statistic, which always has 1 degree of freedom, nor the p-value. How the subjects with ‘No Change’ are split between the two concordant cells therefore does not matter.
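As a worked illustration, write $b$ and $c$ for the two discordant counts (symbols introduced here, not taken from the original post) and assume the uncorrected form of McNemar's chi-square statistic, with no continuity correction:

$$\chi^2 = \frac{(b - c)^2}{b + c} = \frac{(54 - 18)^2}{54 + 18} = \frac{1296}{72} = 18$$

The 228 concordant subjects do not appear anywhere in the expression.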
For either 2 × 2 table above, McNemar’s test can be performed in SAS with PROC FREQ: the WEIGHT statement indicates that the count variable holds the frequency of each observation, and the AGREE option requests McNemar's test. Either split of the 228 subjects in the concordant ‘No Change’ category gives the same p-value.
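As a rough cross-check outside SAS, the sketch below runs McNemar's test on both 2 × 2 layouts using only the Python standard library. It is an illustration under stated assumptions rather than the post's SAS program: the function name `mcnemar_from_table` is hypothetical, the asymptotic p-value uses the uncorrected chi-square statistic with 1 degree of freedom, and the exact p-value is a two-sided binomial test on the discordant pairs.

```python
import math

def mcnemar_from_table(table):
    """McNemar's test for a 2 x 2 table given as [[n00, n01], [n10, n11]].

    Rows are Week 12 status and columns are baseline status, so n01 and
    n10 are the discordant cells; n00 and n11 are never used.
    """
    b, c = table[0][1], table[1][0]
    n_discordant = b + c
    if n_discordant == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    chi2 = (b - c) ** 2 / n_discordant
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_asymptotic = math.erfc(math.sqrt(chi2 / 2))
    # Exact two-sided binomial test with success probability 0.5
    tail = sum(math.comb(n_discordant, i) for i in range(min(b, c) + 1))
    p_exact = min(1.0, 2 * tail / 2 ** n_discordant)
    return chi2, p_asymptotic, p_exact

# The two possible splits of the 228 'No Change' subjects
table_a = [[228, 54], [18, 0]]
table_b = [[0, 54], [18, 228]]

result_a = mcnemar_from_table(table_a)
result_b = mcnemar_from_table(table_b)

for name, (chi2, p_asym, p_exact) in (("Table A", result_a), ("Table B", result_b)):
    print(f"{name}: chi-square = {chi2:.2f}, p = {p_asym:.2e}, exact p = {p_exact:.2e}")
```

Both layouts return identical results because the function reads only the two discordant cells.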