Friday, June 05, 2009

Group t-test or Chi-square test based on the summary data

Sometimes the only data we have are summary statistics (mean, standard deviation, number of subjects). Can we use the summary data instead of the raw data to calculate the test statistics and p-values?

Yes, we can.

Below is an example for the group t-test. I illustrate two methods for calculating the p-value from the summary data.

In method 1, we use the SAS procedure PROC TTEST. The only tricky part is entering the summary data in a data set with the special variable _STAT_, which tells the procedure which summary statistic (n, mean, or std) each record holds. The program below is self-explanatory.

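*Method #1;
* One record per statistic: group, statistic name in _STAT_, and its value;
data summary;
length _stat_ $4;
input week $ _STAT_ $ value@@;
datalines;
w1 n 7
w1 mean -2.6
w1 std 1.13
w2 n 5
w2 mean -1.2
w2 std 0.45
;
proc print data=summary;
run;
* PROC TTEST reads n, mean, and std for each level of the class variable;
proc ttest data=summary;
class week;
var value;
run;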



Another way is to use the formula directly.


To compare the means of two independent samples with n1 and n2 observations to a value m, the t value is

t = (mean1 - mean2 - m) / (s * sqrt(1/n1 + 1/n2))

with n1+n2-2 degrees of freedom. For the usual group t-test of no difference, m=0. Here s**2 is the pooled variance

s**2 = ((n1-1)*s1**2 + (n2-1)*s2**2) / (n1+n2-2)

and s1**2 and s2**2 are the sample variances of the two groups. The use of this t statistic depends on the assumption that sigma1**2=sigma2**2, where sigma1**2 and sigma2**2 are the population variances of the two groups.

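*Method #2;
data ttest;
input n1 mean1 sd1 n2 mean2 sd2;
* pooled variance and pooled standard deviation;
s2 = (((n1-1)*sd1**2+(n2-1)*sd2**2)/(n1+n2-2));
s = sqrt(s2);
* standard error of the difference between the two means;
denominator = s * sqrt((1/n1) + (1/n2));
df = n1+n2-2;
t = (mean1 - mean2)/denominator;
* two-sided p-value from the t distribution;
p = (1-probt(abs(t),df))*2;
datalines;
7 -2.6 1.13 5 -1.2 0.45
;
run;
proc print data=ttest;
run;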
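
If the equal-variance assumption is doubtful, the same summary data can also be run through the unequal-variance (Satterthwaite) version of the test, which PROC TTEST reports alongside the pooled result. The data step below is only a sketch of that calculation: the data set name welch and the variables v1, v2, and se are names I made up for illustration, and the data line repeats the numbers used in Method #2.

```sas
*Sketch: unequal-variance (Satterthwaite) t-test from summary data;
data welch;
input n1 mean1 sd1 n2 mean2 sd2;
/* squared standard error of each group mean */
v1 = sd1**2/n1;
v2 = sd2**2/n2;
/* standard error of the difference, without pooling the variances */
se = sqrt(v1 + v2);
/* Satterthwaite approximation to the degrees of freedom (usually not an integer) */
df = (v1 + v2)**2 / (v1**2/(n1-1) + v2**2/(n2-1));
t = (mean1 - mean2)/se;
/* two-sided p-value */
p = (1-probt(abs(t),df))*2;
datalines;
7 -2.6 1.13 5 -1.2 0.45
;
run;
proc print data=welch;
run;
```

The t and p values here should agree with the Satterthwaite row of the PROC TTEST output from method 1, which is a handy check that the summary data were entered correctly.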

It is even easier if the summary data are counts or frequencies. We can use the WEIGHT statement in SAS PROC FREQ to indicate that each record holds a cell count rather than an individual observation. The SAS code will be something like:

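* Each cell count of the 2x2 table becomes one observation;
data disease;
do exposure=1 to 2;
do disease=1 to 2;
/* the trailing @ holds the line so the next cell count can be read from it */
input index@;
output;
end;
end;
cards;
23 32
17 15
;
* WEIGHT tells PROC FREQ that index holds cell counts;
proc freq data=disease;
tables exposure*disease/chisq;
weight index;
run;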
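
As with the t-test, the chi-square statistic for a 2x2 table can also be computed from the formula in a data step. The sketch below uses the shortcut formula for the Pearson chi-square (no continuity correction) with 1 degree of freedom. The data set name chisq2x2 and the variables a, b, c, d, and chi2 are names chosen for illustration; a and b are the counts for exposure=1, and c and d are the counts for exposure=2.

```sas
*Sketch: Pearson chi-square for a 2x2 table from the four cell counts;
data chisq2x2;
input a b c d;
n = a + b + c + d;
/* n*(ad-bc)**2 divided by the product of the row and column totals */
chi2 = n*(a*d - b*c)**2 / ((a+b)*(c+d)*(a+c)*(b+d));
df = 1;
/* upper-tail probability of the chi-square distribution */
p = 1 - probchi(chi2, df);
datalines;
23 32 17 15
;
run;
proc print data=chisq2x2;
run;
```

For these counts chi2 works out to about 1.04, which should match the Chi-Square row of the PROC FREQ output.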
