Sometimes, the only data we have is the summary data (mean, standard deviation, number of subjects). Can we use the summary data (instead of the raw data) to calculate the test statistics and p-values?
Yes, we can.
Below is an example for the group t-test. I illustrate two methods for calculating the p-values based on the summary data.
In method 1, we will use the SAS procedure PROC TTEST. The only tricky thing is to enter the summary data in a data set with the special SAS variable _STAT_, which indicates which summary statistic (n, mean, or std) each record holds. The program below is self-explanatory.
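data summary;
length _stat_ $4;
* _STAT_ marks each value as n, mean, or std;
input week $ _STAT_ $ value @@;
datalines;
w1 n 7
w1 mean -2.6
w1 std 1.13
w2 n 5
w2 mean -1.2
w2 std 0.45
;
run;
proc ttest data=summary;
class week;
var value;
run;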
In method 2, we calculate the t value and the p-value directly from the formula.
The formula for calculating the t value for the group t-test is

t = (mean1 - mean2 - m) / (s * sqrt(1/n1 + 1/n2))

with n1+n2-2 degrees of freedom, where m=0 for the usual test of equal means. The same formula can also be used to compare the difference between the means of two independent samples, with n1 and n2 observations, to a value m.
In this formula, s**2 is the pooled variance
s**2 = [((n1-1)s1**2+(n2-1)s2**2)/(n1+n2-2)]
and s1**2 and s2**2 are the sample variances of the two groups. The use of this t statistic depends on the assumption that sigma1**2=sigma2**2, where sigma1**2 and sigma2**2 are the population variances of the two groups.
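As a quick hand check, plugging the summary data from the example (n1=7, mean1=-2.6, s1=1.13; n2=5, mean2=-1.2, s2=0.45) into the formula with m=0 gives roughly the following. The intermediate values are rounded, so treat the result as approximate.

```latex
\begin{aligned}
s^2 &= \frac{(7-1)(1.13)^2 + (5-1)(0.45)^2}{7+5-2}
     = \frac{7.6614 + 0.81}{10} \approx 0.8471,
     \qquad s \approx 0.9204 \\
t &= \frac{-2.6 - (-1.2)}{0.9204\,\sqrt{\tfrac{1}{7} + \tfrac{1}{5}}}
   \approx \frac{-1.4}{0.5389} \approx -2.60,
   \qquad df = 7 + 5 - 2 = 10
\end{aligned}
```

Since |t| falls between the tabulated two-sided critical values for 10 degrees of freedom at the 0.05 level (2.228) and the 0.02 level (2.764), the two-sided p-value lies between 0.02 and 0.05.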
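data ttest;
input n1 mean1 sd1 n2 mean2 sd2;
s2 = (((n1-1)*sd1**2+(n2-1)*sd2**2)/(n1+n2-2)); * pooled variance;
s = sqrt(s2); * pooled standard deviation;
denominator = s * sqrt((1/n1) + (1/n2));
df = n1+n2-2;
t = (mean1 - mean2)/denominator;
p = (1-probt(abs(t),df))*2; * two-sided p-value;
datalines;
7 -2.6 1.13 5 -1.2 0.45
;
run;
proc print data=ttest;
run;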
It is even easier when the summary data are counts or frequencies. We can use the WEIGHT statement in SAS PROC FREQ to indicate that each record holds a count rather than an original individual observation. The SAS code will be something like:
do exposure=1 to 2;
do disease=1 to 2;
proc freq data=disease;
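The fragment above leaves out the counts and the closing statements. A complete version might look like the sketch below. The four cell counts and the variable name count are hypothetical placeholders chosen only for illustration, and the CHISQ option is one plausible choice of test for a 2x2 table, not necessarily the one originally intended.

```sas
/* Hypothetical 2x2 table: one record per cell, with its count */
data disease;
   do exposure = 1 to 2;
      do disease = 1 to 2;
         input count @;   /* hold the line so all four cells are read from it */
         output;
      end;
   end;
   datalines;
20 80 10 90
;
run;

proc freq data=disease;
   weight count;                      /* each record stands for 'count' subjects */
   tables exposure*disease / chisq;   /* chi-square test of association */
run;
```

The WEIGHT statement is what tells PROC FREQ to treat each record as a cell count; without it, the procedure would see only four observations.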