Monday, December 27, 2010

Bootstrap and SAS

In statistics, bootstrapping is a resampling technique used to obtain estimates of summary statistics. In clinical trials, the bootstrapping technique can be a useful approach for assessing the precision of an estimator. Its most common application may be in obtaining the confidence interval for an estimator when the typical way of obtaining the confidence interval, through the standard error, is impossible or difficult.

Here are two examples where the bootstrapping technique needed to be implemented. The first example is for a manuscript. When we submitted our paper to the European Respiratory Journal, one of the reviewer comments was a request for evaluating the internal consistency. The comment says “The statistical method is sample-based as it consists in a regression performed on this sample. Such a method needs at least evaluation for internal consistency (by measuring the regression correlation on a subsample then validating on another subsample or better by using bootstrap and jackknife methods).”

The second example is a request from the regulatory agency for calculating the 95% CI for the % relative difference. When there are two treatment means, A and B, the % relative difference is defined as %RD = (A-B)/A, expressed as a percentage. There may be other approaches in this case, but the bootstrapping technique could come in handy in calculating the 95% CI for %RD.

Bootstrap can be easily implemented in SAS and it contains three main steps: 1) resample with replacement from the observed data set (the observed data is only one sample); SAS Proc Surveyselect can serve this purpose; 2) obtain the statistic (or estimator) by performing the analysis on each resample; 3) compute the summary statistics (for example, the percentiles for a confidence interval) from the collection of the statistics or estimators.
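The three steps above can be sketched in a few lines. This is only a minimal illustration, written in Python rather than SAS so that it is self-contained; the data values, the function names, the choice to resample each treatment arm separately, and the percentile method for the interval are all my assumptions, not something prescribed by the agency request or by any SAS procedure.

```python
import math
import random
import statistics

def percent_relative_difference(a_values, b_values):
    """%RD = (mean(A) - mean(B)) / mean(A), expressed as a percentage."""
    mean_a = statistics.mean(a_values)
    mean_b = statistics.mean(b_values)
    return 100.0 * (mean_a - mean_b) / mean_a  # assumes mean(A) is not zero

def bootstrap_percentile_ci(a_values, b_values, n_resamples=2000,
                            alpha=0.05, seed=12345):
    """Percentile bootstrap CI for %RD (hypothetical helper)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Step 1: resample each arm with replacement, keeping the arm sizes.
        a_star = rng.choices(a_values, k=len(a_values))
        b_star = rng.choices(b_values, k=len(b_values))
        # Step 2: compute the estimator on the resample.
        estimates.append(percent_relative_difference(a_star, b_star))
    # Step 3: summarize the collection of estimates with percentiles.
    estimates.sort()
    lower_index = int(math.floor((alpha / 2) * n_resamples))
    upper_index = min(n_resamples - 1,
                      int(math.ceil((1 - alpha / 2) * n_resamples)) - 1)
    return estimates[lower_index], estimates[upper_index]

# Made-up observations for treatments A and B (illustration only).
treatment_a = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 10.4, 11.8]
treatment_b = [8.9, 9.4, 8.1, 9.9, 9.0, 8.6, 9.3, 8.8]

point_estimate = percent_relative_difference(treatment_a, treatment_b)
ci_lower, ci_upper = bootstrap_percentile_ci(treatment_a, treatment_b)
print(f"%RD = {point_estimate:.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

In SAS, step 1 would typically be handled by Proc Surveyselect, and steps 2 and 3 by running the analysis by replicate and then summarizing the replicate-level results.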




Bootstrap is a suggested statistical approach for obtaining the confidence interval for individual and population bioequivalence criteria.

Some good references about how to do bootstrapping using SAS are included here:

Ten years ago, I had to use a SAS macro to do the bootstrap for my PhD dissertation. The macro is still available on the SAS website.

The bootstrap technique has also been built into several SAS procedures (such as Proc Multtest and Proc MI).

When bootstrap is used in a regression situation, the 'Bootstrap Pairs' technique may be employed. Freedman (1981) proposed to resample directly from the original data: that is, to resample the pair consisting of the dependent variable and the regressor. This is called bootstrapping pairs, and it is described in a paper by Flachaire.

The SAS macro for bootstrapping discusses two main ways to do bootstrap resampling for regression models, depending on whether the predictor variables are random or fixed. If the predictors are random, you resample observations just as you would for any simple random sample. This method is usually called "bootstrapping pairs". If the predictors are fixed, the resampling process should keep the same values of the predictors in every resample and change only the values of the response variable by resampling the residuals.
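To make the difference between the two resampling schemes concrete, here is a small sketch for a simple linear regression with one predictor. It is written in Python for self-containment and is not the SAS macro itself; the data, the function names, and the use of the slope as the statistic of interest are assumptions for illustration.

```python
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def bootstrap_pairs_slopes(x, y, n_resamples, rng):
    """Random predictors: resample whole (x, y) observations together."""
    pairs = list(zip(x, y))
    slopes = []
    for _ in range(n_resamples):
        sample = rng.choices(pairs, k=len(pairs))
        xs, ys = zip(*sample)
        if len(set(xs)) < 2:
            continue  # slope is undefined if every resampled x is identical
        slopes.append(fit_line(xs, ys)[1])
    return slopes

def bootstrap_residual_slopes(x, y, n_resamples, rng):
    """Fixed predictors: keep x as observed and resample the residuals."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_resamples):
        e_star = rng.choices(residuals, k=len(residuals))
        y_star = [fi + ei for fi, ei in zip(fitted, e_star)]
        slopes.append(fit_line(x, y_star)[1])
    return slopes

# Made-up data (illustration only).
x_obs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_obs = [2.6, 2.9, 3.7, 3.9, 4.4, 5.2, 5.4, 6.1, 6.4, 7.1]

rng = random.Random(2010)
observed_slope = fit_line(x_obs, y_obs)[1]
pairs_slopes = bootstrap_pairs_slopes(x_obs, y_obs, 1000, rng)
residual_slopes = bootstrap_residual_slopes(x_obs, y_obs, 1000, rng)
print(f"observed slope: {observed_slope:.4f}")
print(f"mean slope, pairs bootstrap: {sum(pairs_slopes) / len(pairs_slopes):.4f}")
print(f"mean slope, residual bootstrap: {sum(residual_slopes) / len(residual_slopes):.4f}")
```

The only difference between the two functions is what gets resampled: whole observations in the first, residuals around the fitted line in the second. Either collection of slopes can then be summarized (for example with percentiles) in the same way as any other bootstrap estimate.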

Sunday, December 12, 2010

Counting the study day


For every clinical trial, we need to count the study day for scheduling the follow-up visits and for assessing the temporal relationship between events. The study day starts with the day that the subject is randomized and receives the first dose of the study medication. Usually, the randomization date and the date of the first dose of the study medication are the same. In the clinical study protocol, there should always be a ‘schedule of events’ or ‘schedule of evaluations’ table which defines the study procedures and the study visits. This table should include the study day.

There is one critical difference in counting the study days. The protocol could count the day on which the subject receives the first dose of the study medication as “day 0” or “day 1”.

If the first dose date is counted as day 0, the day immediately after the first dose date will be counted as day 1 and the date immediately before will be counted as day -1. Therefore, the study day is counted continuously as … day -7, day -6, day -5, day -4, day -3, day -2, day -1, day 0, day 1, day 2,… In this case, for programming, the study day variable can be created using the formula:

          The event/visit date – first dose date  

The problem with this counting is 'day 0'. People are used to calling the first day of the study medication 'day 1'.

If the first dose date is counted as day 1, the day immediately after the first dose date will be counted as day 2, and the date immediately before would be counted as day 0, which is confusing. In practice, if the first dose date is counted as day 1, day 0 is not used in the study day counting. The date immediately before is counted as day -1 (skipping day 0). Therefore, the study day is counted as: day -7, day -6, day -5, day -4, day -3, day -2, day -1, day 1, day 2,… For programming, the study day variable would be created using two separate formulas for pre-dose and post-dose visits.
For pre-dose (event/visit date before the first dose date):
           the event/visit date – first dose date
For post-dose (event/visit date on or after the first dose date):
           the event/visit date – first dose date + 1
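Both counting conventions can be expressed in one small function. This is a minimal sketch in Python; the function name, the `first_dose_is_day_1` flag, and the example dates are hypothetical and only serve to illustrate the formulas above.

```python
from datetime import date

def study_day(event_date, first_dose_date, first_dose_is_day_1=True):
    """Return the study day of an event relative to the first dose date."""
    difference = (event_date - first_dose_date).days
    if not first_dose_is_day_1:
        # "Day 0" convention: one formula for all dates.
        return difference
    # "Day 1" convention: there is no day 0, so dates on or after the
    # first dose are shifted by one and earlier dates are left as is.
    return difference + 1 if difference >= 0 else difference

first_dose = date(2010, 12, 12)  # made-up first dose date

for event in (date(2010, 12, 11), date(2010, 12, 12), date(2010, 12, 13)):
    print(event,
          "day-0 convention:", study_day(event, first_dose, first_dose_is_day_1=False),
          "day-1 convention:", study_day(event, first_dose))
```

Under the day 1 convention the function never returns 0, which is exactly the gap in the sequence day -1, day 1, day 2 described above.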

Neither of these approaches (including study day 0 or not including study day 0) is wrong, but confusion can arise when we calculate the study day variable. Even within CDISC, there are disagreements in handling this between the Study Data Tabulation Model (SDTM) (not allowing study day 0) and the Analysis Data Model (ADaM) (allowing study day 0).

The following clinical trial protocol templates indicate that the study day counting starts with day 0:
The following clinical trials indicate that the study day counting starts with day 1. There are more industry trials like this.

The unit used in counting the study day depends on the length of the clinical trial. For a trial lasting months or years, instead of counting by day, it is more practical to count by week, month, or year. For example, for a clinical trial with a three-year treatment duration, the last treatment date would be three years away. If we count by day, it will be something like day 1095. Even worse, some people may apply a time window to this date and schedule the last treatment date at day 1095 +/- 7 days. Sounds silly, doesn’t it?

Counting the study day correctly is important for study investigators and coordinators to avoid protocol deviations. Barnett International actually developed a tool to facilitate study day and visit scheduling.