Saturday, January 23, 2010

Significant level of 0.00125

Recently, I found some materials related to a lawsuit against Nuvelo Pharmaceuticals, a company that used to develop a clot-busting product for the indication of occluded central venous catheters. It is very interesting to see so many arguments about the study design and the significance level. Since Nuvelo planned to conduct only one pivotal trial (instead of two), it had agreed with FDA to use an extraordinarily stringent significance threshold (0.00125). This stringent p-value required for a single pivotal trial, less than 0.00125, was not met and the development program was eventually terminated. The issue is that such a stringent threshold was not communicated to the investment community upfront, which was one of the reasons for the lawsuit.

There is a scientific basis for this extraordinarily stringent significance level (0.00125). Here is the context, extracted from one of FDA's NDA statistical reviews for United Therapeutics Corporation's drug UniprostTM (treprostinil sodium) for pulmonary arterial hypertension.

“...A more important issue is the overall Type I error rate for the proposed analysis in this submission. First, consider the traditional standard for approval at the FDA based on two confirmatory trials. Even if the efficacy of a treatment is shown convincingly in one study, the agency likes to see replication in a second study because we will then be in a better position to infer that the results generalize to the entire population of patients with the disease. The overall Type I error rate (or false positive rate) is the chance that both studies will have a p-value less than 0.05 and the results of both studies are in the same direction. If the treatment effects in the two studies are identically 0, then the chance that both p-values will be less than 0.05 and both treatment effects are in the same direction is 0.00125. For this reason, the Division of Cardio-Renal Drugs has often advised sponsors that one study with a p-value less than 0.00125 may be sufficient for approval…”
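The 0.00125 figure follows from simple arithmetic: under a true null, each trial has a 0.025 chance of a false positive in a given direction at the two-sided 0.05 level, and the factor of 2 accounts for the two possible directions both trials could agree on. A quick check of the arithmetic (my own sketch, not from the review):

```python
# Probability that two independent trials both yield two-sided p < 0.05
# with treatment effects in the same direction, under a true null:
# 2 directions * (0.025 per-direction false-positive rate) ** 2
alpha_single_trial = 2 * 0.025 ** 2
print(alpha_single_trial)  # 0.00125
```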

While not very common, we do see quite a few drug development programs with a single pivotal trial. One such example is the trial called PROTECT, where the sample size and power for the primary composite endpoint were based on “90% power at a two-sided 0.00125 significance level to detect a difference between a distribution of 33% failure, 35% unchanged, and 32% success (placebo group) and 25% failure, 34% unchanged, and 42% success (rolofylline group), using the van Elteren extension of the Wilcoxon test”.

Designing a single pivotal trial with a significance level of 0.00125 may not be a good strategy compared to the conventional two pivotal trials, each with a significance level of 0.05. A significance level of 0.00125 is forty times more stringent than a significance level of 0.05. Employing such a small significance level will typically require a large sample size, and the trial may be difficult to succeed.
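The sample-size impact can be seen from the standard normal-approximation formula, where n is proportional to (z_{1-α/2} + z_{1-β})² for a fixed effect size. Comparing the two significance levels at 90% power (my own illustration, using only the Python standard library):

```python
from statistics import NormalDist

def z(p):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(p)

power = 0.90

def size_factor(alpha):
    # n is proportional to (z_{1-alpha/2} + z_{1-beta})^2 for a fixed effect size
    return (z(1 - alpha / 2) + z(power)) ** 2

ratio = size_factor(0.00125) / size_factor(0.05)
print(round(ratio, 2))  # roughly 1.9
```

So a single trial at 0.00125 needs nearly twice the sample size of one trial at 0.05, though still slightly less than two full trials combined.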

FDA’s perspectives on the clinical development of topical microbicides indicated the following criteria for a single trial:

  • No single site provides an unusually large fraction of participants
  • No single investigator or site provides a disproportionately favorable effect
  • Consistency across study subsets
  • Statistically persuasive
    Single multi-center trial level of evidence (p-value, 2-sided):
    · p < 0.001: persuasive, robust (note 2*[0.025^2] = 0.00125)
    · 0.05 > p > 0.01: inadequate
    · 0.01 > p > 0.001: acceptable, if:
    - good internal consistency
    - low drop-out rates
    - other supportive data

In the end, the evidence of efficacy should not rely purely on p-values. There are other considerations in assessing the evidence of efficacy. This has been spelled out in FDA’s Guidance for Industry: Providing Clinical Evidence of Effectiveness for Human Drug and Biological Products.
The evidence of effectiveness could come from one single study with the following:

  • Large multicenter study
  • Consistency across study subsets
  • Multiple studies in a single protocol
  • Multiple endpoints involving different events
  • Statistically very persuasive finding

Here "statistically very persuasive finding" means a very small p-value even though the guidance does not specifically specify how small the p-value should be. It may depend on the negotiation with the corresponding branches in FDA.

Thursday, January 14, 2010

Logistic regression: complete or quasi-complete separation of data points

When we perform logistic regression, we may sometimes run into an issue called ‘complete or quasi-complete separation of data points’. In this situation, the maximum likelihood estimate does not exist. If we use SAS Proc Logistic, the SAS log will give a warning message: "WARNING: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist. WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable." SAS will continue to report the Wald test results and odds ratios; however, these tests are no longer valid and the results are unreliable.

Completely separated data look something like the following (first column Y, second column X):
Y X
0 1
0 2
0 4
1 5
1 6
1 9

There is complete separation because all of the cases in which Y is 0 have X values of 4 or less, and all of the cases in which Y is 1 have X values of 5 or greater. In other words, the maximum X value in one group is less than the minimum X value in the other group. When the maximum value in one group is equal to the minimum value in the other group, quasi-complete separation may occur.
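The check described above is easy to automate. A minimal sketch (the helper function is my own illustration, not a SAS or standard library routine):

```python
def complete_separation(pairs):
    """pairs: list of (y, x) with y in {0, 1}.
    Returns True if one group's X values lie entirely below the other's."""
    x0 = [x for y, x in pairs if y == 0]
    x1 = [x for y, x in pairs if y == 1]
    return max(x0) < min(x1) or max(x1) < min(x0)

data = [(0, 1), (0, 2), (0, 4), (1, 5), (1, 6), (1, 9)]
print(complete_separation(data))  # True: max X for Y=0 is 4, min X for Y=1 is 5
```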

If the explanatory variable is categorical, complete separation of data points could look like this:

Predictor Failure Success
0         25      0
1         0       21

where there are no successes when the value of the predictor variable is 0, and no failures when the value of the predictor variable is 1.

For maximum likelihood estimates to exist, there must be some overlap between the two distributions. Since logistic regression uses maximum likelihood estimation, when there is no overlap of data points between the two groups, the results from the logistic regression model are unreliable and should not be trusted.
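Why the estimate fails to exist can be seen numerically: under complete separation, the log-likelihood keeps increasing toward 0 as the slope grows without bound, so there is no finite maximum. A small sketch using the six-point example above (the cutoff of 4.5, halfway between the two groups, is my own choice):

```python
import math

data = [(0, 1), (0, 2), (0, 4), (1, 5), (1, 6), (1, 9)]

def loglik(b0, b1):
    """Logistic regression log-likelihood for the data above."""
    ll = 0.0
    for y, x in data:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# As the slope grows (with the intercept keeping the boundary at x = 4.5),
# the log-likelihood increases toward 0 without ever reaching a maximum.
for b1 in (1, 5, 10):
    print(b1, loglik(-4.5 * b1, b1))
```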

Starting from SAS version 9.2, Proc Logistic provides Firth penalized likelihood estimation for dealing with complete or quasi-complete separation of data points:

proc logistic data=mydata;  /* 'mydata' is a placeholder data set name */
  model y = x / firth;
run;

However, even with Firth estimation, the results should still be interpreted with extreme caution. Complete or quasi-complete separation of the data points may occur when the sample size is small and the number of data points is limited, or when the samples are determined by the outcome (i.e., the response) rather than by the explanatory variables; we see many publications where the analysis is based on responders vs. non-responders.

When complete or quasi-complete separation occurs in a multivariable regression, the explanatory variable causing the problem should be identified and preferably excluded from the model. For a univariate regression, alternative statistical tests (for example, a two-sample t-test) should be used.
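For the categorical example above, one alternative is Fisher's exact test. Because the observed table is the most extreme one possible under the fixed margins, its one-sided p-value is just the hypergeometric probability of that table (a sketch using the counts from the example; the calculation is my own):

```python
from math import comb

# 2x2 table from the example: predictor 0 -> 25 failures, 0 successes;
# predictor 1 -> 0 failures, 21 successes (complete separation)
n0, n1 = 25, 21   # group sizes
failures = 25     # total failures, all in group 0

# One-sided Fisher exact p-value = hypergeometric probability of the
# observed (most extreme) table under the fixed margins
p = comb(n0, failures) * comb(n1, 0) / comb(n0 + n1, failures)
print(p)  # a vanishingly small p-value
```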

Sunday, January 03, 2010

Rasch Analysis

I recently noticed an approach called 'Rasch analysis' while working on a paper dealing with the MCID (minimal clinically important difference). I have not had a chance to do any Rasch analysis on my own, but I have collected some information here for future use.

Rasch analysis was first used in education and survey research. In clinical trials, it is mostly used in psychometric and neurology areas where the outcome assessment relies on an instrument that typically contains a certain number of items. These instruments are frequently used in CNS and neurology diseases such as stroke, Alzheimer's disease, and dementia. Traditionally, a scale or instrument is validated through reliability and validity tests. Recently, in addition to the reliability and validity tests, the Rasch measurement model has set new quality standards for outcome measures by appraising a broad range of measurement properties. You will not be surprised to see many papers if you search with the keywords "Rasch analysis" or "Rasch model".
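For reference, the dichotomous Rasch model expresses the probability of a positive item response as a logistic function of the difference between person ability θ and item difficulty b. A minimal sketch (the function name and example values are my own illustration):

```python
import math

def rasch_prob(theta, b):
    """Dichotomous Rasch model: probability that a person with ability
    theta gives a positive response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print(rasch_prob(0.0, 0.0))  # ability equals difficulty -> 0.5
print(rasch_prob(1.0, 0.0))  # a more able person -> higher probability
```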

There is no existing procedure within SAS to perform Rasch analysis. However, there are some SAS macros for Rasch analysis on the internet, developed by Karl Bang Christensen. The most popular software for Rasch analysis is Winsteps, which provides a free download of Ministep, a version capable of performing Rasch analysis for a limited number of items and records.

Saturday, January 02, 2010

Winning the holiday gift

During the holiday season, it is very typical for a company to hold a party for its employees. One activity during the party is winning prizes with raffle tickets.

Say each employee is given 20 tickets in a raffle with 80 prizes. Which gives you a better chance of winning: putting all of your tickets in one of the 80 baskets (for your favorite item) or spreading them among 20 baskets, with one ticket per basket?

This is a probability question. There is an answer from Ask Marilyn for a similar question:

If you can see the baskets and tickets, you should wait until the last minute and then put all of your tickets in the basket that appears to contain the fewest tickets. If you can't see the tickets, put all your tickets in the basket for the least-desirable prize. But if you can't see the tickets and the prizes are equal, it doesn't matter what you do.
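The last case can be checked numerically. Under the assumption that every basket already holds the same number n of other tickets and one winning ticket is drawn per basket (n = 1000 is a made-up value), the two strategies give almost identical chances of winning at least one prize:

```python
n = 1000  # hypothetical number of other tickets already in each basket

# Strategy A: all 20 tickets in one basket
p_all_in_one = 20 / (n + 20)

# Strategy B: one ticket in each of 20 baskets
p_spread = 1 - (n / (n + 1)) ** 20

print(p_all_in_one, p_spread)  # nearly identical probabilities
```

This matches the advice that, with equal baskets and equal prizes, it doesn't much matter what you do.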

In the previous year, I won nothing because I put all my raffle tickets in a couple of hot items (there were thousands of tickets in the boxes for these hot items). This year, I changed my strategy and put my tickets in the baskets with the fewest tickets. I won a digital photo cube. A digital photo cube is not my favorite item, but I demonstrated how changing the strategy could increase the probability of winning.