Monday, December 27, 2010

Bootstrap and SAS

In statistics, bootstrapping is a resampling technique used to obtain estimates of summary statistics. In clinical trials, the bootstrapping technique can be a useful approach for estimating the precision of an estimator. The most common application of bootstrapping may be obtaining a confidence interval for an estimator when the typical approach of deriving the confidence interval through the standard error is difficult or impossible.

Here are two examples where the bootstrapping technique needed to be implemented. The first example is for a manuscript. When we submitted our paper to the European Respiratory Journal, one of the reviewer comments was a request to evaluate internal consistency. The comment said: “The statistical method is sample-based as it consists in a regression performed on this sample. Such a method needs at least evaluation for internal consistency (by measuring the regression correlation on a subsample then validating on another subsample or better by using bootstrap and jackknife methods).”

The second example is a request from the regulatory agency for calculating the 95% CI for the % relative difference. When there are two treatment means, A and B, the % relative difference is defined as %RD = (A-B)/A. There may be other approaches in this case, but the bootstrapping technique can come in handy in calculating the 95% CI for %RD.

Bootstrap can be easily implemented in SAS and it contains three main steps: 1) resample the data from the observed data set (the observed data set is only one sample) – SAS Proc Surveyselect can serve this purpose; 2) obtain the statistic (or estimator) by performing the analysis on each resample; 3) compute summary statistics from the collection of the statistics or estimators.
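To make the three steps concrete, here is a minimal sketch in Python (rather than SAS) of a percentile bootstrap 95% CI for the %RD from the second example; the data values, resample count, and seed are made up purely for illustration:

```python
import random
import statistics

def bootstrap_ci(a, b, n_boot=10000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the % relative difference (A-B)/A."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Step 1: resample each group with replacement
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        # Step 2: compute the statistic on each resample
        ma, mb = statistics.mean(ra), statistics.mean(rb)
        estimates.append((ma - mb) / ma)
    # Step 3: summarize the collection of estimates (percentile method)
    estimates.sort()
    lo = estimates[int(n_boot * alpha / 2)]
    hi = estimates[int(n_boot * (1 - alpha / 2))]
    return lo, hi

# hypothetical treatment means data for arms A and B
a = [12.1, 9.8, 11.4, 10.7, 13.0, 10.2, 11.9, 12.4]
b = [9.5, 8.9, 10.1, 9.0, 10.8, 8.7, 9.9, 10.4]
low, high = bootstrap_ci(a, b)
print(f"95% CI for %RD: ({low:.3f}, {high:.3f})")
```

In SAS, step 1 would typically be Proc Surveyselect with METHOD=URS (sampling with replacement), step 2 a BY-sample analysis, and step 3 a summary procedure over the collected estimates.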

Bootstrap is a suggested statistical approach for obtaining the confidence interval for individual and population bioequivalence criteria.

Some good references about how to do bootstrapping using SAS are included here:

Ten years ago, I had to use a SAS macro to do the bootstrap for my PhD dissertation. The macro is still available on the SAS website.

Bootstrap technique has also been built into several SAS procedures (such as Proc Multtest, Proc MI).

When the bootstrap is used in a regression situation, the 'bootstrap pairs' technique may be employed. Freedman (1981) proposed to resample directly from the original data, that is, to resample the (dependent variable, regressor) couples; this is called bootstrapping pairs. Bootstrap pairs is described in a paper by Flachaire. The SAS macro for bootstrapping discussed two main ways to do bootstrap resampling for regression models, depending on whether the predictor variables are random or fixed. If the predictors are random, you resample observations just as you would for any simple random sample; this method is usually called "bootstrapping pairs". If the predictors are fixed, the resampling process should keep the same values of the predictors in every resample and change only the values of the response variable by resampling the residuals.
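The two resampling schemes can be contrasted in a short sketch (Python for illustration; the data and the number of resamples are invented):

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit for y = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def bootstrap_slopes(xs, ys, scheme, n_boot=2000, seed=7):
    """Bootstrap distribution of the slope under either scheme."""
    rng = random.Random(seed)
    b0, b1 = fit_line(xs, ys)
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    slopes = []
    n = len(xs)
    for _ in range(n_boot):
        if scheme == "pairs":  # random predictors: resample (x, y) pairs
            idx = [rng.randrange(n) for _ in range(n)]
            bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        else:                  # fixed predictors: keep x, resample residuals
            bx = xs
            by = [b0 + b1 * x + rng.choice(resid) for x in xs]
        slopes.append(fit_line(bx, by)[1])
    return slopes

# made-up data with true slope near 2
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
for scheme in ("pairs", "residuals"):
    s = bootstrap_slopes(xs, ys, scheme)
    print(scheme, sum(s) / len(s))
```

Both schemes center on the same slope estimate here; they differ in what variation they treat as random.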

Sunday, December 12, 2010

Counting the study day

For every clinical trial, we need to count the study day to schedule the follow-up visits and to assess the temporal relationship between events. The study day starts on the day the subject is randomized and receives the first dose of the study medication. Usually, the randomization date and the first dose date are the same. In a clinical study protocol, there should always be a ‘schedule of events’ or ‘schedule of evaluations’ table that defines the study procedures and the study visits. This table should include the study day.

There is one critical difference in how study days are counted: the protocol can count the day the subject receives the first dose of the study medication as “day 0” or as “day 1”.

If the first dose date is counted as day 0, the day immediately after the first dose date will be counted as day 1 and the date immediately before will be counted as day -1. Therefore, the study day is counted continuously as … day -7, day -6, day -5, day -4, day -3, day -2, day -1, day 0, day 1, day 2,… In this case, for programming, the study day variable can be created using the formula:

          The event/visit date – first dose date  

The problem with this counting is 'day 0': people are used to calling the first day of study medication 'day 1'.

If the first dose date is counted as day 1, the day immediately after the first dose date will be counted as day 2, and logically the date immediately before would be day 0 – which is confusing. In practice, if the first dose date is counted as day 1, day 0 is not used in the study day counting: the date immediately before the first dose is counted as day -1 (day 0 is skipped). Therefore, the study day is counted as: … day -7, day -6, day -5, day -4, day -3, day -2, day -1, day 1, day 2, … For programming, the study day variable would be created using two separate formulas for pre-dose and post-dose visits.
For pre-dose:
           the event/visit date – first dose date
For post-dose:
           the event/visit date – first dose date + 1
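The two conventions can be sketched as follows (a Python illustration; the dates are hypothetical):

```python
from datetime import date

def study_day(event_date, first_dose_date, day_zero=False):
    """Compute the study day under either convention.
    day_zero=True:  first dose day is Day 0 (simple date difference).
    day_zero=False: first dose day is Day 1 (Day 0 is skipped)."""
    diff = (event_date - first_dose_date).days
    if day_zero:
        return diff
    return diff + 1 if diff >= 0 else diff

first_dose = date(2010, 12, 1)
for d in (date(2010, 11, 30), date(2010, 12, 1), date(2010, 12, 2)):
    print(d, study_day(d, first_dose, day_zero=True), study_day(d, first_dose))
```

The day before first dose is day -1 under both conventions; the first dose date itself is day 0 under one and day 1 under the other.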

Neither of these approaches (counting with or without study day 0) is wrong, but confusion can arise when we calculate the study day variable. Even within CDISC, there is a disagreement in handling this between the Study Data Tabulation Model (SDTM), which does not allow study day 0, and the Analysis Data Model (ADaM), which allows study day 0.

The following clinical trial protocol templates indicate that the study day counting starts with day 0:
The following clinical trials indicate that the study day counting starts with day 1. There are more industry trials like this.

The unit used in counting the study day depends on the length of the clinical trial. For a trial lasting months or years, instead of counting by day, it is more practical to count by week, month, or year. For example, for a clinical trial with a three-year treatment duration, the last treatment date would be three years away. If we count by day, it will be something like day 1095. Even worse, some people may apply a time window to this date, giving a last treatment date of day 1095 +/- 7 days. Sounds silly, doesn’t it?

Counting the study day correctly is important for study investigators/coordinators to avoid protocol deviations. Barnett International actually developed a tool to facilitate study day/visit scheduling.

Monday, November 29, 2010

A conditional probability issue?

There is a question and answer from 'Ask Marilyn'. I copy the question and answer here since it is a probability issue.

Question: Four identical sealed envelopes are on a table. One contains a $100 bill. You select an envelope at random and hold it in your hand without opening it. Two of the three remaining envelopes are then removed and set aside, still sealed. You are told that they are empty. You are now given the choice of keeping the envelope you selected or exchanging it for the one on the table. What should you do? A) Keep your envelope; B) switch it; or C) it doesn't matter.

Marilyn said you should switch envelopes. Here's her reasoning: Imagine playing this game repeatedly. You start with a 25% chance of choosing the envelope with the cash. Then two empty ones are taken away on purpose. (Only someone with knowledge of the contents can inform you that sealed envelopes are empty.) So if the $100 bill is in any of the three unchosen envelopes – which it is 75% of the time – you'll get it by switching.

However, I would choose the answer C) it doesn't matter. This is a conditional probability issue. In the beginning, with all four envelopes sealed, the probability of choosing one envelope with $100 bill is 25%. When two envelopes are revealed not to contain the $100 bill, for the remaining two envelopes, each now has 50% probability with $100 bill in it. It doesn't matter if you keep the envelope on hand or switch it for the one on the table.
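The two answers actually follow from two different readings of how the envelopes are removed, and a small simulation (Python, purely illustrative) makes each reading explicit: if the remover knowingly sets aside two empty envelopes, switching wins about 75% of the time; if the removal is random and we simply condition on both removed envelopes happening to be empty, it is 50/50.

```python
import random

def play(informed, rng):
    """One round; returns True if switching wins, or None if a random
    removal happened to expose the $100 bill (trial discarded)."""
    envelopes = [1, 0, 0, 0]  # 1 marks the $100 bill
    rng.shuffle(envelopes)
    rest = [1, 2, 3]          # player holds envelope 0
    if informed:
        # remover knowingly sets aside two empty envelopes
        removed = rng.sample([i for i in rest if envelopes[i] == 0], 2)
    else:
        # remover picks two at random; keep only trials where both are empty
        removed = rng.sample(rest, 2)
        if any(envelopes[i] for i in removed):
            return None
    remaining = next(i for i in rest if i not in removed)
    return envelopes[remaining] == 1

rng = random.Random(2010)
results = {}
for informed in (True, False):
    wins = [w for w in (play(informed, rng) for _ in range(100000))
            if w is not None]
    results[informed] = sum(wins) / len(wins)
print(results)
```

Which reading applies depends on whether the person removing the envelopes knew their contents, which is exactly the point Marilyn's parenthetical remark and the conditional-probability argument disagree on.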

Saturday, November 20, 2010

Using RevMan to Conduct the Meta Analysis

RevMan (or Review Manager) is a review tool developed by the Cochrane Collaboration to facilitate literature reviews and meta-analyses. RevMan can be downloaded for free and can be installed without system administrator privileges. Thousands of systematic reviews and meta-analyses published in the Cochrane Library were performed using RevMan. These systematic reviews and meta-analyses have been one of the leading resources in evidence-based medicine.

RevMan can easily be used by medical researchers who are not statisticians. For statisticians working in medical research, RevMan is an easy tool for performing meta-analyses and generating publication-quality graphs (forest plots, funnel plots).

The statistical methods and models are described in the document "Standard statistical algorithms in Cochrane reviews" by Jon Deeks and Julian Higgins and in the Cochrane Handbook for Systematic Reviews of Interventions. Both fixed-effect and random-effects models are included in RevMan. For random-effects models, the DerSimonian and Laird method is used; this is the most common random-effects model used in meta-analysis.

RevMan 5 is extremely easy to use. Various tutorials, tips, and webinars are provided on the RevMan documentation website and in The Cochrane Collaboration Open Learning Materials. I found it extremely useful to watch the two webinars (especially part 2, regarding the data and analyses).

RevMan is just a tool for performing a meta-analysis; there is a lot of work to be done before entering the information and data into RevMan. Considerable time needs to be spent on the literature search. Since the data used in meta-analyses rely on publications, some data need to be converted first. For example, for outcomes measured as continuous variables, a published article may provide only the standard error (SE) or just the 95% confidence interval. The SE can easily be converted to the standard deviation (SD) by multiplying it by the square root of the sample size. If only the 95% confidence interval is available, the SE can be recovered under the normal approximation (upper bound = mean + 1.96 × SE, so SE = (upper − lower)/(2 × 1.96)) and then converted to the SD as above.
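These conversions are simple enough to sketch (Python, with made-up numbers):

```python
import math

def sd_from_se(se, n):
    """SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, z=1.96):
    """Normal approximation: CI width = 2 * z * SE, so SE = width / (2 * z)."""
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)

print(sd_from_se(0.5, 25))          # SE 0.5 with n = 25 gives SD 2.5
print(sd_from_ci(8.04, 11.96, 25))  # CI width 3.92 gives SE 1.0, SD 5.0
```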

Sunday, November 07, 2010

Good Review Practice

In a previous article, 'regulatory science' was discussed. 'Good review practice' can be considered one aspect of regulatory science. Here, good review practice specifically refers to a “documented best practice” within CDER that addresses any aspect of the process, format, content, and/or management of a product review.

On the industry side, the sponsor needs to establish standard operating procedures (SOPs) and working procedure documents (WPDs) to ensure compliance with regulatory guidance and GCP and to improve efficiency. On the regulatory side, it is important to establish good review practices to ensure that the same standard procedures are followed during the review process for drug approval.

These good review practices could cover the review process in different areas: efficacy, safety, pregnancy, CMC, and more. They are written for FDA reviewers; however, understanding good review practice is also very helpful for sponsors in preparing regulatory submission documents in a way that is amenable to reviewers. Miscommunication between the sponsor and the regulatory agency can be minimized when all necessary information and analyses required per good review practice are included in the submission documents.

Below are some links related to good review practice:

Sunday, October 31, 2010

Regulatory Science

Regulatory science is the science of developing new tools, standards, and approaches to assess the safety, efficacy, quality, and performance of all FDA-regulated products, including drugs, biological products, medical devices, and more. On February 24, 2010, FDA, along with NIH, launched its Advancing Regulatory Science Initiative (ARS), aimed at accelerating the process from scientific breakthrough to the availability of new, innovative medical therapies for patients.

On October 6, 2010, the U.S. Food and Drug Administration unveiled an overview of initiatives to advance regulatory science and help the agency assess the "safety, efficacy, quality and performance of FDA-regulated products," and published its white paper “Advancing Regulatory Science for Public Health – A Framework for FDA's Regulatory Science Initiative.” The white paper outlines the agency's effort to modernize its tools and processes for evaluating everything from nanotechnology to medical devices to tobacco products.

To accompany the release of the white paper, FDA commissioner Dr. Hamburg gave a speech to the National Press Club in Washington, DC.

In the white paper, section I, “Accelerating the Delivery of New Medical Treatments to Patients,” has specific meaning for statisticians. “Adaptive design” was not specifically mentioned in the white paper; however, any approach or methodology in clinical trial design that can expedite the drug development process should be encouraged, as should personalized medicine.

Even though regulatory science and regulatory affairs are critical in the drug development field, the professionals working in the field are very diverse and come from a variety of backgrounds. Perhaps you can only learn regulatory science through experience and on-the-job training. However, I do notice that USC has a graduate program in regulatory science. Considering that FDA is increasing its investment in regulatory science and that regulatory laws are getting more and more complicated, graduates from this program should not have any difficulty finding a job.

Friday, October 08, 2010

Missing data in clinical trials - the new guideline from EMEA and National Academies

Missing data issues have been discussed and debated for many years. The handling of missing data in clinical trials has been recognized as an important issue not only for the statisticians who analyze the data, but also for the clinical study teams who conduct the studies. While we are still waiting for FDA to issue its guidance on missing data in clinical trials, several guidelines have been published recently.

EMEA just issued the final version of its "Guideline on missing data in confirmatory clinical trials". This guideline provides guidance on handling missing data from the perspective of the European regulatory authorities. Compared with FDA's guidances on non-inferiority and adaptive design, EMEA's missing data guideline is written in plain language and can be easily understood by non-statisticians.

The recent trend is to discourage the use of LOCF and other single imputation methods (i.e., replacing the missing value with the last measured value, an averaged value, or the baseline value, ...). It is noted that LOCF is mentioned as one of the single imputation methods in EMEA's guideline. The guideline acknowledges that "Only under certain restrictive assumptions does LOCF produce an unbiased estimate of the treatment effect. Moreover, in some situations, LOCF does not produce conservative estimates. However, this approach can still provide a conservative estimate of the treatment effect in some circumstances." The guideline further elaborates that LOCF may be a good technique for studies (e.g., depression, chronic pain) where the condition is expected to improve spontaneously over time, but may not be conservative for studies (e.g., Alzheimer's disease) where the condition is expected to worsen over time.

In the United States, the Division of Behavioral and Social Sciences and Education under the National Research Council of the National Academies has been working on a project, "Handling missing data in clinical trials". The working group recently made its draft report available, titled "The prevention and treatment of missing data in clinical trials". I like the word 'prevention' in the title, since it is critical to prevent or minimize the occurrence of missing data. Once missing data have occurred, there is no universal method to handle them perfectly: the assumptions of MCAR, MAR, and MNAR can never be fully verified.

The Academies' report on missing data uses stronger language in discouraging the use of LOCF and other simple imputation approaches. Recommendation #10 states: "Single imputation methods like last observation carried forward and baseline observation carried forward should not be used as the primary approach to the treatment of missing data unless the assumptions that underlie them are scientifically justified."

So far, there is no official guideline from FDA regarding the handling of missing data (even though it has been a perennial topic in almost all statistics conferences and workshops). Nevertheless, a presentation by Dr. O'Neill to the International Society for Clinical Biostatistics may give some insights.

Sunday, October 03, 2010

Individual response vs. group response

In clinical trials, the efficacy endpoints are often measured as continuous variables, and hypothesis tests are used to determine whether there are statistically significant differences between one group and another. This is what statisticians prefer. However, for treating physicians, a treatment effect on a group basis may not translate into an effect for an individual patient. As we move toward personalized medicine, the individual response may be more important than the group response.

It is interesting that individual response and individual assessment (or within-patient analysis, intra-subject changes, ...) were discussed at length at this year's FDA/Industry Statistics Workshop.

For patient-reported outcomes (PROs), a statistically significant group change does not necessarily imply a meaningful difference for individual patients. To provide a meaningful interpretation of PRO interventions and treatment effects, there should be a responder definition to classify each individual subject as a responder or non-responder. The FDA guidance states: "Regardless of whether the primary endpoint for the clinical trial is based on individual responses to treatment or the group response, it is usually useful to display individual responses, often using an a priori responder definition (i.e., the individual patient PRO score change over a predetermined time period that should be interpreted as a treatment benefit). The responder definition is determined empirically and may vary by target population or other clinical trial design characteristics. Therefore, we will evaluate an instrument’s responder definition in the context of each specific clinical trial." The challenging issue is how to determine the cutpoint or benchmark for the responder definition. Several approaches have been proposed in the literature. We actually implemented various approaches to determine the responder definition (or clinically meaningful difference) in a neurological disease. In that article, two anchors were used: one based on the physician's assessment and one based on the patient's global assessment (question #2 of the SF-36 instrument). It is interesting that statistical approaches are employed to find the clinically meaningful difference.

Once the cutpoint (clinically meaningful difference) is decided, the continuous variable will be dichotomized into responder and non-responder, and the analysis will then shift from parametric methods (t-test, ANOVA, ANCOVA, ...) to categorical data analysis methods (chi-square, logistic regression, generalized linear models, ...). Statisticians will argue that by doing so we lose a lot of efficiency in statistical testing. A paper by Snapinn and Jiang titled "Responder analyses and the assessment of a clinically relevant treatment effect" makes exactly this argument.
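The efficiency loss from dichotomization can be illustrated with a small simulation (Python; the effect size, cutpoint, and sample size are arbitrary choices): the same simulated trials are tested once on the continuous endpoint and once on the dichotomized responder endpoint.

```python
import random
import math

def simulate_power(n=100, delta=0.4, cut=0.0, n_sim=2000, seed=3):
    """Empirical power: z-test on means vs. z-test on responder proportions."""
    rng = random.Random(seed)
    hits_cont = hits_bin = 0
    for _ in range(n_sim):
        trt = [rng.gauss(delta, 1.0) for _ in range(n)]
        ctl = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # continuous endpoint: two-sample z-test (known SD = 1)
        z = (sum(trt) - sum(ctl)) / n / math.sqrt(2.0 / n)
        if abs(z) > 1.96:
            hits_cont += 1
        # responder endpoint: dichotomize at the cutpoint, two-proportion z-test
        p1 = sum(x > cut for x in trt) / n
        p2 = sum(x > cut for x in ctl) / n
        p = (p1 + p2) / 2
        se = math.sqrt(2 * p * (1 - p) / n)
        if se > 0 and abs(p1 - p2) / se > 1.96:
            hits_bin += 1
    return hits_cont / n_sim, hits_bin / n_sim

pc, pb = simulate_power()
print(f"power, continuous: {pc:.2f}; dichotomized: {pb:.2f}")
```

With these settings the continuous-endpoint test rejects in roughly 80% of trials while the responder analysis rejects in roughly 60%, illustrating the lost efficiency.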

In the recently published EMEA "Guideline on missing data in confirmatory clinical trials", responder analysis was mentioned as having a benefit in the handling of missing data. It states:

"In some circumstances, the primary analysis of a continuous variable is supported by a responder analysis. In other circumstances, the responder analysis is designated as primary. How missing data are going to be categorised in such analyses should be pre-specified and justified. If a patient prematurely withdraws from the study it would be normal to consider this patient as a treatment failure. However, the best way of categorisation will depend on the trial objective (e.g. superiority compared to non-inferiority).

In a situation where responder analysis is not foreseen as the primary analysis, but where the proportion of missing data may be so substantial that no imputation or modelling strategies can be considered reliable, a responder analysis (patients with missing data due to patient withdrawal treated as failures) may represent the most meaningful way of investigating whether there is sufficient evidence of the existence of a treatment effect."

Within-patient analyses were brought up again in assessing benefit:risk. Currently, benefit:risk assessment relies on separate marginal analyses: the efficacy (benefit) and safety (risk) are analyzed separately, and the aggregation of benefit and risk relies on the assessment of medical reviewers, not statisticians. These aggregate analyses are typically qualitative rather than quantitative, with significant subjectivity. With within-patient analyses, each patient is assessed for benefit and risk before the group comparison of treatment effect is performed. One such approach is Q-TWiST (quality-adjusted Time Without Symptoms of disease or Toxicity of treatment), where the toxicity or safety information is incorporated into the efficacy assessment for each patient before any group comparison. The paper by Sherrill et al is one example.

Sunday, September 26, 2010

Do we really need a statistical solution for everything?

I came back from last week's FDA/Industry Statistics Workshop with more questions than answers. While the theme of this year's workshop was benefit-risk assessment, the old regular issues such as multiplicity, missing data, meta-analysis, adaptive design, and subgroup analysis are still hot topics. For both the new topic (benefit-risk assessment) and the old ones, more questions are being raised, and for many there is no clear answer.

For adaptive design and non-inferiority clinical trials, FDA issued draft guidances early this year; however, both guidances read more like technical reports for statisticians and are unlikely to be understood by non-statisticians. For the non-inferiority design, more questions were raised about the subjectivity/objectivity in determining the non-inferiority margin. For benefit-risk assessment, perhaps we have to rely on the medical experts in the specific therapeutic area to make their subjective judgment based on separate marginal analyses of benefit (efficacy) and risk (safety), instead of various weighted modeling approaches. Perhaps there is no simple mathematical or statistical solution for benefit-risk assessment. I believe the advisory committee members make subjective judgments based on their experience when voting in favor of or against a product's benefit-risk profile – like a jury's verdict.

It is not a good thing that, as statisticians, we come up with complicated statistical methodologies that we cannot explain well to non-statisticians. Eventually, we may need to go back to basics and follow the KISS (keep it simple) principle. Several years ago, complicated and bad math that nobody could really understand contributed to the financial crisis. A working paper, "Computational complexity and informational asymmetry in financial products" by Sanjeev Arora, Boaz Barak, Markus Brunnermeier, and Rong Ge, sheds some light on the complex mathematical models upon which collateralized debt obligations and other derivatives are based.

Sunday, September 19, 2010

Number of Events vs. Number of Subjects with Events

Often in clinical trial safety data analysis, people are confused by the basic concepts of "number of events" vs. "number of subjects with events". Obviously, the number of events counts events (event level) while the number of subjects counts subjects (subject level).

Using the adverse event (AE) summary as an example, the difference between “the number of AEs” and “the number of subjects with AEs” may not be obvious to some people. For the number of AEs, since the same subject can have more than one adverse event, we cannot really calculate a meaningful percentage: it is a mistake to divide the number of events by the number of subjects in a treatment arm, and doing so can give an unreasonably large percentage (sometimes larger than 100%).

For the number of subjects with AEs, we always count by subject. If a subject has more than one AE, the subject is counted only once. Therefore, the numerator (the number of subjects with AEs) can never exceed the denominator (the number of subjects exposed), so we can calculate a percentage, and it will always be at most 100%. We call this percentage the 'incidence of AEs'. The following table (extracted from a document on FDA's website) is an example of an AE presentation (counted by subject).

Statistical summary tables for adverse events are often constructed to present both the total number of AEs and the number of subjects with AEs (or, precisely, the number of subjects with at least one AE). However, no percentage is calculated for the total number of AEs. If readers are not clear about the concept of "# of AEs" vs. "# of subjects with AEs", they could question the correctness of the summary table. Very often, they might count the # of subjects with AEs, compare it with the # of AEs, and find discrepancies (of course there will be discrepancies). The reason? Some subjects must have had more than one AE.
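The two counts can be illustrated with a toy data set (Python; the records are invented):

```python
# Each record is (subject_id, adverse_event_term)
ae_records = [
    (101, "headache"), (101, "nausea"), (101, "headache"),
    (102, "nausea"),
    (104, "rash"), (104, "headache"),
]
n_exposed = 5  # subjects who received study drug

n_events = len(ae_records)                         # event-level count
subjects_with_ae = {subj for subj, _ in ae_records}
n_subjects_with_ae = len(subjects_with_ae)         # subject-level count

print("Number of AEs:", n_events)  # no percentage is meaningful here
print("Subjects with at least one AE:", n_subjects_with_ae,
      f"({100 * n_subjects_with_ae / n_exposed:.0f}% incidence)")
```

Here 6 events occurred in only 3 of 5 exposed subjects: the incidence is 60%, while dividing events by subjects would give a meaningless 120%.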

In some situations, we can indeed calculate a rate or proportion for the number of events. For example, for the total number of AEs, we could calculate how many of these events are mild, moderate, or severe; you can see this information presented in the package inserts of some approved drugs on the market. We could also calculate an AE rate using the total number of AEs as the numerator and the total number of infusions, total number of doses distributed, or total number of person-years as the denominator. In these situations, we should always understand what the numerator and the denominator are; for a good presentation of a statistical summary table, the numerator and denominator used in the calculation should be specified in a footnote. A few years ago, I saw a commercial presentation comparing a company's product safety with that of competing products. When they calculated the AE frequency for their own product, they used (total number of AEs)/(total number of doses); when they calculated the AE frequency for the competing products, they used (total number of AEs)/(total number of subjects). Since each subject receives more than one dose, their calculation of the AE frequency for their own product was markedly lower. This trick is wrong and unethical.

Sunday, September 12, 2010

Restricted randomization, stratified randomization, and forced randomization

Randomization is a fundamental aspect of randomized controlled trials (RCTs). When we judge the quality of a clinical trial, whether or not it is a randomized trial is a critical point to consider. However, there are different ways of implementing randomization, and some of the terminology can be very confusing, for example, 'restricted randomization', 'stratified randomization', and 'forced randomization'.

Without any restriction, the randomization is called 'simple randomization': no blocks and no stratification are applied. Simple randomization will usually not achieve an exact balance of the treatment assignments if the number of randomized subjects is small. In contrast, restricted randomization refers to any procedure used with random assignment to achieve balance between study groups in size or baseline characteristics. The first technique for restricted randomization is to apply blocks. Blocking, or block randomization, is used to ensure that the comparison groups will be of approximately the same size. Suppose we are planning to randomize 100 subjects to two treatment groups. With simple randomization, if we enroll the entire 100 subjects, we may have approximately equal numbers of subjects in the two treatment groups; however, if we enroll a small number of subjects (for example, 10), we may see quite some deviation from equal assignment, and there may not be 5 subjects in each treatment arm. With blocking (block size = 10), we can ensure that within every 10 subjects, 5 are assigned to each treatment arm.
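A permuted-block schedule like the one described (block size 10, two arms) can be sketched as follows (Python, illustrative only; real trials generate this inside a validated system, and the seed here is arbitrary):

```python
import random

def block_randomization(n_subjects, treatments=("A", "B"),
                        block_size=10, seed=11):
    """Permuted-block list: each block contains equal numbers per arm."""
    assert block_size % len(treatments) == 0
    rng = random.Random(seed)
    per_arm = block_size // len(treatments)
    schedule = []
    while len(schedule) < n_subjects:
        block = list(treatments) * per_arm  # e.g. 5 A's and 5 B's
        rng.shuffle(block)                  # random order within the block
        schedule.extend(block)
    return schedule[:n_subjects]

schedule = block_randomization(100)
print(schedule[:10], schedule.count("A"), schedule.count("B"))
```

Every consecutive block of 10 assignments contains exactly 5 of each arm, so the allocation stays balanced even if enrollment stops early.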

Stratified randomization is used to ensure that equal numbers of subjects with one or more characteristics thought to affect the efficacy outcome are allocated to each comparison group. The characteristics (stratification factors) can be the patient's demographic information (gender, age group, ...) or disease characteristics (baseline disease severity, biomarkers, ...). In a randomized, controlled, dose-escalation study, the dose cohort itself can be considered a stratification factor. With stratified randomization, we essentially generate the randomization within each stratum. The number of strata depends on the number of stratification factors used in the randomization. If we use 4 stratification factors, each with two levels, we will have a total of 16 strata, which means our overall randomization schema will consist of 16 portions, one for each stratum. In determining the number of strata, the total number of subjects needs to be considered. Overstratification can make the study design complicated and may also be prone to randomization error. For example, in a randomization stratified by gender, a male subject could mistakenly be entered as female, and a randomization number could be chosen from the female portion instead of the male portion of the randomization schema. This can affect the overall balance of treatment assignment as originally planned. A paper by Kernan et al has an excellent discussion of stratified randomization.
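Stratified randomization then amounts to one independent (blocked) randomization list per stratum. A sketch, with hypothetical stratification factors:

```python
import random
from itertools import product

def stratified_schedules(factors, n_per_stratum=20, treatments=("A", "B"),
                         block_size=4, seed=23):
    """One independent permuted-block randomization list per stratum."""
    rng = random.Random(seed)
    per_arm = block_size // len(treatments)
    schedules = {}
    # one stratum per combination of factor levels (2 x 2 = 4 here)
    for stratum in product(*factors.values()):
        schedule = []
        while len(schedule) < n_per_stratum:
            block = list(treatments) * per_arm
            rng.shuffle(block)
            schedule.extend(block)
        schedules[stratum] = schedule[:n_per_stratum]
    return schedules

factors = {"sex": ["M", "F"], "severity": ["moderate", "severe"]}
schedules = stratified_schedules(factors)
print(len(schedules))  # 2 factors x 2 levels each = 4 strata
```

Note that each stratum's list is balanced between arms, but nothing forces the strata themselves to enroll equal numbers of subjects, which is exactly the misconception discussed below.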

One misconception about stratification is that equal numbers of subjects are required in each stratum. For example, for a randomization stratified by gender (male and female), people may think that we want 50% male and 50% female subjects in the trial. This is not true. What we need (assuming a 1:1 randomization ratio) is for 50% of subjects to be randomized to each treatment arm within the male subjects and within the female subjects. This issue has been discussed in one of my old articles.

Forced randomization is another story: it basically forces the random assignment to deviate from the original schedule to deal with special situations. For example, in a randomized trial enrolling subjects with moderate and severe disease, we may put a cap on the number of severe subjects to be randomized. When the cap is reached, severe subjects can no longer be randomized, but moderate subjects still can. We could also enforce a cap on the number of subjects at a specific country/site, or limit the number of subjects randomized to a specific treatment arm at a particular country/site. Forced randomization is usually required to deal with operational issues and is implemented through IVRS or IWRS. Too much forced randomization will neutralize the advantages of randomization.

All three terms (restricted, stratified, and forced randomization) belong to fixed-sample-size randomization, in contrast to the dynamic randomization used in adaptive designs.

Sunday, September 05, 2010

Immunogenicity and its impact in clinical trials

There is a shift in the drug development field from chemical compounds to biological products - protein products - and a shift from traditional pharmaceutical companies to biotechnology companies. For those who work on clinical trials of biological products, 'immunogenicity' must be a familiar term. Immunogenicity is the ability of a particular substance, such as an antigen or epitope, to provoke an immune response. In other words, if our drug is a protein product, immunogenicity is the ability of the protein to induce humoral and/or cell-mediated immune responses. Immunogenicity testing is a way to determine whether patients are producing antibodies to biologics that can block the efficacy of the drugs. The development of anti-drug antibodies can also cause allergic or anaphylactic reactions and/or induction of autoimmunity.

Several workshops have been organized to discuss immunogenicity issues. Regulatory agencies are developing guidelines to give industry direction on incorporating immunogenicity testing into the clinical development program for biological products.

Since immunogenicity testing relies on an assay to measure the immune responses, the results depend on which assay is used. Therefore, the development of the immunogenicity assay is itself a critical issue in immunogenicity testing. An ultra-sensitive assay could produce many false positives, while a less sensitive assay could under-estimate the immune response.

The following collection regarding immunogenicity testing should provide a good resource for this topic.

If a company develops a follow-on biological - a generic form of a biological product, or a copycat of a biotechnology product - immunogenicity testing is typically one of the critical points that needs to be addressed. Regulatory agencies have therefore issued guidelines on immunogenicity testing for the development of follow-on biologicals.

Sunday, August 29, 2010

Transparency - a change about drug approval information on FDA's website

I recently noticed that for approvals of new drugs (NDA) and biological products (BLA), information about the approval process is published on FDA's website in a very timely fashion. Just a year ago, the FDA review/approval process for a new product was still not transparent to the public. We might be able to find some information in the label, approval letter, and SBA (summary basis of approval); however, these typically appeared months or years after the approval.

Now, for new approvals, not only the label, approval letter, and SBA, but also reviews from different perspectives (medical, statistical, pharmacology, environmental, CMC,...) as well as the administrative documents and correspondence between the FDA and the sponsor may be posted on FDA's website. Also published is the list of FDA officers who participated in the review and the decision making. The individuals from the sponsor's side may also be listed in some documents or correspondence.

This is obviously the outcome of FDA's initiative on transparency. "In June 2009, Food and Drug Administration (FDA) Commissioner Dr. Margaret Hamburg launched FDA's Transparency Initiative and formed an internal task force to develop recommendations for making useful and understandable information about FDA activities and decision-making more readily available to the public, in a timely manner and in a user-friendly format."

To see these changes, we can just take a look at two products recently approved by FDA: one by CDER and one by CBER. Don't forget to visit "Administrative Document(s) and Correspondence" or "Approval History, Letters, Reviews, and Related Documents".

This is a good sign that FDA's drug approval process is being demystified and moving toward transparency.

Tuesday, August 17, 2010


In clinical trials, subjects are usually followed for a period of time, with multiple measurements/assessments at various time points. It is very common that some subjects discontinue from the study early for reasons such as 'lost to follow-up', 'withdrawal of consent', or 'adverse events'.

With the intention-to-treat population, imputation techniques are needed to deal with early-termination subjects. While fancier techniques such as multiple imputation may be more statistically sound, some practical imputation techniques remain more popular. Here are some that I have used.

LOCF (last observation carried forward): this is probably the most common technique used in practice for handling missing data (especially for continuous measures). It is also the technique mentioned in ICH E9 "Statistical principles for clinical trials", which states "...Imputation techniques, ranging from the carrying forward of the last observation to the use of complex mathematical models, may also be used in an attempt to compensate for missing data..."
LOCF can be easily implemented in SAS. See a SUGI paper titled "The DOW (not that DOW!!!) and the LOCF in Clinical Trials"

BOCF (baseline observation carried forward): this approach may be more conservative if symptoms gradually improve over the course of the study. I used this technique in several clinical trials testing analgesic drugs (painkillers) in dental surgery patients. At baseline, right after the dental surgery, the pain scale is at its worst; over time, the pain intensity is expected to decrease. In this situation, BOCF is more conservative than LOCF. There is a web article examining the features of BOCF. BOCF, along with LOCF and a modified BOCF, was discussed at a recent FDA advisory committee meeting on Cymbalta for the treatment of chronic pain.

WOCF (worst observation carried forward): this approach is the most conservative compared with LOCF and BOCF. It has been used in analgesic drug trials as well as trials with laboratory results as endpoints. For example, the WOCF technique is mentioned in the FDA Summary on Durolane.

LOCF, BOCF, and WOCF are handy techniques for continuous measures. For a trial with a dichotomous endpoint (success vs. failure; responder vs. non-responder), a technique called MVTF can be used. MVTF stands for missing value treated as failure. For example, this technique is mentioned in the Statistical Review of NDA 21-385 for a dermatology indication. In one of the studies I participated in, we employed the same technique (even though we did not use the term MVTF) and treated all subjects who discontinued from the study early as non-responders. This is a very conservative approach; the treatment effect may be somewhat neutralized by this technique.
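The three carry-forward rules are simple enough to sketch in a few lines. Below is a hypothetical illustration (Python rather than SAS), with None marking missing visits and higher scores meaning worse, as on a pain scale:

```python
def impute(values, method="LOCF"):
    """Carry-forward imputation for one subject's visit series.

    values: list with None for missing visits; index 0 is baseline.
    method: 'LOCF' (last observation), 'BOCF' (baseline), or
            'WOCF' (worst observed so far; here 'worst' = largest,
            as for a pain score where higher is worse).
    """
    out = []
    last = None
    for v in values:
        if v is None:
            if method == "LOCF":
                v = last
            elif method == "BOCF":
                v = values[0]
            elif method == "WOCF":
                observed = [x for x in out if x is not None]
                v = max(observed) if observed else None
        out.append(v)
        last = v if v is not None else last
    return out

visits = [7, 9, 5, None, None]   # pain score; subject drops out after visit 3
print(impute(visits, "LOCF"))    # [7, 9, 5, 5, 5]
print(impute(visits, "BOCF"))    # [7, 9, 5, 7, 7]
print(impute(visits, "WOCF"))    # [7, 9, 5, 9, 9]
```

The example data are chosen so the three rules give different answers: LOCF carries the last value (5), BOCF the baseline (7), and WOCF the worst value seen so far (9).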

There are many other techniques used in practice; some of them may just be different terms for the same technique. In the FDA Executive Summary prepared for the July 30, 2010 meeting of the Ophthalmic Devices Panel (P080030), the following imputation techniques are mentioned:
  • Last Observation Carried Forward (LOCF) analysis
  • Best Reasonable Case analysis
  • Worst Reasonable Case analysis
  • Non-Responder analysis
  • Best Case analysis
  • Worst Case analysis
In practice, it is typical to employ at least two techniques in handling the missing data. This is part of the so-called 'sensitivity analysis'.

It must be pointed out that these practical missing-data handling techniques have no statistical basis and have been criticized by many professionals, especially in the academic setting. These techniques, which seem very conservative, may not be conservative in some situations.

Since LOCF is the most-used technique, the criticism is usually centered on the comparison of LOCF with model-based techniques (for example, the mixed-effect model repeated measures (MMRM) model). Some of the comparisons and discussions can be found at:

For continuous endpoints in longitudinal clinical trials, a good strategy may be to employ the mixed model or MMRM as the primary analysis and one of the xOCFs as the sensitivity analysis. This is exactly the strategy used in the NDA submission on Cymbalta for the treatment of chronic pain.

Saturday, August 07, 2010

R-square for regression without intercept?

Sometimes, simple linear regression may not be very simple. One of the issues is deciding whether to fit the regression with or without an intercept. For regression without an intercept, the regression line goes through the origin; for regression with an intercept, it does not.

In clinical trials, we may need to fit regression models of drug concentration vs. dose, AUC vs. trough concentration, etc. Whether to include an intercept relies on the scientific background, not purely on statistics. Using drug concentration vs. dose as an example: if there is no endogenous drug concentration, a regression model without an intercept makes sense. If there is an endogenous drug concentration, a model with an intercept is more appropriate - when no dose is given, the drug concentration is not zero.

In some situations, regression models are purely data-driven or empirical, and choosing a model with or without an intercept may not be easy. We recently had a real experience with this. With the same set of data, we fitted models with and without an intercept. We thought we could judge which model was better by comparing the R-square values - an indicator of goodness of fit. Surprisingly, the models without an intercept were always much better than the models with an intercept when comparing R-squares. However, when we thought twice about this, we realized that in this situation R-square was no longer a good indicator of goodness of fit.

The problem is that a regression model without an intercept will almost always give a very high R-square. This is related to the way the sums of squares are calculated. There are two excellent articles discussing this issue.
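The effect is easy to demonstrate numerically. The sketch below (made-up data) fits the same data with and without an intercept. Following the convention used by most statistical software, the no-intercept R-square uses the uncentered total sum of squares (sum of y squared), so even a visibly poor through-origin fit reports a high R-square:

```python
def fit_with_intercept(x, y):
    n = len(x)
    xbar, ybar = sum(x)/n, sum(y)/n
    b = (sum((xi-xbar)*(yi-ybar) for xi, yi in zip(x, y))
         / sum((xi-xbar)**2 for xi in x))
    return ybar - b*xbar, b

def fit_through_origin(x, y):
    # Slope-only least squares: b = sum(x*y) / sum(x*x)
    return sum(xi*yi for xi, yi in zip(x, y)) / sum(xi*xi for xi in x)

x = [1, 2, 3, 4, 5]
y = [10.2, 11.1, 11.9, 13.2, 13.8]   # data with a clear nonzero offset
ybar = sum(y)/len(y)

a, b = fit_with_intercept(x, y)
sse1 = sum((yi - (a + b*xi))**2 for xi, yi in zip(x, y))
r2_with = 1 - sse1/sum((yi - ybar)**2 for yi in y)

b0 = fit_through_origin(x, y)
sse0 = sum((yi - b0*xi)**2 for xi, yi in zip(x, y))
# Software conventionally uses the UNCENTERED total SS when there is
# no intercept, which inflates R-square:
r2_origin = 1 - sse0/sum(yi*yi for yi in y)
# On the usual centered scale the origin model is worse than the mean:
r2_origin_centered = 1 - sse0/sum((yi - ybar)**2 for yi in y)

print(round(r2_with, 3), round(r2_origin, 3), round(r2_origin_centered, 1))
```

Here the through-origin fit has a far larger residual sum of squares, yet its conventionally reported R-square is still close to 0.9 (and its centered R-square is actually negative), so the two R-squares are simply not comparable.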

Comparing treatment difference in slopes?

In regulatory setting, can we show the treatment difference by comparing the slopes between two treatment groups?

In a COPD study (e.g., a two-arm, parallel-group study with the primary efficacy variable measured at baseline and every 6 months thereafter), one can fit a random coefficient model and compare the treatment difference between the two slopes. Alternatively, we can compare the treatment difference in terms of change from baseline to the endpoint (the last measurement).

To test the difference in slopes, we test whether the treatment*time interaction term is statistically significant. The assumption is that at the beginning of the trial the intercepts for both groups are the same - both groups start at the same level. If the treatment slows disease progression, the treatment group should show a smaller slope than the placebo group. If all patients are followed to the end of the study and the slopes are different, the endpoint (change from baseline) analysis should also be statistically different. However, if the sample size is not sufficiently large, the results of the slope-comparison approach and the endpoint-analysis approach could be inconsistent. For a given study, a decision has to be made about which approach is considered primary. If we analyze the data using both approaches, we will then need to deal with adjustment for multiplicity.

I used to comment that "some regulatory authorities may prefer the simpler endpoint analysis"; I was then asked to provide references to support this statement. I did quite extensive research but could not find any directly relevant reference. However, reviewing the statistical reviews of BLAs and NDAs in the US, it is very rare to see a product approval based on the comparison of slopes. Many product approvals are based on the comparison of 'change from baseline'.

Every indication has its own accepted endpoints, so tradition takes precedence. For example, in Alzheimer's disease there is a movement to look at differences in slopes, but this is based on trying to claim disease modification. Similarly, in the COPD area, where some products aim at disease modification, treatment differences can be shown by comparing the slopes between treatment groups.

It seems to be true that the slope model (random coefficient model) may be preferred in the academic setting, while the endpoint approach - change from baseline (with last observation carried forward) - may be more practical in the industry setting.

From the statistical point of view, the slope approach makes a lot of sense; however, we need to be cautious about some potential issues: 1) For some efficacy measures, there might be some type of plateau. If the plateau is reached before the end of the study, there will be a loss of power in comparing slopes. 2) If the slope comparison is the primary efficacy analysis, the number of measurements per year of the primary efficacy variable is relevant. One may think that more frequent measurements will increase the power to show a treatment difference in slopes. Questions then arise when designing the study: should we choose a shorter trial with more frequent measurements, or a longer trial with less frequent measurements?

Sunday, June 13, 2010

Biosimilars - Generic Version of Biological Drugs

The health reform legislation recently signed into law contains a provision that creates a pathway for the US Food and Drug Administration (FDA) to approve biosimilars - generic versions of biologic drugs. Unlike generic small-molecule drugs, which are synthetic chemical compounds, the complexity of biologic drugs makes it questionable whether a generic company could produce an identical biologic product. The WSJ recently reported that Merck decided to end efforts to copy Amgen's blockbuster anemia drug Aranesp, which shows how hard the emerging field of generic biotechnology therapies will be to enter. For small-molecule chemical compounds, once the patent has expired, generic companies can manufacture the same active pharmaceutical ingredients (APIs) for the generic version of the brand product, and approval of the generic version typically requires only bioavailability/bioequivalence tests in healthy volunteers. Biological products, in contrast, are typically large proteins with complex three-dimensional structures; the same protein with a different 3-D structure may have different functions. This makes biosimilars very difficult to copy - similarity does not imply the same therapeutic effects. When we deal with proteins, we also always face immunogenicity issues. Some biological products (for example, plasma-derived products) cannot be tested in healthy volunteers; if a bioequivalence study is required, it is typically done in real patients, which makes the trial much more expensive than a typical bioavailability/bioequivalence trial in healthy volunteers.

Further reading:

Saturday, May 29, 2010

Some clarifications on Non-inferiority (NI) Clinical Trial Design

I noted that FDA recently issued its draft guidance on non-inferiority clinical trial design. Two weeks ago, I attended the DIA webinar "understand the primary challenges facing non-inferiority studies", which featured presentations by Drs. Bob Temple, Bob O'Neill, and Ed Cox from FDA. Several issues that often prevent us from considering an NI trial are now clarified:

1. Can we use an active control when the active control is not approved for the indication, but used as standard care (off-label)?

This was answered in section V of the draft guidance. "The active control does not have to be labeled for the indication being studied in the NI study, as long as there are adequate data to support the chosen NI margin. FDA does, in some cases, rely on published literature and has done so in carrying out the meta-analyses of the active control used to define NI margins. "

2. When literature is used to support the choice of the NI margin, what if the endpoints differ across the historical studies?

In Section V of the draft guidance, it says "...among these considerations are the quality of the publications (the level of detail provided), the difficulty of assessing the endpoints used, changes in practice between the present and the time of the studies, whether FDA has reviewed some or all of the studies, and whether FDA and sponsor have access to the original data. As noted above, the endpoint for the NI study could be different (e.g., death, heart attack, and stroke) from the primary endpoint (cardiovascular death) in the studies if the alternative endpoint is well assessed".

3. What if there is no historical clinical trial that directly compares the active control vs. placebo?
We would typically think that in this situation an NI design is no longer an option, because there is no way to estimate the NI margin (specifically the M1 margin). However, Dr. Ed Cox presented an example during the webinar of estimating the non-inferiority margin in an indirect way. While there is no clinical trial directly comparing the active control with placebo, we can still estimate the treatment effect of the active control by searching for evidence separately for the active control and for placebo. For example, in the anti-infective area, many antibiotics have been used for years (perhaps even before the FDA was formed), and there may never have been a formal clinical trial showing that a certain antibiotic is better than placebo. Now, in pursuing a new antibiotic for an indication, a placebo-controlled study is not ethical (since other antibiotic products are the standard of care), so choosing the NI margin for an NI study is challenging. The suggestion from Dr. Ed Cox's presentation is to derive an estimate of the effect of the active control over placebo by:
  • Estimate the placebo response rate or the response rate if untreated
  • Estimate the response rate in the setting of "inadequate" or "inappropriate" therapy
  • Estimate the response rate of the active control therapy from the literature on that therapy
A recent paper in the Drug Information Journal detailed a similar approach. See the link below for the paper "Noninferiority margin for clinical trials of antibacterial drugs for nosocomial pneumonia". FDA's guidance "Community-Acquired Bacterial Pneumonia: Developing Drugs for Treatment" also has a section on the non-inferiority margin issue for antibacterial drugs.

Notice that for each estimate, a collection of literature is analyzed, and often a meta-analysis is required. The meta-analysis may require a random-effects estimate; in Dr. Cox's presentation, the DerSimonian-Laird random-effects estimate is used. This approach is described in the original paper as well as many books on meta-analysis (for example, "Meta-Analysis of Controlled Clinical Trials" by Anne Whitehead). My colleague wrote a SAS program for this approach. A SAS paper compared the results from the DerSimonian-Laird approach with the results from SAS Proc Mixed and NLMixed.
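For reference, the DerSimonian-Laird computation itself is short. The sketch below (in Python, with made-up study-level effect estimates and variances, not data from the presentation) computes Cochran's Q, the method-of-moments between-study variance tau-squared, and the random-effects pooled estimate:

```python
import math

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects meta-analysis.

    effects: study-level effect estimates (e.g., log odds ratios)
    variances: their within-study variances
    Returns (pooled effect, its variance, tau-squared).
    """
    k = len(effects)
    w = [1.0/v for v in variances]
    sw = sum(w)
    fixed = sum(wi*yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q and the method-of-moments between-study variance
    q = sum(wi*(yi - fixed)**2 for wi, yi in zip(w, effects))
    c = sw - sum(wi*wi for wi in w)/sw
    tau2 = max(0.0, (q - (k - 1))/c)
    # Random-effects weights incorporate tau-squared
    w_re = [1.0/(v + tau2) for v in variances]
    pooled = sum(wi*yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, 1.0/sum(w_re), tau2

# Hypothetical heterogeneous study-level estimates and variances
effects = [0.5, -0.1, 0.6, 0.0]
variances = [0.02, 0.03, 0.025, 0.04]
est, var, tau2 = dersimonian_laird(effects, variances)
ci = (est - 1.96*math.sqrt(var), est + 1.96*math.sqrt(var))
print(round(est, 3), round(tau2, 3), tuple(round(v, 3) for v in ci))
```

When the studies are heterogeneous (Q exceeds its degrees of freedom), tau-squared is positive and the random-effects confidence interval is wider than the fixed-effect one, which is exactly why the choice matters when deriving an NI margin.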

Sunday, May 16, 2010

Hodges-Lehmann Estimator

According to Wikipedia, "the Hodges–Lehmann estimator is a method of robust estimation. The principal form of this estimator is used to give an estimate of the difference between the values in two sets of data. If the two sets of data contain m and n data points respectively, m × n pairs of points (one from each set) can be formed and each pair gives a difference of values. The Hodges–Lehmann estimator for the difference is defined as the median of the m × n differences.
A second type of estimate which has also been called by the name "Hodges–Lehmann" relates to defining a location estimate for a single dataset. In this case, if the dataset contains n data points, it is possible to define n(n + 1)/2 pairs within the data set, allowing each item to pair with itself. The average value is calculated for each pair and the final estimate of location is the median of the n(n + 1)/2 averages. (Note that the two-sample Hodges–Lehmann estimator does not estimate the difference of the means or the difference of the medians (it estimates the median of the differences, which, if the underlying distributions are asymmetric, is a different quantity), while the one-sample Hodges–Lehmann estimator does not estimate either the mean or the median.)"
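Both forms of the estimator described above are straightforward to compute directly. A minimal sketch with made-up numbers:

```python
from statistics import median

def hl_two_sample(xs, ys):
    """Two-sample Hodges-Lehmann estimator:
    median of all m*n pairwise differences x - y."""
    return median(x - y for x in xs for y in ys)

def hl_one_sample(xs):
    """One-sample Hodges-Lehmann estimator: median of the
    n(n+1)/2 Walsh averages (pairs including self-pairs)."""
    n = len(xs)
    walsh = [(xs[i] + xs[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)

x = [1.1, 2.4, 3.0, 4.2]
y = [0.8, 1.9, 2.2]
print(hl_two_sample(x, y))  # median of the 12 pairwise differences: 0.95
print(hl_one_sample(x))     # median of the 10 Walsh averages: 2.675
```

Note that the two-sample value (0.95) is neither the difference in means nor the difference in medians of x and y, illustrating the point in the quote above.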

The first time I heard of this estimator was in a pharmacokinetic bioequivalence study where we had to compare Tmax between two groups. Typically, we don't need to compare Tmax between treatment groups, since bioequivalence is typically based on AUC (area under the plasma-concentration curve) and/or Cmax (maximum concentration). Assessment of Tmax was mandatory only if
  • either a clinical claim was made (e.g., rapid onset, as for some analgesics),
  • or based on safety grounds (e.g., IR nifedipine).

Tmax is the time to reach the maximum concentration (Cmax) after drug administration. Tmax data certainly do not follow a normal distribution and usually take only a few pre-specified sampling time points (depending on how many time points are specified in obtaining the PK profile). In this case, a distribution-free non-parametric test needs to be used, and the Hodges-Lehmann estimator fits this situation. In addition to Tmax, Hodges-Lehmann can also be used to test the difference in Thalf (t1/2).

In the old days, we had to write the SAS program ourselves. In the latest version, SAS 9.2, Proc NPAR1WAY can be used to calculate the Hodges-Lehmann estimator and its confidence interval. See "Hodges-Lehmann Estimation of Location Shift" for details about the calculation and an example of "Hodges-Lehmann Estimation" from the SAS website.

With the HL statement and the EXACT HL statement in SAS Proc NPAR1WAY, the Hodges-Lehmann estimator (location shift) can be estimated and its confidence intervals (asymptotic (Moses) for large samples and exact for small samples) are provided. However, the SAS procedure does not provide a p-value; the p-value may be obtained from the Wilcoxon rank-sum test.

Also see a newer post regarding "Hodges-Lehmann estimator of location shift: median of differences versus the difference in medians or median difference"

Sunday, May 09, 2010

Stronger Bioequivalence Standard?

In April, 2010, "the Pharmaceutical Science and Clinical Pharmacology Advisory Committee (104715) (PSCPAC)" discussed the issues related to the bioequivalence standard.

"The statistical analysis and acceptance criteria seem to be the most confusing aspects of regulatory bioequivalence evaluation. The current statistical analysis, the two one-sided tests procedure, is a specialized statistical method that is capable of testing for “sameness” or equivalence between the two comparator products. The pharmacokinetic parameters, calculated from the bioequivalence study data, area under the plasma concentration-time curve, (AUC) and maximum plasma concentration (Cmax) represent the extent and rate of drug availability, respectively. All data is log-transformed and the analysis of variance (ANOVA) is used to calculate the 90% confidence intervals of the data for both AUC and Cmax. To be confirmed as bioequivalent, the 90% confidence intervals for the test (generic product) to reference (marketed innovator product) ratio must fall between 80 to 125%. This seemingly unsymmetrical criteria is due to the log-transformation of the data."
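The 80-125% check described in the quote can be illustrated with a simplified calculation. The sketch below treats the two products as independent samples and uses a normal quantile in place of the t critical value (a real bioequivalence analysis would use an ANOVA appropriate to the crossover design); the AUC data are made up:

```python
import math
from statistics import NormalDist, mean, stdev

def be_90ci_ratio(test_auc, ref_auc):
    """90% CI for the geometric mean ratio test/reference on the log scale.

    Simplified sketch: independent-samples comparison with a normal
    quantile (1.645) instead of the t critical value; a real crossover
    analysis would use ANOVA with the within-subject error term.
    """
    lt = [math.log(v) for v in test_auc]
    lr = [math.log(v) for v in ref_auc]
    diff = mean(lt) - mean(lr)
    se = math.sqrt(stdev(lt)**2/len(lt) + stdev(lr)**2/len(lr))
    z = NormalDist().inv_cdf(0.95)
    lo, hi = math.exp(diff - z*se), math.exp(diff + z*se)
    return lo, hi, (lo >= 0.80 and hi <= 1.25)   # bioequivalence check

test_vals = [95, 102, 110, 98, 105, 99, 101, 104]   # hypothetical AUCs
ref_vals  = [100, 98, 107, 103, 96, 102, 100, 105]
lo, hi, ok = be_90ci_ratio(test_vals, ref_vals)
print(round(lo, 3), round(hi, 3), ok)
```

Because the analysis is done on the log scale and then back-transformed, the acceptance interval (0.80, 1.25) is symmetric around 1 on the log scale, which is the point the quoted text makes about the "seemingly unsymmetrical" criteria.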

However, this one-size-fits-all approach may not be adequate for all pharmaceutical products. One category of pharmaceutical products is "critical dose (CD) drugs". CD drugs, also called "narrow therapeutic index (NTI) drugs", are medicines for which comparatively small differences in dose or concentration may lead to serious therapeutic failures and/or serious adverse drug reactions. It is reasonable to assume that more stringent bioequivalence criteria should be employed to ensure the safety of these products.

According to the voting results, the advisory committee agreed that CD drugs are a distinct group of products; that the FDA should develop a list of CD drugs; and that the current BE standards are not sufficient for CD drugs.

The FDA proposed that, in addition to the 80-125% criterion based on the 90% confidence interval, a limit of 90-111% on the geometric mean (point estimate) of all BE parameters (i.e., Cmax, AUC0-t, AUC0-∞) be added as a more stringent bioequivalence criterion. However, this proposal was not endorsed by the advisory committee. Panelists commented that the scientific basis for the proposed 90-111% limit was not justified. Some members specified that they did not favor use of Cmax in the proposal, but likely would have been swayed if it had focused solely on AUC.

To claim bioequivalence, should both AUC and Cmax meet the bioequivalence criteria? While regulatory guidance mentions that AUC and Cmax are the typical parameters for evaluating bioequivalence, there is no guidance formally requiring that bioequivalence be demonstrated for both AUC and Cmax. In some situations, Cmax may not be applicable. For example, when comparing a drug given by different administration routes (intravenous vs. subcutaneous), equivalence in AUC could be established while equivalence in Cmax could not.

Sunday, May 02, 2010

CDISC beyond the data

CDISC stands for the Clinical Data Interchange Standards Consortium. I had always thought that CDISC was only about data standards and data structures and had nothing to do with the protocol, case report form, and so on.

However, over the last several years, CDISC has expanded its reach into the entire flow of the clinical trial. For each step in a clinical trial, there is a counterpart CDISC standard.

  • Protocol: Protocol representation model (PRM)
  • Case Report Form: The Clinical Data Acquisition Standards Harmonization (CDASH)
  • Data management data set: The Study Data Tabulation Model (SDTM)
  • Analysis data set: Analysis data model (ADaM)
The Protocol Representation Model is fairly new, having just recently been released. PRM is now actually a subdomain of the BRIDG model. BRIDG stands for Biomedical Research Integrated Domain Group and is a collaborative effort engaging stakeholders from the Clinical Data Interchange Standards Consortium (CDISC), the HL7 Regulated Clinical Research Information Management Technical Committee (RCRIM TC), the National Cancer Institute (NCI) and its Cancer Biomedical Informatics Grid (caBIG®), and the US Food and Drug Administration (FDA).

For each clinical trial, the study protocol is the key document. It is typically a text document developed from a protocol template. The protocol is considered a document, not data. PRM is trying to change this.

The PRM is NOT a specific protocol template; rather, when a template is designed to meet the purposes of a given organization or study type, the use of the PRM common elements will enable and facilitate information re-use without constraining the design of the study or the style of the document. The PRM elements have been found to be typical across study protocols, but they do not reflect either a minimum or a maximum set of elements.

There are four major components of the PRM v1.0—that is, four major areas of a protocol that the elements are related to:

  • Clinical Trial/Study Registry: Elements related to the background information of a study, based on the requirements from WHO. Examples of elements in this area include Study Type, Registration ID, Sponsors, and Date of First Enrollment.
  • Eligibility: Elements related to eligibility criteria such as minimum age, maximum age, and subject ethnicity.
  • Study Design Part 1: Elements related to a study’s experimental design, such as Arms and Epochs.
  • Study Design Part 2: Elements related to a study’s Schedule of Events and Activities.
It is envisioned that with PRM, the key elements of the protocol can be treated as data strings, stored in a data set, and re-used. The statistical analysis plan can then easily be developed by importing the key elements from the protocol. However, getting all companies to follow this standard will take time, and there may be many challenges in implementing it. This standard needs to be endorsed by the medical writers and medical directors (not the data managers and statisticians) who actually develop the study protocol.

Sunday, April 18, 2010

China's Regulations on Drug and Biological Products Registration

In China, drug and biological products are regulated by the SFDA (国家食品药品监督管理局), a counterpart of the US FDA. Its Center for Drug Evaluation (药品审评中心) regulates both drug and biological products (in a sense, a combination of the US FDA's CDER and CBER divisions).

Law and Guidance:

Drug Administration Law: Dec 2001
  • All clinical trials should be pre-approved by sFDA 
  • All clinical trials should be carried out by qualified investigators 
  • Detailed procedures and technical data should be submitted 
Regulations for Implementation of the Drug Administration Law: Sep 2002

Good Clinical Practice: Aug 2003 (In Chinese)

Statistical Guidelines for Clinical Trials of Drugs and Biologics: Mar 2005 (in Chinese)

Pharmacokinetics and Bioequivalence: 2005 (化学药物制剂人体生物利用度和生物等效性研究技术指导原则)

Toxicology: 2005 (化学药物长期毒性试验技术指导原则)

Hong Kong: GCP for Proprietary Chinese Medicines: Feb 2004 (PDF 669KB)

Drug Registration Regulation (药品注册管理办法): Jul 2007 and its appendices
  •   It is interesting that its appendices include requirements for sample size. For a new or imported drug application, the sample size should meet the statistical requirements and the minimum number of cases required. For categories I and II (new drugs), the minimum cases required (trial-group exposure) are: 20-30 for Phase I, 100 for Phase II, 300 for Phase III, and 2000 for Phase IV. For categories III and IV (imported drugs), trials should have at least 100 pairs. In the event of more than one indication, the cases for each main indication shall be at least 60 pairs.
Further reading:

Sunday, April 11, 2010

When to Finalize the Statistical Analysis Plan (SAP)?

Recently, a group of statisticians on LinkedIn (presumably all working in the drug development industry) discussed the following posted question:
"A client wants me to prepare final SAP shortly after protocol and CRFs are finalized for a Phase 3 trial, to submit to FDA prior to start of study. I find this unusual. Any experience doing so? When?"

There were responses like "I do not see why the SAP needs to be finalized until it is time to lock the database and unblind" and "Why do you want to wait? What will you learn or gain by waiting?"...

First of all, let's look at the ICH guidance (E9 Statistical Principles for Clinical Trials):

"The statistical analysis plan may be written as a separate document to be completed after finalising the protocol. In this document, a more technical and detailed elaboration of the principal features stated in the protocol may be included. The plan may include detailed procedures for executing the statistical analysis of the primary and secondary variables and other data. The plan should be reviewed and possibly updated as a result of the blind review of the data (see 7.1 for definition) and should be finalised before breaking the blind. Formal records should be kept of when the statistical analysis plan was finalised as well as when the blind was subsequently broken.
If the blind review suggests changes to the principal features stated in the protocol, these should be documented in a protocol amendment. Otherwise, it will suffice to update the statistical analysis plan with the considerations suggested from the blind review. Only results from analyses envisaged in the protocol (including amendments) can be regarded as confirmatory."

This indicates that the ICH principle is followed as long as the statistical analysis plan is finalized or signed off prior to study unblinding (or database lock if it is an open-label study). I believe this is the common practice in industry.

There is certainly a trend to push for SAP signoff prior to the study start, especially for late-stage trials or for trials with complicated statistical analyses.

For early-phase exploratory trials, one of the purposes is to explore the adequate endpoint, control group, study design, sample size, study issues, etc. for the later confirmatory trials, so it is acceptable not to finalize the statistical analysis plan too early. For a Phase III confirmatory trial (or, in the newer term, an A&WC - adequate and well-controlled - study), it is better to have the SAP signed off earlier.

If the study design or the statistical analysis is complicated (for example, using a Bayesian approach, a non-inferiority margin, or an adaptive design), the statistical analysis section in the study protocol may not be sufficient, and a detailed statistical analysis plan may have to be sent to FDA at the time of protocol submission. As one of the members from LinkedIn commented, "The more important a protocol is to the NDA/BLA (i.e., a pivotal trial), the sooner you should get it in front of the FDA for comments."

Another point is that the SAP has two main parts: the text portion and the mock shells. We may need to finalize only the text portion of the SAP prior to the study start and design the mock-up shells after the CRFs, annotations, and sample data are available. In reality, every study protocol contains a section for statistical analysis, and the key elements of the statistical analysis should be included in this section. If the statistical analysis section is not detailed enough, the expanded statistical analysis section (the text portion of the SAP) should detail things like: the prespecified analysis method/statistical model; missing data handling and imputation; the prespecified interim analysis plan/method; multiplicity adjustment to p-values; justification for the non-inferiority margin; detailed adaptation methods; detailed Bayesian methods; protection of blinding; and inclusion of subjects in the study populations.

The SAP can become a very lengthy document. Here is an example of a SAP with a Bayesian analysis component from FDA's website.

Several weeks ago, I attended the DIA/FDA workshop on "Adaptive design clinical trials - discussion on FDA's draft guidance". FDA has expressed great concern about operational biases and study integrity when adaptive designs (especially those not well accepted) are used. FDA's draft guidance on adaptive design has a specific section discussing the "Role of the Prospective Statistical Analysis Plan in Adaptive Design Studies":

"The importance of prospective specification of study design and analysis is well recognized for conventional study designs, but it is of even greater importance for many of the types of adaptive designs discussed in sections V and VI, particularly where unblinded interim analyses are planned. As a general practice, it is best that adaptive design studies have a SAP that is developed by the time the protocol is finalized. The SAP should specify all the changes prospectively planned and included in the protocol, describe the statistical methods to implement the adaptations, describe how the analysis of the data from each adaptive stage will be incorporated into the overall study results, and include the justification for the method of control of the Type I error rate and the approach to appropriately estimating treatment effects. The SAP for an adaptive trial is likely to be more detailed and complex than for a non-adaptive trial. Any design or analysis modification proposed after any unblinded interim analysis raises a concern that access to the unblinded data used in the adaptations may have influenced the decision to implement the specific change selected and thereby raises questions about the study integrity. Therefore, such modifications are generally discouraged. Nonetheless, circumstances can occur that call for the SAP to be updated or for some other flexibility for an unanticipated adaptation. The later in the study these changes or updates are made, the more a concern will arise about the revision’s impact. Generally, the justifiable reasons to do so are related to failure of the data to satisfy the statistical assumptions regarding the data (e.g., distribution, proportionality, fit of data to a model). In general, it is best that any SAP updates occur before any unblinded analyses are performed, and that there is unequivocal assurance that the blinding of the personnel determining the modification has not been compromised. 
A blinded steering committee can make such protocol and SAP changes, as suggested in the ICH E9 guidance and in the DMC guidance, but adaptive designs open the possibility of unintended sharing of unblinded data after the first interim analysis. Any design or analysis modifications made after an unblinded analysis, especially late in the study, may be problematic and should be accompanied by a clear, detailed description of the data firewall between the personnel with access to the unblinded analyses and those personnel making the SAP changes, along with documentation of adherence to these plans. Formal amendments to the protocol and SAP need to be made at the time of such changes (see 21 CFR 312.30)"

Sunday, April 04, 2010

Hockey stick phenomenon

The hockey stick phenomenon, or hockey stick curve, has been used mostly in describing climate change. It says that the temperature variation over the centuries was relatively unchanged until after 1900, when the temperature rose sharply due to human activities. Since the 1998 Nature article by Mann, Bradley, and Hughes, the hockey stick curve (phenomenon) has stirred quite some debates/controversies in climate research fields.

Hockey stick curves have also been used to describe any change with a normal trend (trajectory) followed by a different change or an interruption in the trend.  For example, the hockey stick curve may be used to describe disease progression that is gradual at first, then suddenly deteriorates. In clinical trials, one could observe that patients have an initial rebound in the measured parameters (endpoints), then a gradual decrease. In clinical trials for Alzheimer's disease, the purpose is to prevent the patient from further deterioration, rather than to achieve improvement or cure. If there is a rebound during the initial phase of the trial, the curve could be described as a "hockey stick".

During my PhD study, I analyzed the EPA whole effluent toxicity testing data and noticed the non-linear dose response and the phenomenon of 'hormesis', which holds that exposure to low or very low doses of toxicants could have beneficial effects. The hormesis or low-dose response could be described as J-shaped or hockey stick.

Long before the hockey stick model was used to describe temperature data, the method had been proposed for analyzing environmental health data. In 1979, Yanagimoto and Yamamoto published a paper in Environmental Health Perspectives titled "Estimation of Safety Doses: Critical Review of Hockey Stick Regression Method".

From a data analysis standpoint, if the data present a hockey stick phenomenon, typical linear regression cannot be used. Hockey stick regression can be considered segmented linear regression with just one knot. In a paper by Simpson et al, "Excess risk thresholds in ultrasound safety studies: statistical methods for data on occurrence and size of lesions", they used a piecewise linear model. A link from UGA had some discussion about using SAS procedures to model data with a hockey stick:

One thing for sure is that the hockey stick could always be controversial. Additional data may be needed to verify whether the hockey stick phenomenon is real, or whether it is a data issue or a data collection issue.

Tuesday, March 30, 2010

Health Care Globalization and Patients Without Borders

The term 'globalization' is nothing new. According to Wikipedia, "globalization describes an ongoing process by which regional economies, societies, and cultures have become integrated through a globe-spanning network of communication and trade. The term is sometimes used to refer specifically to economic globalization: the integration of national economies into the international economy through trade, foreign direct investment, capital flows, migration, and the spread of technology." Globalization certainly has its benefits, but it has its victims too, and the results can be deadly. As the global economy knits countries closer together, it becomes easier for diseases to spread through states, over borders and across oceans.

Globalization has an impact on the medicines we take (many of them are manufactured outside of a specific country) and on the conduct of clinical trials (clinical trial data cross borders from multiple nations). Last year, when I attended the FDA/Industry Statistical Workshop, the theme of the workshop was 'global harmonization' - another way to say 'globalization'.

Recently I attended a conference at Duke, and the focus again was 'globalization' with an emphasis on Asia. One session discussed medical tourism and 'patients without borders'. The trend will be that, with globalization, patients can cross borders to choose the health care that will better serve them (with cost and quality of care in mind). One day, we could share health care resources much like we share technologies.

I also understand that sharing health care resources will not be an easy task. Several days ago, one of my American colleagues asked me whether it is possible for foreigners to have a renal (kidney) transplantation in China (for the obvious reason of the shortage of kidney donors). When I posted the question to my alumni email list, I immediately got responses such as the one below: "I believe that all Chinese with renal failure have the absolute right for having kidney transplant in China. As a Chinese, I strongly against any give away of basic human right..."  I sort of agree with this. The world is not ready to share health care resources (at least the organs for transplant).

Within a country, there may or may not be any policy or procedure to ensure fairness between the rich and the poor, not to mention fairness across countries.

China actually has its own policies on organ transplantation, including renal (kidney) transplantation. These policies basically prohibit medical tourism for organ transplantation in China.

卫生部办公厅关于境外人员申请人体器官移植有关问题的通知 (Notice of the General Office of the Ministry of Health on Issues Concerning Applications by Foreign Nationals for Human Organ Transplantation)

人体器官移植条例 (Human Organ Transplant Ordinance)