Saturday, May 30, 2009

Pharmacokinetics: Verify the Steady State Under Multiple Doses

For a multiple-dose regimen, the amount of drug in the body is said to have reached a steady state level if the amount or average concentration of the drug in the body remains stable. At steady state, the rate of elimination = the rate of administration.

To determine whether the steady state is achieved, a statistical test can be performed on the trough levels. The predose blood sampling should include at least three successive trough level samples (Cmin).

In FDA's Guidance for Industry: Bioequivalence Guidance, it is stated: "...to determine a steady state concentration, the Cmin values should be regressed over time and the resultant slope should be tested for its difference from zero." For example, we can regress the logarithm of the last three trough measurements over time. If the 90% CI for the exponentiated slope is within (0.9, 1.1), then we claim steady state. The limit of (0.9, 1.1) is arbitrarily chosen.
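As a rough sketch of this regression check (the function name and the use of scipy are mine, not from the guidance; the (0.9, 1.1) limits are the arbitrary ones mentioned above):

```python
import numpy as np
from scipy import stats

def steady_state_test(times, cmin, alpha=0.10, limits=(0.9, 1.1)):
    """Regress log(Cmin) over time and check whether the CI for the
    exponentiated slope lies within the pre-set limits.

    times : dosing-interval index of each trough sample, e.g. [5, 6, 7]
    cmin  : the successive trough concentrations
    Returns (exp_slope, ci_low, ci_high, at_steady_state).
    """
    x = np.asarray(times, dtype=float)
    y = np.log(np.asarray(cmin, dtype=float))
    fit = stats.linregress(x, y)                   # slope of log(Cmin) vs time
    tcrit = stats.t.ppf(1 - alpha / 2, len(x) - 2)
    lo = np.exp(fit.slope - tcrit * fit.stderr)    # back-transformed 90% CI
    hi = np.exp(fit.slope + tcrit * fit.stderr)
    return np.exp(fit.slope), lo, hi, bool(limits[0] < lo and hi < limits[1])
```

Note that with only three troughs the CI has a single degree of freedom and is very wide unless the troughs are nearly flat, which is one reason the formal test is often skipped in practice.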

Similarly, in FDA's guidance for Industry: Clozapine Tablets: In Vivo Bioequivalence and In Vitro Dissolution Testing, it stated "...The trough concentration data should also be analyzed statistically to verify that steady-state was achieved prior to Period 1 and Period 2 pharmacokinetic sampling."

Typically, steady state can be verified simply by reviewing the trough levels at time points prior to the PK sampling, without formal statistical testing. If the PK blood samples are taken after 4-5 half-lives of dosing, it can be roughly assumed that (approximate or near) steady state has been reached.
The trough and peak values of plasma concentration are also used to determine whether the steady state has been reached. The peak-to-trough ratio is usually used as an indicator of the fluctuation of drug concentrations, which has implications for both efficacy and safety. A relatively small peak-to-trough ratio indicates relatively stable drug exposure over the dosing interval, which is generally favorable for maintaining efficacy while limiting concentration-related adverse effects.
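As a small sketch, two commonly used fluctuation measures (the function names and the percent-fluctuation definition, 100*(Cmax - Cmin)/Cavg, are standard PK conventions, not anything prescribed by the guidances above):

```python
def peak_trough_ratio(cmax, cmin):
    """Peak-to-trough ratio over a dosing interval at steady state."""
    return cmax / cmin

def percent_fluctuation(cmax, cmin, cavg):
    """Degree of fluctuation: 100 * (Cmax - Cmin) / Cavg."""
    return 100.0 * (cmax - cmin) / cavg
```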

In their book "Design and Analysis of Bioavailability and Bioequivalence Studies", Chow and Liu described univariate and multivariate analysis approaches for formally testing the steady state.

Hong also proposed a non-linear procedure to test for steady state.

A note about trough and Cmin:

The characteristic Cmin has been associated with the concentration at the end of the dosing interval, the so-called pre-dose or trough value. However, for prolonged-release formulations which exhibit an apparent lag-time of absorption, the true minimum (trough) concentration may be observed some time after the next dosing, not necessarily at the end of the previous dosing interval.

Saturday, May 23, 2009

Statistical validation of the surrogate endpoints

A surrogate endpoint is intended to substitute for a clinical endpoint. A surrogate endpoint is expected to predict clinical benefit (or harm, or lack of benefit) based on epidemiologic, therapeutic, pathophysiologic, or other scientific evidence. In clinical trials, a surrogate endpoint (or marker) is a measure of the effect of a certain treatment that may correlate with a real endpoint but does not necessarily have a guaranteed relationship. The National Institutes of Health (USA) defines a surrogate endpoint as "a biomarker intended to substitute for a clinical endpoint."

Biomarkers are biological substances or features that can be used to indicate normal biological processes, disease processes, or responses to therapy. Biomarkers can be physiological indicators, such as heart rate or blood pressure, or they can be molecules in the tissues, blood, or other body fluids. For example, an elevated blood level of a protein called prostate-specific antigen is a molecular biomarker for prostate cancer.

Biomarker and surrogate endpoint are often used interchangeably. However, there is a subtle difference. Surrogate endpoints are not limited to biomarkers and can include imaging measurements (such as CT bone/lung densitometry, arteriogram, ...).

Just recently, I noticed that quite a lot of work has been done in the area of statistical validation of surrogate endpoints. In the medical community, people may simply think that a biomarker can be a surrogate endpoint if a correlation between the biomarker and an established clinical endpoint is observed. However, correlation is only one of the criteria (or requirements) for a biomarker to be a valid surrogate endpoint, and there has been much discussion of the statistical approaches for validating surrogate endpoints.

In their paper titled "Surrogate end points in clinical trials: are we being misled?" (1996), Fleming and DeMets provided many examples of surrogate endpoints and pointed out that these surrogate endpoints often fail formal statistical validation.

The issues with surrogate endpoints are actually discussed in ICH E9 Statistical Principles for Clinical Trials:

Surrogate Variables (2.2.6)
When direct assessment of the clinical benefit to the subject through observing
actual clinical efficacy is not practical, indirect criteria (surrogate variables — see
Glossary) may be considered. Commonly accepted surrogate variables are used in
a number of indications where they are believed to be reliable predictors of
clinical benefit. There are two principal concerns with the introduction of any
proposed surrogate variable. First, it may not be a true predictor of the clinical
outcome of interest. For example, it may measure treatment activity associated
with one specific pharmacological mechanism, but may not provide full information
on the range of actions and ultimate effects of the treatment, whether positive or
negative. There have been many instances where treatments showing a highly
positive effect on a proposed surrogate have ultimately been shown to be
detrimental to the subjects' clinical outcome; conversely, there are cases of
treatments conferring clinical benefit without measurable impact on proposed
surrogates. Second, proposed surrogate variables may not yield a quantitative
measure of clinical benefit that can be weighed directly against adverse effects.
Statistical criteria for validating surrogate variables have been proposed but the
experience with their use is relatively limited. In practice, the strength of the
evidence for surrogacy depends upon (i) the biological plausibility of the
relationship, (ii) the demonstration in epidemiological studies of the prognostic
value of the surrogate for the clinical outcome, and (iii) evidence from clinical
trials that treatment effects on the surrogate correspond to effects on the clinical
outcome. Relationships between clinical and surrogate variables for one product
do not necessarily apply to a product with a different mode of action for treating the
same disease.

Some key references:
1. Prentice RL (1989). Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine, 8:431-440
2. Freedman L, Graubard B (1992). Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine
3. Lin DY, Fleming TR, DeGruttola V (1997). Estimating the proportion of treatment effect explained by a surrogate endpoint. Statistics in Medicine, 16:1515-1527
4. Woodcock J. A framework for biomarker and surrogate endpoint in drug development
5. Surrogate Markers: Their Role in the Regulatory Decision Process
6. Statistical Validation of Surrogate Markers
7. Fleming TR, DeMets DL (1996). Surrogate end points in clinical trials: are we being misled? Annals of Internal Medicine, 125:605-613

Friday, May 15, 2009

Imaging analysis in clinical trial

Medical imaging has become a critical part of clinical trials. It can be used in many aspects of the clinical trial process:
1) Disease diagnosis as part of inclusion/exclusion criteria
2) Safety assessment
3) Clinical efficacy endpoint

There are many medical imaging technologies. Here is a list of some:
1) x-ray
2) CT scan
3) MRI
4) PET scan
5) Ultrasound
6) arteriogram or angiography
7) venogram

There are both benefits and risks in using medical imaging in clinical trials. Some imaging modalities pose extra safety issues. For example, x-ray, CT scan, and PET scan expose study subjects to extra radiation. Arteriogram and CT angiography can expose subjects to additional contrast medium or dyes, which may have their own safety issues.

Medical imaging is almost always a surrogate endpoint. The technician plays an important role in obtaining the images, and standardization and calibration are always important for obtaining reliable data, especially in longitudinal studies. The interpretation of imaging results depends on who reads the images; there can be substantial variation between different readers. Therefore, central reading is very important when medical imaging is used in a clinical trial. There are quite a few articles discussing imaging in clinical trials in the Applied Clinical Trials magazine.

There are several specialty medical imaging vendors on the market. Some of them are listed below:
1) BioClinica or Bio-imaging
2) Biomedical Systems
3) Perceptive (part of Parexel)
4) Synarc

FDA and EMEA have issued several guidance documents on imaging used in clinical trials. For example:

1) FDA Guidance "Standards for Clinical Trials Imaging Endpoints"
This guidance discusses clinical trials with imaging endpoints, i.e., where the reading from medical imaging is used as the efficacy endpoint. Examples are the RECIST criteria for assessing tumor size in solid tumors based on CT or MRI, and lung density measured by CT scan to assess emphysema.

2) FDA guidance "Developing Medical Imaging Drug and Biological Products"

Sunday, May 03, 2009

Adjustment for multiplicity

One of the recurring issues in the statistics field is the adjustment for multiplicity, i.e., the adjustment of the alpha level for multiple tests. Multiplicity can arise in many different situations in clinical trials; some of them are listed below:
  • Multiple arms
  • Co-primary endpoints
  • Multiple statistical approaches for the same endpoint
  • Interim analysis
  • More than one dose vs. placebo
  • Meta analysis
  • Subgroup analysis

There are tons of articles about multiplicity, but few guidances from the regulatory bodies. When multiplicity issues arise, the common understanding is that an adjustment needs to be made. However, there is no guidance on which approach should be used. The adjustment approach could be very conservative (e.g., Bonferroni) or less conservative (e.g., Hochberg). One could evaluate the various approaches and determine which adjustment approach is best suited to the situation in the study.

While we are still waiting for FDA's guidance on the multiplicity issue (hopefully it will come out in 2009), EMEA has issued a PtC (Points to Consider) document on multiplicity. The document provides guidance on when an adjustment for multiplicity should be implemented.

While there are so many articles related to multiplicity, I find the following articles suit my taste and offer practical discussions:

  • Proschan and Waclawiw (2000) Practical Guidelines for Multiplicity Adjustment in Clinical Trials. Controlled Clinical Trial
  • Capizzi and Zhang (1996) Testing the Hypothesis that Matters for Multiple Primary Endpoints. Drug Information Journal
  • Koch and Gansky (1996) Statistical Considerations for Multiplicity in Confirmatory Protocols. Drug information Journal
  • Wright (1992) Adjusted p-values for simultaneous inference. Biometrics

It is always useful to refer to the statistical review documents for previous NDAs/BLAs to see which approaches have been used in the drug approval process. The following three approaches seem to stand out:

  • Hochberg procedure
  • Bonferroni-Holm procedure
  • Hierarchical order for testing null hypotheses

While not exactly the same, a similar list appears in a CDRH guidance, "Clinical Investigations of Devices Indicated for the Treatment of Urinary Incontinence", which states: “The primary statistical challenge in supporting the indication for use or device performance in the labeling is in making multiple assessments of the secondary endpoint data without increasing the type 1 error rate above an acceptable level (typically 5%). There are many valid multiplicity adjustment strategies available for use to maintain the type 1 error rate at or below the specified level, three of which are listed below:
· Bonferroni procedure;
· Hierarchical closed test procedure; and
· Holm’s step-down procedure. "

The Hochberg procedure is based on Hochberg's 1988 paper. It has been used in several NDA/BLA submissions. For example, in the Tysabri BLA, it is stated:

"Hochberg procedure for multiple comparisons was used for the evaluation of the primary endpoints. For 2 endpoints, the Hochberg procedure results in the following rule: if the maximum of the 2 p-values is less than 0.05, then both hypotheses are rejected and statistical significance is claimed for both endpoints. Otherwise, the minimum of the 2 p-values needs to be less than 0.025 for claiming statistical significance".
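The two-endpoint rule quoted above is a special case of Hochberg's general step-up procedure, which can be sketched as follows (function name is mine):

```python
def hochberg(pvalues, alpha=0.05):
    """Hochberg step-up procedure: returns indices of rejected hypotheses."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    for k, i in enumerate(order):
        # compare the (k+1)-th largest p-value against alpha / (k+1)
        if pvalues[i] <= alpha / (k + 1):
            # reject this hypothesis and all with smaller p-values
            return sorted(j for j in range(m) if pvalues[j] <= pvalues[i])
    return []
```

With p-values (0.04, 0.045) both hypotheses are rejected since the maximum is below 0.05; with (0.02, 0.06) only the first is rejected since the minimum is below 0.025, matching the BLA description.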

The Bonferroni-Holm procedure is based on Holm's 1979 paper (Holm S (1979). "A simple sequentially rejective multiple test procedure", Scandinavian Journal of Statistics, 6:65-70). It is a modification of the original Bonferroni method. This method may also be called the Holm-Bonferroni approach or the Bonferroni-Holm correction. This approach was employed in the Flomax NDA (020579) and the BLA for HFM-582 (STN 125057).

Both Holm's procedure and Hochberg's procedure are modifications of the Bonferroni procedure. Holm's procedure is a 'step-down procedure' and Hochberg's procedure is a 'step-up procedure'. An article by Huang and Hsu titled "Hochberg's step-up method: cutting corners off Holm's step-down method" (Biometrika 2007; 94(4):965-975) provides a good comparison of the two procedures.
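For comparison, Holm's step-down counterpart can be sketched as (function name is mine):

```python
def holm(pvalues, alpha=0.05):
    """Holm (Bonferroni-Holm) step-down procedure: returns rejected indices."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = []
    for k, i in enumerate(order):
        # compare the (k+1)-th smallest p-value against alpha / (m - k)
        if pvalues[i] <= alpha / (m - k):
            rejected.append(i)
        else:
            break  # stop at the first hypothesis that cannot be rejected
    return sorted(rejected)
```

With p-values (0.04, 0.045), Holm rejects nothing (0.04 > 0.025) while Hochberg rejects both, which illustrates why Hochberg is the more powerful of the two.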

Benjamini and Hochberg also proposed a procedure which controls the FDR (false discovery rate) instead of controlling the overall alpha level. The original paper by Benjamini and Hochberg, titled "Controlling the false discovery rate: a practical and powerful approach to multiple testing", appeared in the Journal of the Royal Statistical Society. It is interesting that the FDR and the Benjamini-Hochberg procedure have been used quite often in the gene identification/microarray area. A nice comparison of the Bonferroni-Holm approach and the Benjamini-Hochberg approach is given on this website. Another good summary is the slides from the 2004 FDA/Industry Statistics Workshop.
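The Benjamini-Hochberg FDR procedure can be sketched similarly (function name is mine):

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg procedure controlling the FDR at level q.

    Returns indices of rejected hypotheses.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    kmax = 0
    for k, i in enumerate(order, start=1):
        # find the largest k with p_(k) <= k * q / m
        if pvalues[i] <= k * q / m:
            kmax = k
    return sorted(order[:kmax])
```

Because the rejection threshold grows with k, the procedure typically rejects more hypotheses than the alpha-controlling methods above, which is part of why it is popular in large-scale microarray screening.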

Hierarchical order for testing null hypotheses was cited in EMEA's guidance as

"Two or more primary variables ranked according to clinical relevance. No formal adjustment is necessary. However, no confirmatory claims can be based on variables that have a rank lower than or equal to that variable whose null hypothesis was the first that could not be rejected. "

This approach applies to the situation where a primary endpoint and several secondary endpoints are defined: the highest-ranked hypothesis corresponds to the primary endpoint, and the lower-ranked hypotheses correspond to the secondary endpoints.

In one of my old studies, we specified the hypotheses for the comparisons as something like the following:

"A closed test procedure with the following sort order will be used for the pairwise comparisons. The second hypothesis will be tested only if the first hypothesis has been rejected, thus maintaining the overall significance level at 5%.
1. The contrast between drug 400mg and placebo (two-sided, alpha = 0.05)(H01 : mu of 400 mg = mu of placebo)
2. The contrast between drug 400 mg and a comparator (two-sided, alpha = 0.05)(H02 : mu of 400 mg = mu of the comparator) "
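The fixed-sequence rule described in the protocol text above can be sketched as (function name is mine):

```python
def fixed_sequence_test(pvalues, alpha=0.05):
    """Hierarchical (fixed-sequence) testing: p-values are pre-ordered by
    clinical relevance; each is tested at the full alpha, but testing stops
    at the first hypothesis that cannot be rejected."""
    results = []
    for p in pvalues:
        if results and not results[-1]:
            results.append(False)  # not tested: an earlier hypothesis failed
        else:
            results.append(p < alpha)
    return results
```

Each hypothesis is tested at the full 5% level, yet the overall type 1 error stays at 5% because a later hypothesis can only be tested after all earlier ones have been rejected.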

Friday, May 01, 2009

Understanding person-year or patient-year

When I studied public health many years ago, the term 'person-year' was used quite often in occupational health class. Since the length of exposure to a health hazard differs across workers, it is necessary to calculate person-years. The total person-years (the sum of person-years from all workers exposed to a certain industrial hazard) is then used to calculate rates (such as the death rate, mortality rate, ...). When the same logic is used in the clinical setting or the clinical trial field, the similar term 'patient-year' is used. The terms 'person-year' and 'patient-year' are used interchangeably.

The rates are represented “per person-time” to provide more accurate comparisons among groups when follow-up time (i.e., patient exposure time) is not the same in all groups. Theoretically, we could express a rate as events per patient-year, but the rate would typically be a very small fraction. In practice, the rate can be expressed per 100, 1,000, 100,000, or 1 million patient-years (or patient-years at risk).

“Patient-year at risk” means that the denominator of the rate calculation is ascertained by adding up the exposure times of all patients, where each patient’s exposure time is defined as the days spent in a pre-determined time period (e.g., a year), censored only by events such as death or disenrollment, or by the end of the time period. Divide the total number of days by 365 or 365.25 to get the actual year value.

“Patient-year” means that the denominator of the rate calculation is ascertained by counting all patients who are in the pre-determined time period for at least one day.

The expressions “per 100,000 patient-years at risk” and “per million patient-years” are just different ways of normalizing the rates to present them better. Thus, a hospitalization rate of 0.0000015 per patient-year can also be expressed as 1.5 per million patient-years. A detailed explanation of person-time (person-year is just a special case of person-time) can be found elsewhere; an example of calculating a death rate using patient-years is illustrated on the Organ Donor website.
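A minimal sketch of these calculations (function names are mine), summing each patient's days at risk and then normalizing the resulting rate:

```python
def event_rate_per_100_patient_years(n_events, exposure_days):
    """Crude event rate per 100 patient-years at risk.

    exposure_days: each patient's days at risk, censored by death,
    disenrollment, or the end of the observation period.
    """
    total_patient_years = sum(exposure_days) / 365.25  # days -> years
    return 100.0 * n_events / total_patient_years

def per_million_patient_years(rate_per_patient_year):
    """Re-express a per-patient-year rate per million patient-years."""
    return rate_per_patient_year * 1_000_000
```

For example, 2 events among 10 patients each followed for 3,652.5 days (100 patient-years in total) gives a rate of 2.0 per 100 patient-years.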

The rate expressed in 'patient-years' has been used in many different scenarios. For example, the following paragraph from a website uses 'the number of exacerbations per patient-year', 'the number of exacerbation days per patient-year', etc.:

"Additionally, tiotropium significantly reduced the number of exacerbations (0.853 vs 1.051 exacerbations per patient-year; p=0.003) (1) and number of exacerbation days (mean: 12.61 vs 15.96 days per patient year; p is less than 0.001). Similarly, tiotropium significantly reduced the frequency of exacerbation related hospitalizations (0.177 vs 0.253 means hospitalizations per patient year, p=0.013)(1) and the number of hospitalization days (1.433 vs. 1.702, mean days per patient year, p=0.001) compared to placebo. In addition, a reduction in the number of treatment days (antibiotic or steroids) (p is less than 0.001) and unscheduled visits to health care providers for exacerbations (p = 0.017) were also significantly reduced with tiotropium compared to placebo."

In the FDA guidance "Efficacy, Safety, and Pharmacokinetic Studies to Support Marketing of Immune Globulin Intravenous (Human) as Replacement Therapy for Primary Humoral Immunodeficiency", the rate of SBI (serious bacterial infection) is expressed per person-year:

"The protocol should prospectively define the study analyses. We expect that the data analyses presented in the BLA will be consistent with the analytical plan submitted to the IND. Based on our examination of historical data, we believe that a statistical demonstration of a serious infection rate per person-year less than 1.0 is adequate to provide substantial evidence of efficacy. You may test the null hypothesis that the serious infection rate is greater than or equal to 1.0 per person-year at the 0.01 level of significance or, equivalently, the upper one-sided 99% confidence limit would be less than 1.0. "
"We recommend that you provide in the BLA descriptive statistics for the number of serious infection episodes per person-year during the period of study observation."
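The guidance does not prescribe a specific method for the confidence limit; one common choice, sketched below under that assumption (function names are mine), is the exact Poisson upper limit based on the chi-square distribution:

```python
from scipy import stats

def sbi_rate_upper_limit(n_infections, person_years, conf=0.99):
    """Exact Poisson upper one-sided confidence limit for the serious
    infection rate per person-year, via the chi-square relationship."""
    return stats.chi2.ppf(conf, 2 * (n_infections + 1)) / (2 * person_years)

def efficacy_demonstrated(n_infections, person_years):
    # per the quoted guidance: the upper one-sided 99% limit must be < 1.0
    return sbi_rate_upper_limit(n_infections, person_years) < 1.0
```

For example, 30 serious infections over 60 person-years (an observed rate of 0.5 per person-year) gives an upper 99% limit of about 0.76, meeting the criterion; the same count over only 30 person-years would not.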