Tuesday, November 24, 2009

Box-Cox Transformation

In statistical and biostatistical analysis, it is quite common to apply a data transformation, usually so that the data better satisfy the normality assumption. Data transformation refers to the application of a deterministic mathematical function to each point in a data set; that is, each data point zi is replaced with the transformed value yi = f(zi), where f is a function.

Typical data transformations include the logarithm, square-root, and arcsine transformations. The log transformation is suitable for variables with log-normal distributions. The square-root transformation is commonly used when the variable is a count of something. For the arcsine transformation, the numbers to be transformed must be in the range −1 to 1; it is commonly used for proportions, which range from 0 to 1, and is then usually applied to the square root of the proportion.
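As a rough illustration of the three transformations above, here is a minimal Python sketch using only the standard library. The function names are my own, and the arcsine variant shown is the arcsine of the square root of a proportion, which is an assumption about the intended form.

```python
import math

def log_transform(z):
    """Natural logarithm; defined only for z > 0."""
    return math.log(z)

def sqrt_transform(count):
    """Square root; intended for non-negative counts."""
    return math.sqrt(count)

def arcsine_sqrt_transform(p):
    """Arcsine of the square root of a proportion p in [0, 1], in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))

# Hypothetical data, for illustration only
counts = [0, 1, 4, 9]
proportions = [0.0, 0.25, 0.5, 1.0]
print([sqrt_transform(c) for c in counts])
print([round(arcsine_sqrt_transform(p), 4) for p in proportions])
```

Each function is applied point by point, which is exactly the yi = f(zi) idea: the data set keeps its size and order, and only the scale changes.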

Another popular data transformation technique is the Box-Cox transformation, although we may not use it frequently in clinical trials. The Box-Cox transformation belongs to the family of so-called 'power transforms'. The Box-Cox family has two useful features: first, it includes the linear and logarithmic transformations as special cases; and, second, it possesses strong scale equivariance properties, including the property that the transformation parameter is unaffected by rescaling of the data. Applying the Box-Cox transformation can reduce the heterogeneity of error so that the assumption of equal variance is more nearly met. Its main disadvantage is that both the domain and the range of the transformation are, in general, bounded (for example, the data must be positive).
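To make the "special cases" concrete, the following is a minimal sketch of the usual one-parameter Box-Cox form, (y^lambda - 1) / lambda for lambda not equal to 0 and log(y) for lambda equal to 0. The function name box_cox and the sample value are mine, chosen only for illustration.

```python
import math

def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)            # logarithmic special case
    return (y ** lam - 1.0) / lam     # power case; lam = 1 gives y - 1 (linear)

y = 4.0
print(box_cox(y, 1.0))    # linear special case
print(box_cox(y, 0.5))    # square-root type transformation
print(box_cox(y, 0.0))    # logarithmic special case
print(box_cox(y, 1e-6))   # a small lambda is already close to log(y)
```

The last line shows why the log is treated as the lambda = 0 member of the family: the power form converges to log(y) as lambda shrinks toward zero.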

The Box-Cox transformation can be easily implemented with SAS PROC TRANSREG.
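Software typically chooses the transformation parameter by maximum likelihood over a grid of candidate values. The sketch below is not the PROC TRANSREG implementation; it is a simplified Python illustration that assumes an intercept-only normal model, a hypothetical skewed sample, and a grid from -2 to 2 in steps of 0.25.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def profile_loglik(values, lam):
    """Profile log-likelihood of lambda for an intercept-only normal model."""
    n = len(values)
    t = [box_cox(v, lam) for v in values]
    mean_t = sum(t) / n
    var_t = sum((x - mean_t) ** 2 for x in t) / n    # ML variance (divide by n)
    jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(var_t) + jacobian

def best_lambda(values, grid):
    """Return the grid value with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(values, lam))

# Hypothetical right-skewed sample and candidate grid
sample = [1.2, 1.5, 1.9, 2.4, 3.1, 4.0, 5.6, 8.3, 13.0, 27.5]
grid = [i / 4 for i in range(-8, 9)]
print(best_lambda(sample, grid))
```

In practice the selected value is often rounded to a convenient one (such as 0 for the log or 0.5 for the square root) when that value is also well supported by the likelihood.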

Wednesday, November 18, 2009

Dealing with paired data

Paired data contain values that fall naturally into pairs and can therefore be expected to vary more between pairs than within pairs. The purpose of pairing is to reduce variability: once the data are analyzed as pairs, the between-subject variability is eliminated. If pairing is effective, it will reduce variability enough to justify the effort involved in obtaining paired data.

There are many practical examples of pairing. In clinical trials, the crossover design is a special case of pairing in which the same subject receives more than one treatment. If all subjects receive treatment A and then treatment B, it can still be called a crossover design (a single-sequence crossover design). In epidemiology, the matched case-control study is a typical example of pairing; there are 1:1 matched and 1:m matched case-control studies. In education, we can use pairing to compare scores before and after training.

When the outcome measure is a continuous variable (such as drug concentration) and covariates are not considered, paired data can be analyzed with the paired t-test, which is easily performed using SAS PROC UNIVARIATE (calculate the difference for each pair, then run PROC UNIVARIATE on the differences) or SAS PROC TTEST (without calculating the difference first). Suppose x1 and x2 are the paired variables:
proc ttest;
   paired x1*x2;   /* tests whether the mean of x1 - x2 is zero */
run;
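For readers who want to see the arithmetic behind the paired t-test, here is a minimal Python sketch using only the standard library. The function name paired_t and the concentration values are hypothetical. The p-value is left out because the t distribution is not available in the Python standard library.

```python
import math
import statistics

def paired_t(x1, x2):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x1, x2)]     # within-pair differences
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)              # sample SD, n - 1 denominator
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical drug concentrations measured twice on the same six subjects
x1 = [12.1, 10.4, 11.8, 9.9, 13.2, 10.7]
x2 = [11.0, 10.1, 10.9, 9.5, 12.0, 10.2]
t_stat, df = paired_t(x1, x2)
print(round(t_stat, 3), df)
```

The test is simply a one-sample t-test on the differences, which is why computing the differences first and testing their mean against zero gives the same answer.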
If the normality assumption is questionable, nonparametric tests (the sign test and the Wilcoxon signed-rank test) can be used instead. UCLA's Statistical Consulting Services web site provides examples of these tests.
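The sign test is the simpler of the two and can be sketched in a few lines. The version below is a hedged illustration, not SAS output: it drops tied pairs and computes an exact two-sided p-value from the binomial distribution with probability 0.5. The function name and data are hypothetical.

```python
from math import comb

def sign_test(x1, x2):
    """Exact two-sided sign test p-value for paired samples (ties dropped)."""
    diffs = [a - b for a, b in zip(x1, x2) if a != b]
    n = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    k = min(n_pos, n - n_pos)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled and capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical paired measurements: x1 exceeds x2 in every pair
x1 = [5, 6, 7, 8, 9, 10, 11, 12]
x2 = [4, 5, 5, 7, 8, 8, 10, 11]
print(sign_test(x1, x2))
```

Only the direction of each difference is used, not its size, which makes the sign test robust but less powerful than the Wilcoxon signed-rank test when the differences are roughly symmetric.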

In more complicated situations (such as a crossover design), or if covariates have to be included in the model, a mixed model needs to be used. SAS PROC MIXED can fit the mixed model easily; see the SAS/STAT User's Guide for PROC MIXED. In a research paper titled "Detection of emphysema progression in alpha 1-antitrypsin deficiency using CT densitometry; Methodological advances", I dealt with paired data using a so-called 'random coefficient model'.


When the outcome variable is discrete, the simplest example is McNemar's test. McNemar's test is performed when we are interested in comparing the marginal frequencies of two binary outcomes. These binary outcomes may be the same outcome variable measured on matched pairs (as in a case-control study) or two outcome variables measured on a single group.
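McNemar's test depends only on the two discordant cells of the paired 2x2 table. The sketch below assumes the chi-square form without a continuity correction; the function name and counts are hypothetical. It uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x / 2)).

```python
import math

def mcnemar(b, c):
    """McNemar chi-square (1 df, no continuity correction) and its p-value.

    b and c are the two discordant cell counts of the paired 2x2 table.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # chi-square(1) upper tail
    return chi2, p_value

# Hypothetical counts: 15 pairs changed one way, 5 pairs the other way
chi2, p_value = mcnemar(15, 5)
print(chi2, round(p_value, 4))
```

The concordant pairs do not enter the statistic at all, since they carry no information about a difference between the two marginal frequencies.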

In more complicated situations, or if covariates need to be included in the model, conditional logistic regression needs to be employed. Conditional logistic regression can be implemented using SAS PROC LOGISTIC or SAS PROC PHREG.
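In the simplest setting, 1:1 matched pairs with a single binary exposure and no other covariates, the conditional maximum likelihood estimate of the odds ratio reduces to the ratio of the two discordant pair counts. The sketch below covers only that special case, with a Wald interval on the log scale. The function name and counts are hypothetical, and anything with covariates needs a real conditional logistic fit.

```python
import math

def matched_pair_odds_ratio(n_case_only, n_control_only, z=1.96):
    """Conditional ML odds ratio and Wald CI from discordant 1:1 matched pairs.

    n_case_only:    pairs in which only the case was exposed
    n_control_only: pairs in which only the control was exposed
    """
    if n_case_only == 0 or n_control_only == 0:
        raise ValueError("both discordant counts must be positive")
    odds_ratio = n_case_only / n_control_only
    se_log = math.sqrt(1.0 / n_case_only + 1.0 / n_control_only)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical discordant pair counts
odds_ratio, lower, upper = matched_pair_odds_ratio(30, 10)
print(odds_ratio, round(lower, 2), round(upper, 2))
```

As with McNemar's test, the pairs in which case and control share the same exposure status drop out of the conditional likelihood.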


Sunday, November 08, 2009

Pediatric use and geriatric use of drug and biological products

In the United States, every marketed drug or biological product needs to have its product label or package insert. The product label contains the use in special populations including pediatric and geriatric population. Here is a paragraph from FDA guidance on "Labeling for Human Prescription Drug and Biological Products — Implementing the New Content and Format Requirements"

Use in Specific Populations (§ 201.57(a)(13))
Information under the Use in Specific Populations heading includes a concise summary
of any clinically important differences in response or recommendations for use of the
drug in specific populations (e.g., differences between adult and pediatric responses, need
for specific monitoring in patients with hepatic impairment, need for dosing adjustments
in patients with renal impairment). Typically, information under this heading includes
limitations or precautions for specific populations or established differences in response.


The absence of clinical study data in the pediatric and geriatric populations can sometimes cause problems with the product label or in the drug approval process. During drug development, it is prudent to consider which patient populations are included or excluded in terms of age. In the study protocol, the inclusion criteria for the upper and lower age limits should be carefully considered. In the statistical analysis, when data for the pediatric and/or geriatric population are available, a subgroup analysis should always be performed.

In the regulatory environment, the pediatric and geriatric populations are defined as follows:

Pediatric population: according to ICH guidance E11 "Clinical Investigation of Medicinal Products in the Pediatric Population", the pediatric population contains several subcategories:
  • preterm newborn infants
  • term newborn infants (0 to 27 days)
  • infants and toddlers (28 days to 23 months)
  • children (2 to 11 years)
  • adolescents (12 to 16-18 years (dependent on region))
Notice that in FDA's guidance "General Considerations for Pediatric Pharmacokinetic Studies for Drugs and Biological Products", the age classification is a little bit different. I am assuming that the ICH guidance E11 should be the correct reference.
Geriatric population:
Geriatric population is defined as persons 65 years of age and older. There is no upper limit of age defined. The Food and Drug Administration has regulations governing the content and format of labelling for human prescription drug products, including biological products, to include information pertinent to the appropriate use of drugs in the elderly and to facilitate access to this information by establishing a “Geriatric use” subsection in the labelling.
