Tuesday, November 24, 2009

Box-Cox Transformation

In statistical and biostatistical analysis, it is pretty common to transform the data before analyzing it. The usual reason is to bring the data closer to satisfying the normality assumption. Data transformation refers to the application of a deterministic mathematical function to each point in a data set; that is, each data point zi is replaced with the transformed value yi = f(zi), where f is a function.

Typical data transformations include the logarithm, square-root, and arcsine transformations. The log transformation is suitable for variables with log-normal distributions. The square-root transformation is commonly used when the variable is a count of something. For the arcsine transformation, the numbers to be transformed must be in the range −1 to 1. It is commonly used for proportions, which range from 0 to 1; in that case the arcsine is usually taken of the square root of the proportion.
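As a rough illustration of the three transformations above, here is a minimal Python sketch using only the standard library. The function names are made up for this example, and the arcsine version shown is the arcsine of the square root, which is an assumption about which variant is meant for proportions.

```python
import math

def log_transform(z):
    """Natural log; only defined for z > 0."""
    if z <= 0:
        raise ValueError("log transformation needs a positive value")
    return math.log(z)

def sqrt_transform(count):
    """Square root; typically applied to non-negative counts."""
    if count < 0:
        raise ValueError("square-root transformation needs a non-negative value")
    return math.sqrt(count)

def arcsine_sqrt_transform(p):
    """Arcsine of the square root; p must be a proportion in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return math.asin(math.sqrt(p))

def transform(data, f):
    """Replace each data point zi with yi = f(zi)."""
    return [f(z) for z in data]

counts = [1, 4, 9]
print(transform(counts, sqrt_transform))
```

The `transform` helper is just the yi = f(zi) definition written out: the same function is applied to every data point.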

Another popular data transformation technique is the Box-Cox transformation, although we may not use it frequently in clinical trials. The Box-Cox transformation belongs to the family of so-called 'power transforms'. The Box-Cox family of transformations has two useful features: first, it includes linear and logarithmic transformations as special cases; and, second, it possesses strong scale equivariance properties, including the property that the transformation parameter is unaffected by rescaling of the data. Applying the Box-Cox transformation can reduce the heterogeneity of the error variance, so that the assumption of equal variance is more nearly met. Its main disadvantage is that both the domain and the range of the transformation are, in general, bounded; in particular, the data must be positive.
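To make the 'special cases' concrete, here is a small Python sketch of the usual one-parameter Box-Cox form: (y^lambda − 1) / lambda when lambda is not 0, and log(y) when lambda is 0. The function name `box_cox` and its argument names are just for illustration.

```python
import math

def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox needs strictly positive data")
    if lam == 0:
        # Limiting case as lam -> 0: the log transformation.
        return math.log(y)
    # General power-transform case.
    return (y ** lam - 1.0) / lam

print(box_cox(5.0, 1.0))   # lam = 1: a linear shift, y - 1
print(box_cox(5.0, 0.5))   # lam = 0.5: a rescaled square root
print(box_cox(5.0, 0.0))   # lam = 0: the natural log
```

With lambda = 1 the data are only shifted by 1, so the shape of the distribution is unchanged; lambda = 0 gives the log transformation, and values in between give progressively milder transformations.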

The Box-Cox transformation can be easily implemented with SAS PROC TRANSREG.
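For readers without SAS, the idea behind choosing the transformation parameter can be sketched in plain Python. This is not a reproduction of what PROC TRANSREG does; it is a simplified illustration that assumes a model with no predictors (only a mean) and picks lambda from a grid by maximizing the profile log-likelihood. The function names, the grid, and the sample data are all made up for this example.

```python
import math

def box_cox(y, lam):
    """One-parameter Box-Cox transform of a positive value y."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def profile_log_likelihood(data, lam):
    """Profile log-likelihood of lam (up to a constant), mean-only normal model."""
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    mean = sum(z) / n
    variance = sum((v - mean) ** 2 for v in z) / n
    # The second term is the log-Jacobian of the transformation.
    log_jacobian = (lam - 1.0) * sum(math.log(y) for y in data)
    return -0.5 * n * math.log(variance) + log_jacobian

def best_lambda(data, grid):
    """Return the grid value with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam))

# Grid from -2 to 2 in steps of 0.25 (an arbitrary choice for this sketch).
grid = [i * 0.25 for i in range(-8, 9)]

# Data whose logs are symmetric around 0, so the log transform should fit well.
sample = [math.exp(x) for x in (-2, -1, 0, 1, 2)]
print(best_lambda(sample, grid))
```

In practice the data would come with covariates, and the variance would be the residual variance from the fitted model rather than the variance around a single mean.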

