Wednesday, December 31, 2008

Significance of the Correlation Coefficient

People are often confused about how to interpret the correlation coefficient, especially when a small but statistically significant correlation is observed. The following notes are from "http://janda.org/c10/Lectures/topic06/L24-significanceR.htm", which explains the interpretation of the correlation coefficient nicely. In addition, Wikipedia provides a good introduction to correlation and includes a small table that categorizes the size (or strength) of a correlation.

Test for the significance of relationships between two CONTINUOUS variables

  • We introduced Pearson correlation as a measure of the STRENGTH of a relationship between two variables
  • But any relationship should be assessed for its SIGNIFICANCE as well as its strength.

A general discussion of significance tests for relationships between two continuous variables.

  • Factors in relationships between two variables

The strength of the relationship is indicated by the correlation coefficient: r
but is actually measured by the coefficient of determination: r^2

  • The significance of the relationship
    is expressed in probability levels: p (e.g., significant at p = .05)
    This tells how unlikely it is that a given correlation coefficient, r, would occur if there were no relationship in the population
    NOTE! NOTE! NOTE! The smaller the p-level, the more significant the relationship
    BUT! BUT! BUT! The larger the correlation, the stronger the relationship

  • Consider the classical model for testing significance
    It assumes that you have a sample of cases from a population.
    The question is whether your observed statistic for the sample is likely to be observed given some assumption of the corresponding population parameter.
    If your observed statistic does not exactly match the population parameter, perhaps the difference is due to sampling error.
    The fundamental question: is the difference between what you observe and what you expect given the assumption of the population large enough to be significant -- to reject the assumption?
    The greater the difference -- the more the sample statistic deviates from the population parameter -- the more significant it is.
    That is, the less likely it is (the smaller the probability value) that such a sample would be drawn if the population assumption were true.

  • The classical model makes some assumptions about the population parameter:
    Population parameters are expressed as Greek letters, while corresponding sample statistics are expressed in lower-case Roman letters:
    r = correlation between two variables in the sample
    rho = correlation between the same two variables in the population
    A common assumption is that there is NO relationship between X and Y in the population: rho = 0.0
    This is the common null hypothesis in correlational analysis: rho = 0.0
    Testing for the significance of the correlation coefficient, r
    When the test is against the null hypothesis: rho_xy = 0.0
    What is the likelihood of drawing a sample with r_xy ≠ 0.0?
    The sampling distribution of r is
    approximately normal (but bounded at -1.0 and +1.0) when N is large
    and follows a t distribution when N is small.
    The simplest formula for computing the appropriate t value to test the significance of a correlation coefficient is:

t=r*sqrt((n-2)/(1-r^2))

Here n is the sample size (written N elsewhere in these notes). The degrees of freedom for entering the t distribution are N - 2

  • Example: Suppose you observe that r = .50 between literacy rate and political stability in 10 nations
    Is this relationship "strong"?
    Coefficient of determination = r-squared = .25
    Means that 25% of variance in political stability is "explained" by literacy rate
    Is the relationship "significant"?
    That remains to be determined using the formula above
    r = .50 and N=10
    set level of significance (assume .05)
    determine one- or two-tailed test (aim for one-tailed)

t=r*sqrt((n-2)/(1-r^2))=0.5*sqrt((10-2)/(1-.25)) = 1.63
For 8 df and one-tailed test, critical value of t = 1.86
We observe only t = 1.63
It lies below the critical t of 1.86
So the null hypothesis of no relationship in the population (rho = 0) cannot be rejected
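
The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name correlation_t is made up for illustration, and the critical value 1.86 is simply copied from the t table cited above rather than computed.

```python
import math

def correlation_t(r, n):
    """Return (t, df) for testing a sample correlation r against
    the null hypothesis of no correlation in the population."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r ** 2))
    return t, df

# Values from the example: r = .50 observed in N = 10 nations.
t, df = correlation_t(0.50, 10)

# Critical value taken from a t table (8 df, one-tailed, .05 level).
T_CRITICAL = 1.86

reject_null = t > T_CRITICAL
print(f"t = {t:.2f}, df = {df}, reject null: {reject_null}")
```

With these inputs the observed t falls short of the tabled critical value, so the null hypothesis is retained, which matches the hand calculation.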

  • Comments
    Note that a relationship can be strong and yet not significant
    Conversely, a relationship can be weak but significant
    The key factor is the size of the sample.
    For small samples, it is easy to produce a strong correlation by chance and one must pay attention to significance to keep from jumping to conclusions: i.e.,
    rejecting a true null hypothesis,
    which means making a Type I error.
    For large samples, it is easy to achieve significance, and one must pay attention to the strength of the correlation to determine if the relationship explains very much.
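
To make the role of sample size concrete, the sketch below applies the same t formula to a strong correlation in a small sample and a weak correlation in a large one. The pair r = .10 with N = 1000 is a made-up illustration, not a value from the lecture, and the large-sample critical value is approximated from the normal distribution, which is an assumption that is only reasonable when N is large.

```python
import math
from statistics import NormalDist

def correlation_t(r, n):
    """t statistic for testing a sample correlation r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))

# Strong correlation, small sample (the example above).
strong_small = correlation_t(0.50, 10)

# Weak correlation, large sample (hypothetical values for illustration).
weak_large = correlation_t(0.10, 1000)

# One-tailed .05 critical values: 1.86 is the tabled t for 8 df;
# for the large sample we use the normal approximation (about 1.645).
t_crit_small = 1.86
z_crit_large = NormalDist().inv_cdf(0.95)

print(f"r = .50, N = 10:   t = {strong_small:.2f}, "
      f"significant: {strong_small > t_crit_small}")
print(f"r = .10, N = 1000: t = {weak_large:.2f}, "
      f"significant: {weak_large > z_crit_large}")
print(f"variance explained: {0.50 ** 2:.0%} vs {0.10 ** 2:.0%}")
```

The weak correlation clears the significance threshold even though it explains only about 1% of the variance, while the strong one does not.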


  • Alternative ways of testing significance of r against the null hypothesis
    Look up the values in a table
    Read them off the SPSS output:
    check to see whether SPSS is making a one-tailed test
    or a two-tailed test
  • Testing the significance of r when r is NOT assumed to be 0
    This is a more complex procedure, which is discussed briefly in the Kirk reading
    The test requires first transforming the sample r to a new value, Z'.
    This test is seldom used.
    You will not be responsible for it.
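
For the curious, here is a rough sketch of what such a test can look like. It assumes that Z' refers to Fisher's r-to-z transformation (the inverse hyperbolic tangent of r), whose sampling distribution is approximately normal with standard error 1/sqrt(N - 3). The function name and the hypothesized population value of .30 are illustrative choices, not taken from the lecture or the Kirk reading.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n, rho0):
    """Test a sample correlation r against a nonzero population
    value rho0 using Fisher's r-to-z transformation.

    Returns (z, two_tailed_p)."""
    z_r = math.atanh(r)        # transformed sample correlation (Z')
    z_rho0 = math.atanh(rho0)  # transformed hypothesized value
    se = 1.0 / math.sqrt(n - 3)
    z = (z_r - z_rho0) / se
    p_two = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two

# Hypothetical question: is r = .50 from N = 10 cases
# distinguishable from a population correlation of .30?
z, p = fisher_z_test(0.50, 10, 0.30)
print(f"z = {z:.3f}, two-tailed p = {p:.3f}")
```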

LogMAR in Ophthalmology trials

Vision is typically reported as xxx/yyy where the xxx value is usually 20 for US assessments. As vision gets worse, for the same numerator, the denominator increases.

logMAR is log10(denominator/numerator) or -log10(numerator/denominator)
"normal" vision is 20/20, or logMAR = 0
20/100 is worse than 20/20 and logMAR = 0.69897
So the logMAR increases as vision gets worse and decreases as vision gets better
If you compute change = visit - baseline, a negative change is an improvement in vision
and a positive change is a worsening in vision.
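
A small Python sketch of the conversion and the sign convention follows. The function name snellen_to_logmar and the 20/40 follow-up value are made up for illustration.

```python
import math

def snellen_to_logmar(numerator, denominator):
    """Convert a Snellen fraction (e.g. 20/100) to logMAR."""
    return math.log10(denominator / numerator)

normal = snellen_to_logmar(20, 20)      # 0.0
baseline = snellen_to_logmar(20, 100)   # about 0.69897
visit = snellen_to_logmar(20, 40)       # hypothetical follow-up visit

# change = visit - baseline; negative means vision improved.
change = visit - baseline
direction = "improved" if change < 0 else "worsened or unchanged"
print(f"baseline = {baseline:.5f}, visit = {visit:.5f}, "
      f"change = {change:.5f} ({direction})")
```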

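Another way to interpret a change in logMAR is to take its antilog (10^change), which gives the factor by which the visual angle changed. On an ETDRS chart each line corresponds to 0.1 logMAR, so dividing the change by 0.1 gives the number of lines gained or lost. FDA often applies a criterion of 3 lines of change (ETDRS chart), since a change of 0.3 logMAR corresponds to a doubling of the visual angle (10^0.3 is approximately 2).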
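
The link between a 3-line change and a doubling of the visual angle can be checked numerically. This sketch assumes the standard ETDRS chart design, in which each line is a step of 0.1 logMAR; the function names are illustrative.

```python
# Assumed ETDRS chart step size: 0.1 logMAR per line.
LOGMAR_PER_LINE = 0.1

def logmar_change_to_lines(change):
    """Number of ETDRS lines corresponding to a logMAR change."""
    return change / LOGMAR_PER_LINE

def visual_angle_ratio(change):
    """Antilog of the logMAR change: the factor by which the
    minimum angle of resolution changed."""
    return 10 ** change

change = 0.3  # a worsening of 0.3 logMAR
lines = logmar_change_to_lines(change)
ratio = visual_angle_ratio(change)
print(f"{change} logMAR = {lines:.0f} lines, "
      f"visual angle ratio = {ratio:.3f}")
```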

A useful reference on calculating average visual acuity and the whole logMAR concept is the article by Jack Holladay "Proper Method for Calculating Average Visual Acuity".

It is also useful to refer to FDA Guidelines for Multifocal Intraocular Lens IDE Studies and PMAs and Guidance for Industry Guidance for Premarket Submissions of Orthokeratology Rigid Gas Permeable Contact Lenses.

AstraZeneca considers pursuing "biosimilars."

From "http://www.delawareonline.com/article/20081230/BUSINESS/812300331"

AstraZeneca is considering joining several of its peers in pursuing "biosimilars" -- generic versions of high-priced biotechnology drugs.

The London-based drug maker has made a push into the $94 billion market for biologics in recent years with the acquisition of Cambridge Antibody Technologies in 2006 and last year's $15.6 billion purchase of Maryland-based MedImmune.
Generic versions of biologics -- drugs made from living cells rather than chemicals -- are not yet approved for sale in the United States.
The complexity of dealing with the larger biological molecules makes it impossible to create an exact copy of a biologic drug, prompting concerns that the biosimilar medicine may end up working differently than the original drug.
But amid the growing popularity and high price tags of many biologics, Congress is expected to consider a regulatory pathway next year to bring biosimilars to market. President-elect Barack Obama has said he supports biosimilars.
Several large drug makers, threatened by patent expirations on top-selling products, are looking at biosimilars as a potential source of revenue. Merck said earlier this month it would start a new unit to copy biologics, and Eli Lilly has also expressed interest in the market.
In an interview published last week by the Financial Times, AstraZeneca CEO David Brennan said the company was studying the launch of biosimilar products, although he said such a move would depend on the legislation being considered by Congress.
AstraZeneca, whose U.S. headquarters is in Fairfax, said in a statement that MedImmune has facilities well-equipped to produce biosimilars, "should we choose to do so and if the legal and regulatory framework allowed.
"However, at the current time, we see the strongest opportunities for the business in flexing its track record of innovation, developing its pipeline of potential biologic candidates to treat or prevent a number of debilitating or life-threatening diseases," the company said.
U.S. and European regulators have a streamlined approval process for generic versions of conventional small-molecule drugs, which are easier to copy than biologics. The European Union has an approval procedure for certain biologics.
Novartis AG's generic-drug unit two years ago became the first company to have a biosimilar product approved: the growth hormone Omnitrope.
The European Commission last August cleared Novartis' anemia drug that is similar to Johnson & Johnson's Eprex and Amgen's Epogen.