Tuesday, July 21, 2009

Some axiom quotes about statistics

I had a chance to attend a seminar by Richard De Veaux at a data mining event organized by SAS JMP. Dick is an excellent speaker and tutor who can make complex or boring statistical issues sound easy and interesting. I noticed that he used some memorable quotes, axioms, and cartoons about statistics. Here are some of the quotes from his talk.

George Box:
  • “All models are wrong, but some are useful”
  • “Statisticians, like artists, have the bad habit of falling in love with their models”.

Twyman’s Law and Corollaries
  • “If it looks interesting, it must be wrong”
  • De Veaux’s Corollary 1 to Twyman’s Law: “If it’s perfect, it’s wrong”
  • De Veaux’s Corollary 2 to Twyman’s Law: “If it isn’t wrong, you probably knew it already”

Albert Einstein:
  • “All models should be as simple as possible but no simpler than necessary”
......

A quick web search turns up some additional quotes and axioms:
David Hand:
  • “Data mining is the discovery of interesting, unexpected, or valuable structures in large datasets”
Twyman's Law:
  • "If it’s interesting or unusual it’s probably wrong"
George Box:
  • “All models are wrong, some are useful”
Unknown source:
  • “If we torture the data long enough, they will confess”
  • "What’s the difference between a biostatistician and a physician? A physician makes an analysis of a complex illness whereas a biostatistician makes you ill with a complex analysis."
Further about Twyman's law:
Twyman's Law (created by Tony Twyman, the expert UK-based media analyst) states that "If a thing surprises you, it's wrong." Has anyone investigated to what extent the reported loss is real or a research artifact?

There is a rule in market research called Twyman’s law: “anything surprising or interesting is probably wrong”. While not going that far, one should be always advised that if you find a poll result that seems somewhat counter-intuitive, that seems to have no obvious explanation, treat it with caution until other polls support the findings. Statistically there is no more reason for this poll to be wrong than the last poll or the poll before that, and we may indeed find that this is a genuine trend and everyone starts showing the Tories down, but it is a bit odd.


Jim's favorite quotes
Statistics Cartoons
Image results for statistics cartoons from Google
Use humor to teach statistics
Collection of statistics jokes and humor

Thursday, July 16, 2009

Dose Escalation and Modified Fibonacci Series

Dose escalation is a type of clinical trial design in which the dose of the drug is increased with each cohort that is added. Each cohort is called a 'dose cohort', and the cohort size can differ depending on the nature of the study. Typically the cohort size is around 10 subjects. A dose-escalation design is used to determine how well a drug is tolerated in people, and it is often used in first-in-man trials. In a dose-escalation study, a new cohort should not be initiated before the safety data from the current or previous cohort have been fully assessed. It may also be useful to pre-define a safety stopping rule that prevents escalation to the next dose cohort if pre-specified safety concerns arise.

One design question in a dose-escalation study is how to determine the dose space (i.e., how much the dose increases compared with the previous cohort). One scheme for determining the dose space is the so-called 'Modified Fibonacci Series'.

In the early 13th century, Leonardo Fibonacci described a simple numerical series that underlies a remarkable mathematical relationship involving phi (the golden ratio).

Starting with 0 and 1, each new number in the series is simply the sum of the two before it.

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . .

The ratios of the successive numbers in the Fibonacci series quickly converge on Phi. After the 40th number in the series, the ratio is accurate to 15 decimal places.

1.618033988749895 . . .

A “modified” Fibonacci series uses starting numbers other than 0 and 1. For example, if the starting numbers are 1 and 3, we have the following modified Fibonacci series:

1, 3, 4, 7, 11, 18, 29, 47, 76, 123, 199, 322, 521, 843, 1364, 2207, 3571, 5778, 9349...
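Both series above follow the same rule and differ only in their two starting numbers, which can be sketched in a few lines of Python. The function name `fibonacci_like` and the variable names are chosen here for illustration only.

```python
from math import sqrt

def fibonacci_like(first, second, n):
    """Return the first n terms of a series in which each new term
    is the sum of the two terms before it."""
    series = [first, second]
    while len(series) < n:
        series.append(series[-1] + series[-2])
    return series[:n]

PHI = (1 + sqrt(5)) / 2  # the golden ratio, about 1.618

classic = fibonacci_like(0, 1, 42)   # 0, 1, 1, 2, 3, 5, ...
modified = fibonacci_like(1, 3, 19)  # 1, 3, 4, 7, 11, ...

print(modified)
# Ratio of successive terms, compared with phi
print(classic[41] / classic[40], PHI)
print(modified[18] / modified[17], PHI)
```

The ratio of successive terms approaches phi for the modified series as well, because the starting numbers only affect the early terms.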

The modified Fibonacci series has been used in Phase I dose-escalation studies to determine the dose space.

Assuming d1 is the starting dose for the first cohort, according to the modified Fibonacci series, the doses for the next cohorts will be
d2 = 2 d1, then d3 = 1.67 d2, d4 = 1.5 d3, ... If the starting dose is 5 mg and the study has 5 cohorts, the dose scheme will be approximately:

Cohort 1 (5 mg) -> Cohort 2 (10 mg) -> Cohort 3 (16.7 mg) -> Cohort 4 (25 mg) -> Cohort 5 (35 mg)

Here the last step uses a factor of 1.4, the value usually quoted as the next multiplier in this scheme. As we can see, as the dose increases, the ratio between two consecutive doses gets smaller and smaller.
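A minimal Python sketch of such a schedule follows. It applies the multipliers 2, 1.67 and 1.5 quoted above; the further factors 1.4 and 1.33 are the values commonly quoted for this scheme and are an assumption here, as are the function and parameter names. Doses are left unrounded, whereas a real protocol would round them to practical strengths.

```python
def escalation_schedule(start_dose, n_cohorts,
                        factors=(2.0, 1.67, 1.5, 1.4, 1.33)):
    """Return the dose for each cohort.

    factors[0] takes cohort 1 to cohort 2, factors[1] takes cohort 2
    to cohort 3, and so on. Once the factors run out, the last one is
    reused (an assumption made for this sketch).
    """
    doses = [start_dose]
    for step in range(n_cohorts - 1):
        factor = factors[min(step, len(factors) - 1)]
        doses.append(doses[-1] * factor)
    return doses

doses = escalation_schedule(5, 5)
for cohort, dose in enumerate(doses, start=1):
    print(f"Cohort {cohort}: {dose:.2f} mg")
```

Passing a different `factors` tuple is enough to try another escalation scheme with the same code.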





Saturday, July 11, 2009

Odds Ratio and Relative Risk


Odds Ratio (OR) and Relative Risk (RR) are two ratios often used in epidemiological studies and clinical trials. They are related, but their calculation and interpretation are quite different. Note that Relative Risk may also be called Risk Ratio, with the same abbreviation RR.

Relative Risk

P0 = the probability of an event (e.g., responder, cardiovascular event) when a covariate is at a given level (e.g., treatment = Placebo, gender = female)
P1 = the probability of an event (e.g., responder, cardiovascular event) when the covariate is one unit higher than the previous level (e.g., treatment = New Drug) or simply at another level (e.g., gender = male)
RR = P1 / P0

Odds Ratio
Odds = the probability of events / the probability of non-events
OR = Odds in one group / Odds in another group = [P1 / (1 - P1)] / [P0 / (1 - P0)]
For example, OR = Odds in treated group / Odds in Placebo group

When P1 and P0 are both small, the OR can be used to approximate the RR. However, when P1 and P0 are close to 0.5, the OR typically lies much further from 1 than the RR.
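The two definitions can be compared directly in a short Python sketch. The function names and the two pairs of probabilities are illustrative values chosen here, not data from any study.

```python
def relative_risk(p1, p0):
    """RR = P1 / P0."""
    return p1 / p0

def odds(p):
    """Odds = probability of event / probability of non-event."""
    return p / (1 - p)

def odds_ratio(p1, p0):
    """OR = odds at P1 / odds at P0."""
    return odds(p1) / odds(p0)

# One rare-event pair and one pair close to 0.5 (illustrative values)
for p1, p0 in [(0.02, 0.01), (0.6, 0.4)]:
    rr = relative_risk(p1, p0)
    or_ = odds_ratio(p1, p0)
    print(f"P1={p1}, P0={p0}: RR={rr:.3f}, OR={or_:.3f}")
```

For the rare-event pair the OR (about 2.02) is close to the RR (2.0), while for the pair near 0.5 the OR (2.25) is clearly larger than the RR (1.5).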

Steve Simon wrote an excellent web page about the comparison of Odds Ratio versus Relative Risk.

In epidemiology classes, we are typically advised to use the OR for case-control studies and the RR for cohort studies. For cross-sectional studies, both OR and RR may be used.

In the clinical trial setting, there is no consensus on whether to use the OR or the RR. In practice, both are used. Sometimes the choice between OR and RR can be manipulated to serve one purpose or another. Brian Attig and Alison Clabaugh criticized the misuse of the statistical interpretation of the odds ratio in Applied Clinical Trials.

Both OR and RR can be easily obtained from SAS Proc Freq with the RELRISK option on the TABLES statement (the RISKDIFF option reports risk differences instead). An example by Mimi Chong illustrates this. We just need to be careful when reading the SAS output: the odds ratio is clearly labelled, while the risk ratio is the number corresponding to 'Col1 Risk' or 'Col2 Risk', depending on which column is defined as the 'event'.
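To see why the choice of column matters, the same quantities can be computed by hand from a 2x2 table. The Python sketch below is not SAS output and does not reproduce it; the function name and the counts are hypothetical, with rows taken as treatment groups and columns as event / no event.

```python
def two_by_two_measures(a, b, c, d):
    """Odds ratio and relative risks for the 2x2 table

                 column 1   column 2
        row 1       a          b
        row 2       c          d
    """
    odds_ratio = (a * d) / (b * c)
    # Risk of landing in column 1 (or column 2), row 1 relative to row 2
    col1_rr = (a / (a + b)) / (c / (c + d))
    col2_rr = (b / (a + b)) / (d / (c + d))
    return odds_ratio, col1_rr, col2_rr

# Hypothetical counts: row 1 = New Drug, row 2 = Placebo,
# column 1 = event, column 2 = no event
or_, rr_col1, rr_col2 = two_by_two_measures(30, 70, 20, 80)
print(f"OR={or_:.3f}, Col1 RR={rr_col1:.3f}, Col2 RR={rr_col2:.3f}")
```

With these counts the relative risk of the event is the column 1 value (about 1.5); the column 2 value (about 0.875) is the relative risk of not having the event, so reading the wrong row of the output reverses the interpretation.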

If additional covariates (or, in epidemiological terms, confounding factors) need to be considered, SAS Proc Freq with the CMH option, Proc Logistic, or Proc Genmod can be used.
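As a rough illustration of what a stratified analysis does, the Mantel-Haenszel pooled odds ratio combines one 2x2 table per stratum of the confounding factor. The Python sketch below implements only that pooled estimate, with no test statistic or confidence interval; the function name and the counts are hypothetical.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata.

    Each stratum is a tuple (a, b, c, d) for the 2x2 table
        exposed:    a events, b non-events
        unexposed:  c events, d non-events
    OR_MH = sum(a*d/n) / sum(b*c/n), with n = a + b + c + d.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Hypothetical counts for two strata (e.g., two levels of a confounder)
strata = [(10, 40, 5, 45), (20, 30, 12, 38)]
pooled_or = mantel_haenszel_or(strata)
print(f"Mantel-Haenszel OR = {pooled_or:.3f}")
```

The pooled value falls between the two stratum-specific odds ratios, which is the sense in which the estimate is adjusted for the stratification factor.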

Sunday, July 05, 2009

The interactive voice response system

Interactive Voice Response (IVR) is a technology that allows a computer to detect voice and keypad inputs. Its use in clinical trials probably started with randomization, where it replaced the old way of handling randomization (i.e., through concealed envelopes). It makes central randomization feasible and makes it easy to handle complex randomization schedules, such as randomization with multiple layers of stratification and randomization with dynamic allocation.


Currently most sponsors perform randomization using an Interactive Voice Response System (IVRS) so that treatment codes for individual patients are no longer available at the sites for inspection.


The IVRS utilizes a dynamic randomization system using an adaptive minimization technique for pre-specified stratification variables. The randomization algorithm evaluates previous treatment assignments across the different strata to determine the probability of treatment assignment. There are no fixed randomization lists available prior to enrollment of patients and there are no pre-determined randomization schedules. The randomization algorithm is the source document and is supposed to be signed and dated prior to the time when the first patient is randomized into the study. An external vendor is used to manage the treatment allocation codes. Once a subject is found to be eligible for a trial, the investigator contacts the IVRS vendor and provides details about the subject including their stratification factors. Typically sites receive confirmation faxes from the IVRS vendor that include relevant patient information such as the date of randomization, the date of last visit and the date the last medication was assigned.
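The idea of evaluating previous assignments within the new patient's strata can be sketched in Python as a simplified minimization in the style of Pocock and Simon. This is not any vendor's algorithm: the class name, the two-arm setup, the factor names, and the 0.8 probability of choosing the better-balancing arm are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ARMS = ("A", "B")

class Minimizer:
    """Simplified two-arm minimization over stratification factors."""

    def __init__(self, factors, p_best=0.8, seed=None):
        self.factors = factors
        self.p_best = p_best            # chance of taking the balancing arm
        self.rng = random.Random(seed)
        # counts[(factor, level, arm)] = patients already assigned
        self.counts = defaultdict(int)

    def assign(self, patient):
        # For each arm, count earlier patients in that arm who share
        # the new patient's level on each stratification factor.
        scores = {
            arm: sum(self.counts[(f, patient[f], arm)] for f in self.factors)
            for arm in ARMS
        }
        lowest = min(scores.values())
        best = [arm for arm in ARMS if scores[arm] == lowest]
        if len(best) == len(ARMS):
            arm = self.rng.choice(ARMS)   # tie: plain randomization
        elif self.rng.random() < self.p_best:
            arm = best[0]                 # favor the arm that improves balance
        else:
            arm = self.rng.choice([a for a in ARMS if a not in best])
        for f in self.factors:
            self.counts[(f, patient[f], arm)] += 1
        return arm

# Hypothetical patients with two stratification factors
minimizer = Minimizer(factors=["sex", "site"], seed=2009)
patients = [
    {"sex": "F", "site": "01"},
    {"sex": "F", "site": "02"},
    {"sex": "M", "site": "01"},
    {"sex": "F", "site": "01"},
]
for patient in patients:
    print(patient, "->", minimizer.assign(patient))
```

Because each assignment depends on the patients enrolled so far, no list of assignments exists in advance, which is why the algorithm itself, rather than a schedule, serves as the source document.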


In emergencies, investigators must call the IVRS vendor to break the treatment code, since there are no longer envelopes with patient numbers and treatment codes for investigators to open at the sites. Treatment codes may be released to external vendors prior to the final analysis so that plasma concentration analyses or PK modeling can be performed in patients receiving the new investigational treatment. Treatment codes may also be released or partially broken up to the code level (e.g., treatment A, B) for the Data Safety Monitoring Board (DSMB) and may be completely unblinded if the chairman of the DSMB requests it. Treatment codes are released to the sponsor or contract research organization after the official analysis database lock.



Nowadays, IVRS is moving to web-based systems, and the term IWRS (interactive web response system) is coming into use. It may well be that when we say IVRS, we actually mean IWRS.



The utility of the IVRS/IWRS is not just limited to the randomization. It can be used in other areas as well:
  • Collecting clinical efficacy outcomes. For example, in an IBS (Irritable Bowel Syndrome) study, the IVRS can be used to collect daily information about the presence or intensity of several IBS-related symptoms (such as satisfactory relief, abdominal discomfort or pain, bloating, stool frequency, stool consistency...)
  • Patient Reported Outcome (ePRO)
  • Outcome research
  • Cohort management (open/close a cohort) in dose-escalation studies
  • Patient registry / registry studies
  • Drug supply management / study drug inventory tracking