Thursday, December 17, 2015

Median of Differences versus Difference in Medians

Both the mean and the median are used as location parameters to measure the central tendency of data. When there is an intervention (such as a drug treatment in a clinical trial), the mean or the median of the change from baseline can be used as the point estimate of the magnitude of the intervention effect. A statistical test is then performed to see whether the estimated effect is statistically significant.

One mistake people can make is to calculate the difference in medians when the correct approach is to calculate the median of differences. The example below is a typical data presentation for a pre-post study design. The change from baseline is calculated for each subject, and the mean and median are then calculated for the change-from-baseline values across all subjects. One temptation is to calculate the difference in medians as the median of the post-baseline values minus the median of the baseline values. However, the median of differences and the difference of medians can be very different, especially when the data are skewed.

Subject   Baseline   Post Baseline   Change From Baseline
1         50.6       38              -12.6
2         39.2       18.6            -20.6
3         35.2       23.2            -12
4         17         19              2
5         11.2       6.6             -4.6
6         14.2       16.4            2.2
7         24.2       14.4            -9.8
8         37.4       37.6            0.2
9         35.2       24.4            -10.8
Mean      29.36      22.02           -7.33
Median    35.2       19              -9.8

The median of differences is calculated as the 50th percentile of all individual differences (change from baseline). The median of differences (the last column) is -9.8. However, the difference in medians = median of post-baseline measures – median of baseline measures = 19 – 35.2 = -16.2.

The median of differences (-9.8) and the difference in medians (-16.2) are quite different especially for skewed data.

The median of differences is the correct number to use, and it is the number that corresponds to the Wilcoxon signed-rank test.

This issue does not arise for the mean: the mean of differences equals the difference in means, i.e., -7.33 = 22.02 (mean for post-baseline) – 29.36 (mean for baseline), up to rounding of the displayed means. However, if we need to perform a statistical test such as the paired t-test, the change-from-baseline values in the last column should be the basis.
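As a minimal sketch (the data set and variable names are illustrative), the change-from-baseline values from the table above can be reproduced in SAS; PROC UNIVARIATE on the change variable then reports the median of the differences together with the Wilcoxon signed-rank test, so both the point estimate and the test come from the same column of values:

data paired;
   input subject baseline post;
   change = post - baseline;   /* change from baseline for each subject */
   datalines;
1 50.6 38
2 39.2 18.6
3 35.2 23.2
4 17 19
5 11.2 6.6
6 14.2 16.4
7 24.2 14.4
8 37.4 37.6
9 35.2 24.4
;

/* the median (-9.8) and the signed-rank statistic are both based on the
   individual differences, not on the separate baseline and post medians */
proc univariate data=paired;
   var change;
run;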

Suppose we have "change from baseline" values for two treatment groups; we would need to calculate the median for each treatment group in the same way as above. For the treatment comparison, we may use the nonparametric Wilcoxon rank-sum test and estimate the magnitude of the treatment difference using the Hodges-Lehmann estimator of location shift, which can be calculated in SAS using Proc NPAR1WAY, as shown in the sketch below.
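A minimal sketch of the two-group comparison, assuming a data set with a treatment group variable and the change-from-baseline values (the data below are made up purely to illustrate the syntax); the HL option requests the Hodges-Lehmann estimate of the location shift along with the Wilcoxon rank-sum test:

data twogroups;                 /* hypothetical change-from-baseline data for two arms */
   input treatment $ change @@;
   datalines;
A -12.6 A -20.6 A -12.0 A 2.0 A -4.6
B -2.1 B 1.5 B -3.4 B 0.8 B -1.2
;

proc npar1way data=twogroups wilcoxon hl;   /* HL = Hodges-Lehmann location shift */
   class treatment;
   var change;
run;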


Thursday, December 03, 2015

Dose Response Modeling: calculating EC50, ED50 by Fitting Emax Model using SAS Proc NLIN

Dose-response data can come from laboratory tests, pre-clinical studies, and clinical trials. The responses can be assay results, fluorescence output, cell counts, hormone concentrations, efficacy measures, and so on.
This type of dose-response data can be analyzed using a model-based approach - assuming a functional relationship between the response and the dose following a pre-specified parametric model. There are many different models used to characterize a dose-response: linear, quadratic, orthogonal polynomials, exponential, linear in log-dose, and Emax. If the response is discrete or dichotomous (success/failure, survival/death, ...), it is called a quantal response. A different set of models is then needed, such as the probit, logit, ordinal logistic, and extreme value (or gompit) regression models, which can be easily fit using SAS Proc Probit.

For continuous response data, one of the most common parametric models is the Emax model. The 3-parameter Emax model fits the dose-response function g(D):

     g(D) = E0 + Emax * D / (ED50 + D)

where E0 is the response Y at baseline (absence of dose), Emax is the asymptotic maximum dose effect (maximum effect attributable to the drug), and ED50 is the dose that produces 50% of the maximal effect. A generalization is the 4-parameter Emax model:

     g(D) = E0 + Emax * D^h / (ED50^h + D^h)

where h is the 4th parameter, sometimes called the Hill parameter, Hill factor, or slope factor. The Hill parameter affects the shape of the curve and is in some cases very difficult to estimate.

The Emax model may also be referred to as the three-parameter logistic model and the four-parameter logistic model, or simply the three-parameter model and the four-parameter model.

SAS Proc NLIN can be used to fit the Emax model (three-parameter or four-parameter). We can use the data from the online paper "How can I generate a dose response curve in SAS?". From a modeling perspective, a concentration-response is treated in the same way as a dose-response. In the data below, some concentration levels have duplicate measurements.

Concentration Response
0.1 21.125
0.1 20.575
0.25 40.525
0.5 26.15
0.75 26.35
0.75 44.275
1 49.725
1 63.6
10 49.35
10 68.875
100 58.025
100 58.075
1000 68.025
1000 52.3
We can read the data into a SAS data set as follows:

data dr;
input concentration response;
datalines;
.1 21.125
.1 20.575
.25 40.525
.5 26.15
.75 26.35
.75 44.275
1 49.725
1 63.6
10 49.35
10 68.875
100 58.025
100 58.075
1000 68.025
1000 52.3
;

In order to fit the four-parameter Emax model above, we need to provide initial values for all four parameters. The initial values do not need to be precise; usually the same results can be obtained with different initial values. However, we want to provide initial values close to the data. For example, from the data set above, we can choose the minimum response as E0, the maximum response as Emax, and a middle dose as ED50 (the dose corresponding to 50% of Emax). For the fourth parameter (the Hill slope), we can run a simple linear regression to obtain an initial slope. It is not critical whether the concentration-response really follows a linear relationship; the purpose here is just to obtain an initial Hill slope value for the nonlinear model.

If we run the following simple regression, we will get a slope of 0.01831 and we can use this value as the initial value for the fourth parameter.

proc reg data=dr;
  model response=concentration;
run;   

From the data set, we now have the initial values for all four parameters: E0 = 20.575, Emax = 68.875, ED50 = 1, and slope factor = 0.01831. Again, these initial values do not have to be very accurate.

We can then run the following SAS program to fit the nonlinear (four-parameter Emax) model described above:

proc nlin data = dr method=marquardt;
  parms E0 = 20.575 Emax = 68.875 ED50 = 1 hill = 0.01831;
  model response = Emax + (E0 * concentration**hill) / (ED50**hill + concentration**hill);
run;

From the outputs, we will get an estimate of ED50 = 0.8171.

Notice that in the online paper "How can I generate a dose response curve in SAS?", a different form of the four-parameter model was presented. However, if we fit that nonlinear model, we will get the same estimate of ED50. The four-parameter model was written as:

     response = E0 + (Emax - E0) / (1 + (concentration / ED50)^hill)

The SAS program can be written as:

proc nlin data = dr method=marquardt;
  parms E0 = 20.575 Emax = 68.875 ED50 = 1 hill = 0.01831;
  model response = E0 + (Emax - E0) / (1 + (concentration / ED50)**hill);
run;
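To visualize the fit, one could (as a sketch) add an OUTPUT statement to the PROC NLIN call above to save the predicted values and then overlay the observed and fitted responses on a log concentration axis:

proc nlin data = dr method=marquardt;
  parms E0 = 20.575 Emax = 68.875 ED50 = 1 hill = 0.01831;
  model response = E0 + (Emax - E0) / (1 + (concentration / ED50)**hill);
  output out=pred p=predicted;            /* save the model-predicted responses */
run;

proc sgplot data=pred;
  scatter x=concentration y=response;     /* observed concentration-response data */
  series  x=concentration y=predicted;    /* fitted four-parameter Emax curve */
  xaxis type=log;                         /* concentrations span several orders of magnitude */
run;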


Saturday, November 21, 2015

Pediatric Study Plan (PSP) and Paediatric Investigation Plan (PIP)

Pharmaceutical companies usually focus their efforts on the adult population when they develop a new compound. Historically, low rates of pediatric testing resulted in a paucity of information regarding the safe use of pharmaceutical products in children. While a common refrain heard from regulators is that "children are not simply little adults," physicians had little information with which to inform their prescribing habits. To encourage drug development in the pediatric population, the regulatory agencies have come up with different requirements and incentives: the Pediatric Study Plan (PSP) and the Paediatric Investigation Plan (PIP) are the requirements in the US and the EU, respectively.

The PSP in the US is the result of PREA (Pediatric Research Equity Act) and FDASIA (the Food and Drug Administration Safety and Innovation Act). Under FDASIA, signed into law on July 9, 2012, PREA for the first time includes a provision that requires manufacturers of drugs subject to PREA to submit a PSP early in the drug development process. The intent of the PSP is to identify needed pediatric studies early in drug development and begin planning for these studies. FDASIA requires the FDA to promulgate regulations and issue guidance to implement these and other provisions.
The PIP (paediatric investigation plan) in the EU is a development plan aimed at ensuring that the necessary data are obtained through studies in children to support the authorisation of a medicine for children. All applications for marketing authorisation for new medicines have to include the results of studies as described in an agreed PIP, unless the medicine is exempt because of a deferral or waiver. This requirement also applies when a marketing-authorisation holder wants to add a new indication, pharmaceutical form or route of administration for a medicine that is already authorised and covered by intellectual property rights.

Question: Are the PSP and PIP mandated or voluntary?

The PSP and PIP are mandated unless they are waived or deferred. A waiver means that the pediatric study/investigation plan is not needed. A deferral means that the pediatric studies/investigations can be deferred to the post-marketing stage.

Question: How to obtain the waiver or deferral for PSP and PIP?

Under some circumstances, a pediatric assessment may be unnecessary, undesirable, impractical, or may need to be delayed. In the US, the legislation authorizes FDA to grant waivers or deferrals of the pediatric assessments required under the Act. If the applicant requests a waiver or deferral, either full or partial, appropriate and sufficient supporting evidence must be provided. The criteria for granting waivers or deferrals center on safety, the nature of the drug product, and the practicability of the requisite studies. FDA has provided specific information in a draft guidance that describes how to comply with PREA.

In the EU, the Class waivers will be granted by the regulatory authority. "The need for a paediatric development may be waived for classes of medicines that are likely unsafe or ineffective in children, that lack benefit for paediatric patients or are for diseases and conditions that only affect the adult population. This page lists the class waivers granted by the European Medicines Agency (EMA)."

Question: What is the age cut-off for defining the pediatric population?

In the EU, pediatric population refers to children aged less than 18 years old.

In the US, the pediatric population refers to children aged 16 years or younger. FDA further classifies the pediatric population into the following categories:

NAME          DEFINITION                       FDA CODE
NEONATES      NEWBORNS UP TO ONE MONTH         NEO
INFANTS       ONE MONTH TO TWO YEARS           INF
CHILDREN      TWO YEARS TO TWELVE YEARS        CHI
ADOLESCENTS   TWELVE YEARS TO SIXTEEN YEARS    ADO

An example of a partial waiver would be that the PSP/PIP is not required for children two years old or younger but is required for children older than two years.

Question: What are the incentives for doing pediatric studies?
In the US, as an incentive to industry to conduct studies requested by the Agency, Section 505A provides for a 6-month period of marketing exclusivity (pediatric exclusivity). In addition, FDA also has a Rare Pediatric Disease Priority Review Voucher Program that can issue a priority review voucher to companies that develop drugs for rare pediatric diseases. A priority review voucher can be worth millions of dollars.


In the EU, the incentives include the following:

·         Medicines authorised across the EU with the results of studies from a paediatric investigation plan included in the product information are eligible for an extension of their supplementary protection certificate by six months. This is the case even when the studies' results are negative.

·         For orphan medicines, the incentive is an additional two years of market exclusivity.

·           Scientific advice and protocol assistance at the Agency are free of charge for questions relating to the development of paediatric medicines.

·         Medicines developed specifically for children that are already authorised but are not protected by a patent or supplementary protection certificate are eligible for a paediatric-use marketing authorisation (PUMA). If a PUMA is granted, the product will benefit from 10 years of market protection as an incentive.

Question: Who will assess the PSP and PIP?
In the US, the PSP will be reviewed by the Pediatric Review Committee (PeRC).

In the EU, the Paediatric Committee (PDCO) is the committee at the European Medicines Agency that is responsible for assessing the content of paediatric investigation plans and adopting opinions on them. This includes assessing applications for full or partial waivers and assessing applications for deferrals.
References:

·         US FDA (July 2005) How to Comply with the Pediatric Research Equity Act

      ·         US FDA Pediatric Product Development Webpage

·         EMA Paediatric investigation plans


Monday, November 09, 2015

Human Genetic Engineering, DNA Modifications, DNA Editing, Genome Editing, Trans-humanism, Transgenic, Xenotransplantation, Humanized Animals, and Human Chimeras

Last week, NPR had a report "Should Human Stem Cells Be Used To Make Partly Human Chimeras?". Apparently, there was a meeting at NIH to discuss the ethical issues of DNA engineering and the hybridization of human and animal genes. The podcast can be listened to below.


DNA engineering has advanced greatly in recent years, and hybrids of human and animal genes are becoming possible. While the benefits are obvious in pharmaceuticals and other areas, the ethical issues cannot be ignored.

An article "Ethical Implications of Human Genetic Engineering" by Renuka Sivapatham touched all of these ethic issues.

A YouTube video titled "Human Genetic Engineering Legal??? Hybrid World: DNA Modification, Trans-humanism, Transgenic..." talks about this fascinating but scary topic.





On FDA's side, there are a set of guidance documents in the 'Cellular & Gene Therapy' and 'Xenotransplantation' areas.

FDA also has a Cellular, Tissue, and Gene Therapies Advisory Committee that reviews and evaluates available data relating to the safety, effectiveness, and appropriate use of human cells, human tissues, gene transfer therapies and xenotransplantation products which are intended for transplantation, implantation, infusion and transfer in the prevention and treatment of a broad spectrum of human diseases and in the reconstruction, repair or replacement of tissues for various conditions.

FDA defines the xenotransplantation as "any procedure that involves the transplantation, implantation or infusion into a human recipient of either (a) live cells, tissues, or organs from a nonhuman animal source, or (b) human body fluids, cells, tissues or organs that have had ex vivo contact with live nonhuman animal cells, tissues or organs. The development of xenotransplantation is, in part, driven by the fact that the demand for human organs for clinical transplantation far exceeds the supply."

In the EU, EMA this year issued its guideline "Guideline on the quality, non-clinical and clinical aspects of gene therapy medicinal products".






Minimizing the missing data: data collection after subjects withdraw from the clinical trial

Recently, I have heard quite a few discussions about data collection in clinical trials for subjects who discontinue early. The idea is that data collection for key efficacy and safety outcomes continues even after subjects have discontinued the study treatment or discontinued from the study. This has been considered one of the approaches for minimizing missing data.

In 2010, the National Academies published the report “The Prevention and Treatment of Missing Data in Clinical Trials”. The report considered ‘continuing data collection for dropouts’ as one of the key approaches to minimizing missing data and included a lengthy narrative about this approach.

CONTINUING DATA COLLECTION FOR DROPOUTS
Even with careful attention to limiting missing data in the trial design, it is quite likely that some participants will not follow the protocol until the outcome data are collected. An important question is then what data to collect for participants who stop the assigned treatment. Sponsors and investigators may believe that the participants are no longer relevant to the study and so be reluctant to incur the costs of continued data collection. Yet continued data collection may inform statistical methods based on assumptions concerning the outcomes that participants might have had if they continued treatment. Continued data collection also allows exploration of whether the assigned therapy affects the efficacy of subsequent therapies (e.g., by improving the degree of tolerance to the treatment through exposure to a similar treatment, i.e., cross-resistance).
The correct decision on continued data collection depends on the selected estimand and study design. For example, if the primary estimand does not require the collection of the outcome after participants discontinue assigned treatment, such as with the estimand (4) above (area under the outcome curve during tolerated treatment), then the benefits of collecting additional outcome data after the primary outcome is reached needs to be weighed against the costs and potential drawbacks of the collection.
An additional advantage of data collection after subjects have switched to other treatments (or otherwise violated the protocol) is the ability to monitor side effects that occur after discontinuation of treatment. Although the cause of such side effects may be unclear (e.g., if a subject switches to another treatment), these data, when combined with long-term followup of other subjects in high-quality epidemiological studies, may help to determine treatment-associated risks that are not immediately apparent. We are convinced that in the large majority of settings, as has been argued by Lavori (1992) and Rubin (1992), the benefits of collecting outcomes after subjects have discontinued treatment outweigh the costs.
Recommendation 3: Trial sponsors should continue to collect information on key outcomes on participants who discontinue their protocol-specified intervention in the course of the study, except in those cases for which a compelling cost-benefit analysis argues otherwise, and this information should be recorded and used in the analysis.
Recommendation 4: The trial design team should consider whether participants who discontinue the protocol intervention should have access to and be encouraged to use specific alternative treatments. Such treatments should be specified in the study protocol.
Recommendation 5: Data collection and information about all relevant treatments and key covariates should be recorded for all initial study participants, whether or not participants received the intervention specified in the protocol.

In EMA’s “Guideline on Missing Data in Confirmatory Clinical Trials”, data collection for dropouts is also recommended.

It should be the aim of those conducting clinical trials to achieve complete capture of all data from all patients, including those who discontinue from treatment.
When patients drop out of a trial, full reporting of all reasons for their discontinuation should be given where possible. This should allow identification of the most important reasons that caused them to discontinue and may influence how these subjects are treated in the missing data analysis. Any follow-up information collected post dropout could be helpful in justifying how these patients are handled in the analyses.

From the implementation standpoint, FDA’s policy regarding this is reflected in its “Guidance for Sponsors, Clinical Investigators, and IRBs: Data Retention When Subjects Withdraw from FDA-Regulated Clinical Trials”. The guidance discusses data collection both before and after withdrawal when subjects withdraw from clinical trials.
According to FDA regulations, when a subject withdraws from a study, the data collected on the subject to the point of withdrawal remains part of the study database and may not be removed.
An investigator may ask a subject who is withdrawing whether the subject wishes to provide continued follow-up and further data collection subsequent to their withdrawal from the interventional portion of the study. Under this circumstance, the discussion with the subject would distinguish between study-related interventions and continued follow-up of associated clinical outcome information, such as medical course or laboratory results obtained through non-invasive chart review, and address the maintenance of privacy and confidentiality of the subject’s information.
If a subject withdraws from the interventional portion of the study, but agrees to continued follow-up of associated clinical outcome information as described in the previous bullet, the investigator must obtain the subject’s informed consent for this limited participation in the study (assuming such a situation was not described in the original informed consent form). In accordance with FDA regulations, IRB approval of informed consent documents would be required.
If a subject withdraws from the interventional portion of a study and does not consent to continued follow-up of associated clinical outcome information, the investigator must not access for purposes related to the study the subject’s medical record or other confidential records requiring the subject’s consent. However, an investigator may review study data related to the subject collected prior to the subject’s withdrawal from the study, and may consult public records, such as those establishing survival status.
In a paper by O’Neill and Temple, “The Prevention and Treatment of Missing Data in Clinical Trials: An FDA Perspective on the Importance of Dealing With It”, they stated that data collection after subject withdrawal has benefits in outcome studies (i.e., studies with morbidity or mortality events) but may not be appropriate for trials with symptomatic measures (usually quantitative measurements).
A classic remedy, at least in outcome studies, is to attempt to measure outcomes in all the subjects who were initially randomized, including those who withdraw from therapy; this is the “intent to treat” (ITT) approach to the analysis of clinical trial data. As an example of why this might be important, consider an outcome study (with an end point of survival) in which the test drug exacerbated heart failure. In these circumstances, subjects with heart failure, who might be at an increased risk for death, would be more likely to leave the test-drug group. This would lower the mortality risk in the test-drug group and give that drug an advantage with respect to its safety profile, unless the dropouts were followed and the post-dropout events counted. The ITT approach is intended to protect against this kind of “informative censoring” by requiring that dropouts be followed up and that post-dropout events be counted. It is recognized that an ITT analysis is conservative (after all, the benefits of a drug usually disappear once it is stopped), but this is generally considered acceptable in outcome studies. There are compromise approaches—e.g., counting events that occur within 30 days of stopping treatment, assuming that subjects are followed for that duration and it is possible to ascertain outcomes.

Trials of symptomatic benefit generally measure the effects of assigned treatment at successive visits over the duration of the trial, but they typically use the value measured at the final visit as the primary end point. In such trials, the missing-data problem is of a different kind. Early dropouts can leave treatment groups unbalanced with respect to important prognostic patient characteristics related to time-dependent response to treatments. The effect of these dropouts on outcome could go in either direction, i.e., exaggerating or minimizing drug–placebo differences, depending on the reasons for dropping out and whether there were spontaneous (i.e., not drug-related) changes in the outcomes in the study population.

In symptomatic settings, it is not the usual practice to continue to assess effectiveness in patients after they have stopped taking the assigned treatment (ITT approach), as the drug’s effect is assumed to be lost; also, in many cases, an alternative drug is started, and this could influence the outcome for a subject. It is also possible that if a serious adverse end point occurs after the subject has withdrawn from the assigned treatment that event is not captured in the study data. There is also generally less concern, in the symptomatic setting about not capturing a serious study endpoint that was about to occur.

Another issue is the statistical analysis of the data collected after subjects discontinue the study drug or withdraw from the study. Dr. Lisa LaVange gave a presentation about “Missing data issues in regulatory clinical trials”. She rightly pointed out the issues that have been encountered since the NRC report on missing data:

Issues encountered in regulatory reviews since publication of the NRC report:
-          Some increase in plans to collect data following early discontinuations (of treatment and/or study)
-          But often without plans for how to use the post-discontinuation data collected
Even though the data are collected after subjects withdraw from the study, the data may be used only in sensitivity analyses in an exploratory way, not in the primary analysis of the study. For example, in the paper “Macitentan and Morbidity and Mortality in Pulmonary Arterial Hypertension”, the vital status of subjects who discontinued from the study was collected; however, the mortality analysis including the data collected after subject discontinuation was only used as an exploratory analysis. It is no surprise that there was no difference between the two treatment groups, since placebo patients could receive the active drug after they discontinued from the study and any treatment effect would be diluted. In response to a letter to the editor, the authors acknowledged the potential bias in such an analysis.
With respect to death up to the end of the study, patients who were initially randomly assigned to placebo may have received open-label macitentan after a nonfatal event; this introduced a bias in favor of placebo.
In the book by Domanski and McKinlay, “Successful Randomized Trials: A Handbook for the 21st Century”, the utilization of data collected after a subject's withdrawal was discussed:
Caveat: Collection of data, at least on morbidity and mortality, following discontinuation of treatment does not preclude doing analyses in which these data are not used after treatment is stopped, for example, a non-intent-to-treat analysis. It is important that at least one analysis be by intent to treat, and if the data are not collected, the analysis is not possible.
In summary, to minimize missing data in clinical trials, it is good practice and a safe approach to continue data collection after subjects discontinue from the study, especially for outcome studies. The data collected after subjects' withdrawal can then be used for sensitivity analyses, though usually not as the primary analysis approach.



Tuesday, November 03, 2015

FDA Issued Draft Guidance for Industry for HIV drug "Human Immunodeficiency Virus-1 Infection: Developing Antiretroviral Drugs for Treatment"

In an effort to help the industry understand the requirements and expectations for drug development, FDA has started to issue guidance documents for individual therapeutic areas.


The latest therapeutic-area-specific guidance is on HIV drug development: "Human Immunodeficiency Virus-1 Infection: Developing Antiretroviral Drugs for Treatment". The draft guidance provides detailed guidance covering pharmacology, pre-clinical work, and clinical trials. It covers all aspects of clinical trial design and statistical analysis for HIV drug development, and it also touches on issues that statisticians are actively discussing:

Regarding the missing data and the data collection after patient discontinuation from the study treatment:
"There is no single optimal way to deal with missing data from clinical trials. Sponsors should make every attempt to limit loss of patients from the trial, and when the loss is unavoidable, collect information that can help explain the cause of the loss and the final status of the patient. Analyses excluding patients with missing data or other post-treatment outcomes are potentially biased because patients who do not complete the trial may differ substantially in both measured and unmeasured ways from patients who remain in the trial. The method of how missing data will be handled should be specified in the protocol or the SAP. A patient retention and follow-up plan should be included in the protocol providing details on how to minimize missing data and collect follow-up information."
Regarding the timing of Statistical Analysis Plan completion and contents of SAP:
"Before unblinding any phase 2b or phase 3 trial, sponsors should have in place a detailed finalized SAP. Although sponsors can update or modify an SAP as long as the trial remains blinded, sponsors should recognize that a detailed discussion may be needed concerning data access and appropriate firewalls for maintaining the integrity of the blind. If any major modification occurs, sponsors should discuss the modifications with the DAVP. Ideally, the SAP should be prepared at the time the protocol is made final, but we recognize that changes are sometimes made later, but before unblinding. The SAP should be considered as part of the protocol, and it can be either a section within the protocol (encouraged) or a separate document. The SAP should include:
  • Details on endpoint ordering
  • Analysis populations
  • Structure of statistical hypotheses to be tested
  • Statistical methods including the mathematical formulations
  • Level of significance or alpha-level
  • Alpha adjustments for multiple comparisons or interim analyses if applied
  • Definition of visit window
  • Handling of missing data
  • Sensitivity analyses
It is important that the SAP prospectively identify any covariates that are to be used in the analysis. It is also important to choose covariates that are expected to strongly influence outcome."

Monday, October 12, 2015

Sample Size Estimation Based on Precision for Survey and Clinical Studies such as Immunogenicity Studies

Sometimes, we may need to calculate the sample size required to estimate a population proportion or a population mean with a given precision or margin of error. Here we use the terms ‘precision’ and ‘margin of error’ interchangeably. The precision may also be referred to as “half of the confidence interval”, “half of the width of the CI”, or “distance from mean to limit”, depending on the sample size calculation software.

Statisticians may need to estimate sample sizes for situations such as the following:

Example 1: A survey estimated that 20% of all Americans aged 16 to 20 drove under the influence of drugs or alcohol. A similar survey is planned for New Zealand. The researchers want to estimate a sample size for the survey and they want a 95% confidence interval to have a margin of error of 0.04.

Example 2: An immunogenicity study is planned to investigate the occurrence of antibodies to a therapeutic protein. There is no prior information about the percentage of patients who may develop antibodies to the therapeutic protein. How many patients are needed for the study with a 95% confidence interval and a precision of 10%?

Example 3: A tax assessor wants to assess the mean property tax bill for all homeowners in Madison, Wisconsin. A survey ten years ago got a sample mean and standard deviation of $1400 and $1000. How many tax records should be sampled for a 95% confidence interval to have a margin of error of $100?

These are situations where the sample size estimation is based on the confidence interval and the margin of error. Examples #1 and #2 deal with a one-sample proportion, where we would like to estimate the sample size needed to estimate the population proportion with a certain precision. Example #3 deals with a one-sample mean, where we would like to estimate the sample size needed to estimate the population mean with a certain precision.

Sample Size to Estimate A Proportion With a Precision

The usual formula is:

          N = z^2 * p * (1 - p) / d^2

where p is the anticipated proportion (which may be obtained from a previous study), d is the precision or margin of error, and z is the z-score, e.g., 1.645 for a 90% confidence interval, 1.96 for a 95% confidence interval, and 2.58 for a 99% confidence interval.

For example #1, the sample size will be calculated as:
          N = 1.96^2 x 0.2 x 0.8/0.04^2 = 384.2 round up to 385
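The same hand calculation can be reproduced with a short SAS data step (a sketch; the variable names are arbitrary):

data _null_;
   z = quantile('NORMAL', 0.975);         /* z-score for a two-sided 95% confidence interval */
   p = 0.2;                               /* anticipated proportion */
   d = 0.04;                              /* desired margin of error */
   n = ceil(z**2 * p * (1 - p) / d**2);   /* round up to the next whole subject */
   put 'Required sample size: ' n;        /* prints 385 */
run;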

Similarly, if we use PASS, the input parameters will be
         Confidence Interval:  Simple Asymptotic
         Interval Type: Two-sided
         Confidence level (1-alpha): 0.95
         Confidence Interval Width (two-sided): 0.08      (note: 0.04 x 2)
         P (Proportion): 0.2

For example #2, since there is no prior information about the proportion, the practical approach is to assume p = 0.50; this maximizes p(1-p) and therefore gives a sample size large enough to ensure the precision.

If we use the formula, the sample size will be calculated as:

          N = 1.96^2 x 0.5 x 0.5 / 0.1^2 = 96.04, round up to 97

Similarly, if we use PASS, the input parameters will be
          Confidence Interval:  Simple Asymptotic
          Interval Type: Two-sided
          Confidence level (1-alpha): 0.95
          Confidence Interval Width (two-sided): 0.2    (note: 0.1 x 2)
          P (Proportion): 0.5

Sample Size to Estimate A Mean With a Precision

The usual formula is:

          N = (s t/d)^2

where s is the standard deviation, t is the t-score (approximately equal to the z-score when normality is assumed or the sample size is large), and d is the precision or margin of error.

For example #3:

N = (1000 x 1.96 / 100)^2 = 384.2, round up to 385

Similarly, if we use PASS, the input parameters will be:
                    Solved for: Sample size
                    Interval type: two-sided
                    Population size: infinite
                    Confidence Interval (1-alpha): 0.95
                    Distance from mean to limits: 100
                    S (standard deviation): 1000
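As an alternative to PASS, SAS Proc Power can also handle example #3; a sketch, assuming a 95% confidence level and requiring a 0.95 probability that the half-width of the t-based confidence interval will not exceed $100:

proc power;
   onesamplemeans ci=t           /* precision analysis for a one-sample t confidence interval */
      halfwidth = 100            /* desired margin of error ($100) */
      stddev    = 1000           /* assumed standard deviation from the earlier survey */
      probwidth = 0.95           /* required probability of achieving the half-width */
      ntotal    = .;             /* solve for the sample size */
run;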

Sample size calculation based on precision is popular for surveys in epidemiology studies and for polling in political science. In clinical trials, it seems to be most common in immunogenicity studies, where it may be used not only in the one-sample situation but also in the two-sample situation. In the book “Biosimilars: Design and Analysis of Follow-on Biologics” by Dr. Chow, the sample size section mentions the calculation based on precision:
In immunogenicity studies, the incidence rate of immune response is expected to be low. In this case, the usual pre-study power analysis for sample size calculation for detecting a clinically meaningful difference may not be feasible. Alternatively, we may consider selecting an appropriate sample size based on precision analysis rather than power analysis to provide some statistical inference.
The half-width of the CI, w = z(1-alpha/2) * sigma-hat, is usually referred to as the maximum error margin allowed for a given sample size n. In practice, the maximum error margin allowed represents the precision that one would expect for the selected sample size. The precision analysis for sample size determination is to consider the maximum error margin allowed. In other words, we are confident that the true difference delta = pR - pT would fall within the margin of w = z(1-alpha/2) * sigma for a given sample size of n. Thus, the sample size required for achieving the desired precision can be chosen.
This approach, based on the interest in only the type I error, is to specify precision while estimating the true delta for selecting n.
Under a fixed power and significance level, the sample size based on power analysis is much larger than the sample size based on precision analysis when the incidence rate difference is extremely low or the allowed error margin is large.
SAS Proc Power can also calculate the sample size. The exact method is used for the sample size calculation in SAS, and the obtained sample size is usually greater than the ones calculated by hand (from the formula) or using PASS.

For the confidence interval for a one-sample proportion, the SAS code will be something like this:
proc power;
   onesamplefreq ci=wilson       /* Wilson confidence interval for a single proportion */
      halfwidth  = 0.1           /* desired precision (half-width of the CI) */
      proportion = 0.3           /* anticipated proportion */
      ntotal     = 70            /* sample size */
      probwidth  = .;            /* solve for the probability of achieving the half-width */
run;
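The same statement can be turned around to solve for the sample size directly (for instance, for the immunogenicity setting in example #2); a sketch, assuming p = 0.5, a precision of 0.1, and a 0.95 required probability of achieving that half-width:

proc power;
   onesamplefreq ci=wilson
      halfwidth  = 0.1           /* desired precision */
      proportion = 0.5           /* conservative assumption when no prior estimate exists */
      probwidth  = 0.95          /* required probability of achieving the half-width */
      ntotal     = .;            /* solve for the sample size */
run;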

For the confidence interval for a one-sample mean, refer to the example provided in the SAS online documentation: SAS 9.22 User’s Guide, Example 68.7, Confidence Interval Precision.
