The Unit of Analysis is the entity that frames what is being analyzed in a study or clinical trial. It is the entity being studied as a whole, within which most factors of causality and change exist. In other words, the unit of analysis is the “who” or the “what” that is being analyzed. The Unit of Analysis is based on the experimental unit, defined as "the smallest division of experimental material such that any two units may receive different treatments in the actual experiment" (Cox, 1992). Usually, the Unit of Analysis is on the same level as the unit of randomization.
We rarely talk about the Unit of Analysis, but we actually deal with it every time we analyze data. In clinical trials, we don’t explicitly discuss the Unit of Analysis because it is almost always the subject (also called the patient, healthy volunteer, or participant). Once the Unit of Analysis is established, all statistical analyses are based on it: we count the numbers, fit the statistical models, and include the explanatory variables all at the Unit of Analysis level. Given that the Unit of Analysis is generally the ‘subject’ in clinical trials, subject-level information and subject-level variables are used in the analysis. That is why, in CDISC ADaM data sets, an ADSL (subject-level analysis dataset) is always created.
The Unit of Analysis doesn’t always have to be the ‘subject’.
- For meta-analysis that is based on the summary information from multiple studies, the Unit of Analysis is ‘study’, not ‘subject’.
In the paper by Wong (2020), “Estimation of clinical trial success rates and related parameters”, the unit of analysis is the 'study' or individual 'clinical trial'.
- In analyses of Covid-19 data, models are often based on county-level or hospital-level data because, due to privacy concerns, data on individuals are not available. See the https://covidseverity.com/ website for county-level Covid-19 related data. Here the unit of analysis is 'county' or 'hospital'.
- For studies using cluster randomization, the Unit of Analysis may be the cluster (township, city, household), not ‘subject’ even though the subject may be the observation unit.
FDA’s guidance “Influenza: Developing Drugs for Treatment and/or Prophylaxis” specifies that the Unit of Analysis can be the household:
"In household trials, the entire household is both the randomized unit and the unit of analysis. The primary efficacy analysis should compare the treatment groups for the percentage of households in which at least one randomized contact case developed symptomatic, laboratory-confirmed influenza. In other words, if one contact case in the household becomes symptomatically infected, the household is counted as infected. If none of the contact cases becomes infected, the household is considered not infected. Secondary analyses also can compare the percentage of contact cases that had symptomatic, laboratory-confirmed influenza in the active and placebo treatment groups.
Designs in which different contact cases in the same household receive different regimens raise concerns of drug sharing and intrahousehold correlation. Analysis using individual contact cases as the unit of analysis also may cause similar problems. Stratification on the size of household can be used, but is not expected to produce any consequential increase in power. "
- In some clinical trials, the Unit of Analysis may be smaller than the ‘subject’ level, for example, the tumor lesion in oncology studies or the target bleeding site in studies of hemostatic agents.
In the FDA Statistical Review for the Lumason NDA, the lesion was used as the Unit of Analysis:
“The unit of analysis was the lesion; each subject had a single lesion that was to be characterized Sensitivity and Specificity are in percent (%) and n is the denominator for percentage calculation”
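The household-level analysis described in the FDA influenza guidance quoted above can be illustrated with a small sketch. The records, household IDs, and function names below are hypothetical and invented purely for illustration; the point is only to show how the same contact-level observations produce different numbers depending on whether the household or the individual contact is treated as the unit of analysis.

```python
from collections import defaultdict

# Hypothetical contact-level records: (household_id, treatment_arm, infected).
# These values are made up for illustration only.
contacts = [
    ("H1", "active", False),
    ("H1", "active", False),
    ("H2", "active", True),
    ("H2", "active", False),
    ("H3", "placebo", True),
    ("H3", "placebo", True),
    ("H4", "placebo", False),
    ("H5", "placebo", True),
]


def household_infection_rate(records):
    """Proportion of households with at least one infected contact, by arm."""
    household_status = {}  # (arm, household_id) -> True if any contact infected
    for household_id, arm, infected in records:
        key = (arm, household_id)
        household_status[key] = household_status.get(key, False) or infected

    totals = defaultdict(int)
    infected_households = defaultdict(int)
    for (arm, _household_id), infected in household_status.items():
        totals[arm] += 1
        infected_households[arm] += int(infected)
    return {arm: infected_households[arm] / totals[arm] for arm in totals}


def contact_infection_rate(records):
    """Proportion of individual contacts infected, by arm."""
    totals = defaultdict(int)
    infected_contacts = defaultdict(int)
    for _household_id, arm, infected in records:
        totals[arm] += 1
        infected_contacts[arm] += int(infected)
    return {arm: infected_contacts[arm] / totals[arm] for arm in totals}


household_rates = household_infection_rate(contacts)
contact_rates = contact_infection_rate(contacts)
print("Household as unit of analysis:", household_rates)
print("Contact as unit of analysis:  ", contact_rates)
```

In this made-up data set the denominators are 2 and 3 households in the household-level analysis, but 4 and 4 contacts in the contact-level analysis. A contact-level analysis would also need to account for the intrahousehold correlation mentioned in the guidance, which this sketch does not attempt.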
The Unit of Analysis may be different from the unit of observation. Within each unit of analysis, there may be multiple observations; for example, each subject may have multiple hospitalizations, exacerbations, or adverse events. In this situation, we usually still analyze the data at the subject level, and the multiple events within a subject are converted into subject-level data (time to first exacerbation, time to bleeding stoppage for the targeted bleeding site, best overall response based on the aggregated information from multiple lesions).
In FDA’s Statistical Review for Zerviate NDA, "The unit of analysis for all ocular variables was the average of both eyes of each subject."
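As a rough sketch of how multiple observations can be collapsed to the subject level, the example below derives a time to first exacerbation and an average across both eyes for each subject. All subject IDs, values, and function names are hypothetical; the censoring rule (subjects without an event are censored at the end of follow-up) is an assumption made for this illustration, not something taken from the reviews cited above.

```python
# Hypothetical event-level records: (subject_id, study_day_of_exacerbation).
exacerbations = [
    ("S1", 40),
    ("S1", 12),
    ("S1", 90),
    ("S2", 75),
]

# Hypothetical follow-up duration (in days) for every randomized subject.
followup_days = {"S1": 180, "S2": 180, "S3": 180}


def time_to_first_event(events, followup):
    """Return {subject_id: (time, event_flag)} with one row per subject.

    Subjects with no event are censored at the end of follow-up
    (event_flag = 0); this censoring rule is an assumption of the sketch.
    """
    first_event = {}
    for subject_id, day in events:
        first_event[subject_id] = min(day, first_event.get(subject_id, day))

    result = {}
    for subject_id, end_day in followup.items():
        if subject_id in first_event:
            result[subject_id] = (first_event[subject_id], 1)
        else:
            result[subject_id] = (end_day, 0)
    return result


# Hypothetical eye-level scores (OD = right eye, OS = left eye).
eye_scores = {
    "S1": {"OD": 2.0, "OS": 3.0},
    "S2": {"OD": 1.0, "OS": 1.0},
}


def average_of_eyes(scores):
    """Collapse eye-level scores to one value per subject."""
    return {sid: sum(eyes.values()) / len(eyes) for sid, eyes in scores.items()}


tte = time_to_first_event(exacerbations, followup_days)
subject_scores = average_of_eyes(eye_scores)
print(tte)
print(subject_scores)
```

In both derivations the output has exactly one record per subject, so the number of analysis records matches the number of units of analysis.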
In FDA’s review of Extended-Release and Long-Acting opioid analgesic (ER/LA) products, the unit of analysis is zip code (spatial) and quarter (time).
In both of the models proposed in the RADARS data analysis section, the unit of analysis is zip code (spatial) and quarter (time). Thus, testing for change between pre and post period for each outcome is investigating whether the average rate of events over time for the average zip code has changed from the pre-REMS period to the post-REMS period.
On ClinicalTrials.gov, when clinical trial results are posted, the unit of analysis must be specified if it is not the subject (participant).
"Type of Units Analyzed
Definition: If the analysis is based on a unit other than participants, a description of the unit of analysis (for example, eyes, lesions, implants). "
A handbook from Cochrane.org includes a section on the Unit of Analysis:
9.3.1 Unit-of-analysis issues
An important principle in clinical trials is that the analysis must take into account the level at which randomization occurred. In most circumstances the number of observations in the analysis should match the number of ‘units’ that were randomized. In a simple parallel group design for a clinical trial, participants are individually randomized to one of two intervention groups, and a single measurement for each outcome from each participant is collected and analysed. However, there are numerous variations on this design. Authors should consider whether in each study:
- groups of individuals were randomized together to the same intervention (i.e. cluster-randomized trials);
- individuals undergo more than one intervention (e.g. in a cross-over trial, or simultaneous treatment of multiple sites on each individual); or
- there are multiple observations for the same outcome (e.g. repeated measurements, recurring events, measurements on different body parts).
There follows a more detailed list of situations in which unit-of-analysis issues commonly arise, together with directions to relevant discussions elsewhere in the Handbook.
Sometimes, the Unit of Analysis is misused. In the paper “Common statistical errors in the design and analysis of subfertility trials”, A. Vail and E. Gardener reported that “Most trials (82%) included at least one ‘unit of analysis’ error”.
The most common error I see is in the analysis of adverse events (AEs), where people confuse analyses based on different Units of Analysis. At the subject level, adverse events are analyzed by comparing the incidence of AEs, calculated as “the number of subjects with at least one specific AE divided by the number of subjects”. At the AE level, we count the number of AEs and can calculate the AE rate (number of AEs per subject, or number of AEs per unit of exposure such as person-year) or the AE density (number of AEs per drug infusion). The meaning and interpretation of these measures are totally different from those of the AE incidence.
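The difference between AE incidence and AE rate can be made concrete with a small calculation. The subjects, exposure times, and AE records below are hypothetical, and the function names are chosen only for this sketch.

```python
# Hypothetical exposure (in person-years) for every treated subject.
exposure_years = {"S1": 1.0, "S2": 0.5, "S3": 1.0, "S4": 0.5}

# Hypothetical AE records: one row per AE occurrence (subject_id, AE term).
ae_records = [
    ("S1", "headache"),
    ("S1", "headache"),
    ("S1", "headache"),
    ("S2", "headache"),
    ("S3", "nausea"),
]


def ae_count(records, term):
    """Number of AE occurrences of the given term (AE-level count)."""
    return sum(1 for _subject_id, ae_term in records if ae_term == term)


def ae_incidence(records, exposure, term):
    """Subjects with at least one AE of the term / number of subjects."""
    subjects_with_ae = {sid for sid, ae_term in records if ae_term == term}
    return len(subjects_with_ae) / len(exposure)


def ae_rate_per_subject(records, exposure, term):
    """Number of AEs of the term / number of subjects."""
    return ae_count(records, term) / len(exposure)


def ae_rate_per_person_year(records, exposure, term):
    """Number of AEs of the term / total exposure in person-years."""
    return ae_count(records, term) / sum(exposure.values())


incidence = ae_incidence(ae_records, exposure_years, "headache")
rate_subject = ae_rate_per_subject(ae_records, exposure_years, "headache")
rate_py = ae_rate_per_person_year(ae_records, exposure_years, "headache")
print(f"Incidence: {incidence:.2f}")
print(f"AEs per subject: {rate_subject:.2f}")
print(f"AEs per person-year: {rate_py:.2f}")
```

Here two of four subjects had a headache (incidence 50%), yet there were four headache events, giving one event per subject and about 1.33 events per person-year. One subject with repeated events drives the rate without changing the incidence.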
In clinical trials with a longitudinal or crossover design, the analyses include multiple measures for each individual subject, but the unit of analysis is still the subject. More sophisticated statistical models (mixed models for repeated measures, random coefficient models, multi-level or hierarchical linear models) are then needed to account for the correlation among measurements within the same subject.