In clinical trials where outcome measures violate the assumptions required for parametric statistical methods such as analysis of covariance (ANCOVA), non-parametric approaches are often employed. The classical two-sample Wilcoxon rank-sum test (also known as the Mann–Whitney U test) can be used in such cases; however, this method does not allow for adjustment of covariates.
Randomized clinical trials often adjust for baseline covariates (FDA 2021 guidance "Adjusting for Covariates in Randomized Clinical Trials for Drugs and Biological Products") to improve precision. When the outcome is non‐normal or ordinal, rank‐based methods offer alternatives to parametric ANCOVA. Two popular approaches are: (1) Rank ANCOVA – an ANCOVA on rank‐transformed data (the “rank‐transformation ANCOVA” of Conover and Iman), and (2) Aligned‐Rank Stratified Wilcoxon – a Wilcoxon rank‐sum test applied to responses “aligned” by covariate effects (e.g. Hodges‐Lehmann or van Elteren style). Below we compare their theory, practical use, pros/cons, and SAS implementations.
Theoretical Properties
- Rank ANCOVA (Rank‐Transform ANCOVA):
Replace the outcome Y by its overall ranks and fit the usual linear ANCOVA model (treatment + covariates). Under the null hypothesis, the test asks whether the treatment coefficient in the rank-scale model is zero. In effect, the null is "no difference in the adjusted rank distributions" between groups. This approach assumes a linear additive model for covariate effects (on the rank scale) with homogeneous slopes across treatments. Conover and Iman showed that the rank transform inherits the robustness and power properties of rank tests in regression. However, it is not fully model-free: violations of slope homogeneity or covariate–treatment interactions can invalidate the test. The test statistic is simply the usual F (or t) from ANCOVA on ranks. Rank ANCOVA was proposed by Dana Quade (1967), "Rank Analysis of Covariance", in JASA and was popularized by the book by Stokes, Davis, and Koch, "Categorical Data Analysis Using SAS", which has a dedicated chapter on rank ANCOVA.
- Aligned‐Rank Stratified Wilcoxon:
Aligns outcomes within each randomization stratum by subtracting the stratum's Hodges–Lehmann shift (a median-based location estimate), then applies a Wilcoxon rank-sum test to the pooled aligned data. In effect, this is an unstratified Wilcoxon test on the aligned values. It yields a Hodges–Lehmann estimate of the overall median shift (with a confidence interval) as the treatment effect.
This method is fundamentally a rank-sum test that controls for strata or covariates. In a stratified design (or ANCOVA context), one assumes a model $Y_{ij} = \mu + \beta_i + \tau_j$, where $\beta_i$ are strata/covariate effects and $\tau_j$ are treatment effects. The null hypothesis is $\tau_1 = \tau_2 = 0$ (no treatment shift). To remove (align) the strata effects, one subtracts a location statistic (stratum mean, median, or Hodges–Lehmann estimate) from each $Y_{ij}$, yielding "aligned" responses. These aligned values (now centered by stratum) are pooled and ranked, and a Wilcoxon test is performed ignoring strata. Alternatively, with a STRATA factor one can perform a van Elteren (stratified Wilcoxon) test, which ranks within strata instead of aligning. The resulting null distribution is distribution-free (asymptotically normal, or exact via permutations), requiring only that within each stratum the treated and control distributions differ by a location shift. In summary, rank ANCOVA assumes a linear rank model for covariate adjustment, whereas the aligned-rank Wilcoxon assumes only an additive strata effect (with no specific distributional form) and tests a location-shift null in a stratified rank framework.
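The align, pool, rank, and test sequence described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the SAS implementation: the function names `midranks` and `aligned_rank_wilcoxon` are made up for this example, alignment uses the stratum median, and the variance treats the pooled aligned values as a single sample (matching "ignoring strata" in the description), which is an assumption of this sketch.

```python
from collections import defaultdict
from math import erfc, sqrt
from statistics import median


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def aligned_rank_wilcoxon(y, stratum, treated):
    """Align by stratum median, pool, rank, and test.

    Returns (W, z, p): the rank sum of the treated group, its normal
    approximation z-score, and a two-sided p-value.
    """
    # Step 1: estimate a location for each stratum (here: the median).
    by_stratum = defaultdict(list)
    for yi, s in zip(y, stratum):
        by_stratum[s].append(yi)
    centers = {s: median(v) for s, v in by_stratum.items()}

    # Step 2: subtract the stratum location to get aligned responses.
    aligned = [yi - centers[s] for yi, s in zip(y, stratum)]

    # Step 3: rank the pooled aligned values (midranks for ties).
    ranks = midranks(aligned)

    # Step 4: Wilcoxon rank sum with a tie-aware pooled permutation variance.
    n, n1 = len(y), sum(treated)
    w = sum(r for r, t in zip(ranks, treated) if t)
    rbar = (n + 1) / 2
    mean_w = n1 * rbar
    var_w = n1 * (n - n1) / (n * (n - 1)) * sum((r - rbar) ** 2 for r in ranks)
    z = (w - mean_w) / sqrt(var_w)
    p = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return w, z, p


# Hypothetical data: two centers with very different baseline levels.
change = [5.0, 7.0, 6.0, 9.0, 15.0, 18.0, 16.0, 20.0]
center = ["A", "A", "A", "A", "B", "B", "B", "B"]
trt = [0, 1, 0, 1, 0, 1, 0, 1]
print(aligned_rank_wilcoxon(change, center, trt))
```

Without alignment, the center B values would occupy the top ranks regardless of treatment; after subtracting each center's median, the ranks reflect the within-center treatment contrast. A stratified-randomization analysis may instead compute the variance by permuting labels within strata.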
Practical Applications
- Outcome types: Both methods suit continuous or ordinal outcomes that violate parametric assumptions. Rank ANCOVA can handle multiple continuous covariates straightforwardly (since one simply adds them to the ANCOVA on ranks). Aligned-rank tests naturally handle categorical strata (e.g. randomization factors, center, or baseline strata). If the covariate is continuous (e.g. a baseline measure), one may either form strata (e.g. quantiles) or perform alignment via regression residuals before ranking.
- Robustness: Both are robust to outliers and non-normality because they use ranks. Conover and Iman noted that rank-transformed regression is robust and powerful even under heavy-tailed or skewed data. The aligned-rank approach is fully distribution-free under its null hypothesis, and alignment removes nuisance location effects (e.g. site or block means). In practice, Ye and Lai (2023) found that both a covariate-adjusted rank-sum (rank ANCOVA) and an aligned-rank test yielded narrower confidence intervals and maintained type I error across clinical trials, compared to unadjusted tests.
- Small samples: Neither method "magically" solves small-N issues. Rank ANCOVA relies on large-sample ANCOVA (normal-theory) approximations on ranks. In contrast, the aligned-rank/Wilcoxon method can use exact computations (via PROC NPAR1WAY) if samples are very small. However, note that alignment can behave erratically in very small samples according to some studies. In moderate samples both methods are generally acceptable. Both methods lose some efficiency if there are many ties or if covariate effects are very non-linear.
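To show what an exact computation involves for a very small sample, here is a minimal Python sketch that enumerates every possible assignment of treatment labels and counts how often the rank sum is at least as extreme as the one observed. It is not SAS's algorithm; the function name `exact_rank_sum_p` and the two-sided definition (absolute deviation from the null expectation) are assumptions made for this illustration.

```python
from itertools import combinations


def exact_rank_sum_p(ranks, treated):
    """Two-sided exact permutation p-value for the treated group's rank sum.

    `ranks` are the (possibly aligned) ranks of all subjects and `treated`
    is a 0/1 indicator of the same length.
    """
    n, n1 = len(ranks), sum(treated)
    observed = sum(r for r, t in zip(ranks, treated) if t)
    center = n1 * sum(ranks) / n  # expected rank sum under the null
    obs_dev = abs(observed - center)

    extreme = total = 0
    # Every subset of size n1 is an equally likely treated group under H0.
    for idx in combinations(range(n), n1):
        total += 1
        w = sum(ranks[i] for i in idx)
        if abs(w - center) >= obs_dev - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / total


# Hypothetical example: 8 subjects, the 4 treated ones hold the top 4 ranks.
print(exact_rank_sum_p([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 0, 1, 1, 1, 1]))
```

The enumeration grows combinatorially with the sample size, which is why exact tests are practical mainly for small trials and why asymptotic approximations are used otherwise.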
Advantages and Disadvantages
- Rank ANCOVA
– Advantages: It is conceptually simple (just rank the data and run standard ANCOVA). It fully utilizes continuous covariates in a regression framework, and retains much of the power of ANOVA tests while being robust to non‐normal errors. If the linear model is correct (in the rank‐scale), the test is valid and can easily test interactions. Simulations suggest rank‐based ANCOVA is often powerful when parametric assumptions fail. Adjusting for covariates in this way typically yields more precise estimates (narrower CIs) than unadjusted ranks.
– Disadvantages: Its interpretation is subtle: one is testing effects on the ranks of the outcome, not on its mean or median in original units. The "treatment effect" therefore corresponds to a location shift in the ranked distribution. Experts caution that rank-transform ANCOVA is not a direct test of medians on the original scale and may give misleading inference if medians are reported. If the covariates are ranked as well, their relationship with the outcome can be distorted. If the homogeneity-of-slopes assumption is violated, the rank ANCOVA test can be invalid. In summary, it is not fully nonparametric: it still relies on the linear model structure (albeit on ranks) and on large-sample approximations.
- Aligned Rank Wilcoxon
– Advantages: This method is fully nonparametric under a location‐shift null, and can produce an easily interpreted Hodges–Lehmann median shift estimate with confidence limits (e.g. via ALIGN=STRATA(HL) in SAS). It inherently accounts for stratification/baseline effects by alignment, so it naturally handles block effects or randomization strata (van Elteren’s approach). It does not require ranking the covariate itself – only the outcome after centering. In practice it has robust Type I error even under heteroscedasticity or skewness, so long as the alignment model is reasonable. SAS’s PROC NPAR1WAY provides this test directly (see below).
– Disadvantages: One must specify the strata or alignment model in advance. If there is only one continuous covariate, one needs to decide how to align (e.g. subtract predicted baseline effect or stratum median). The choice of alignment statistic (median vs mean vs HL) can affect results slightly. If strata are very small, the within‐stratum rankings may be unstable. Unlike rank‐ANCOVA, this approach cannot easily incorporate arbitrary continuous covariates without discretization or pre‐alignment. In very small samples, alignment procedures can suffer from erratic Type I behavior (especially if alignment assumptions are misspecified). Finally, it is less familiar to many practitioners and thus may be harder to explain.
SAS Implementation
- Rank ANCOVA: SAS does not have a dedicated "rank-ANCOVA" proc. A common workaround is to rank the data (using PROC RANK) and then run a standard ANCOVA (PROC GLM or PROC REG) on the ranked outcome. For example:
/* Step 1: replace Y by its overall ranks (midranks for ties) */
proc rank data=trial out=ranked ties=mean;
   var Y;
   ranks Y_rank;
run;

/* Step 2: ordinary ANCOVA on the ranked outcome */
proc glm data=ranked;
   class Trt;
   model Y_rank = Trt Baseline;
run;
quit;
This fits a linear model to the ranks. Alternatively, one can implement the Hettmansperger–McKean aligned‐rank procedure by first regressing Y on the covariate (e.g. with PROC REG or PROC ROBUSTREG) and then ranking the residuals to test the treatment effect. In short, one must manually rank or residualize and then use standard SAS procs. (There is no built‐in PROC RANKANCOVA or similar in SAS.)
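The book "Categorical Data Analysis Using SAS®, Third Edition" by Stokes, Davis, and Koch gives sample SAS code that performs rank ANCOVA in three steps: rank the outcome and covariate within each center, regress the outcome ranks on the covariate ranks and keep the residuals, then compare the residuals between groups with a stratified CMH test.
/* Step 1: rank the covariate and the outcome within each center */
proc rank nplus1 ties=mean out=ranks;
   by center;
   var before after;
run;

/* Step 2: regress ranked outcome on ranked covariate; keep residuals */
proc reg noprint;
   by center;
   model after=before;
   output out=residual r=resid;
run;

/* Step 3: compare residuals between groups, stratified by center */
proc freq;
   tables center*group*resid / noprint cmh2;
run;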
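For readers who want to see the arithmetic behind those three steps, the following Python sketch reproduces them for a single stratum. It is a simplified illustration under stated assumptions, not a port of the book's code: the function names are hypothetical, there is no BY-center processing, plain midranks are used instead of the NPLUS1 fractional ranks, and the final comparison is a randomization-based mean score statistic referred to a chi-square distribution with 1 degree of freedom.

```python
from math import erfc, sqrt


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def quade_rank_ancova(y, x, treated):
    """Rank ANCOVA for two groups and one covariate in a single stratum.

    Returns (slope, q, p): the slope of ranked outcome on ranked covariate,
    the mean score statistic on the residuals, and its chi-square(1) p-value.
    """
    n = len(y)
    # Step 1: rank outcome and covariate.
    ry, rx = midranks(y), midranks(x)

    # Step 2: least-squares regression of outcome ranks on covariate ranks.
    mx, my = sum(rx) / n, sum(ry) / n
    sxx = sum((a - mx) ** 2 for a in rx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    slope = sxy / sxx
    resid = [(b - my) - slope * (a - mx) for a, b in zip(rx, ry)]

    # Step 3: compare residuals between groups. Residuals sum to zero, so
    # the treated-group sum has null mean 0 under random label assignment.
    n1 = sum(treated)
    s1 = sum(r for r, t in zip(resid, treated) if t)
    var_s1 = n1 * (n - n1) / (n * (n - 1)) * sum(r * r for r in resid)
    q = s1 * s1 / var_s1
    p = erfc(sqrt(q / 2))  # upper tail of chi-square with 1 df
    return slope, q, p


# Hypothetical data: baseline (x), follow-up (y), and treatment indicator.
baseline = [1, 2, 3, 4, 5, 6]
followup = [2, 1, 4, 3, 6, 5]
trt = [0, 1, 0, 1, 0, 1]
print(quade_rank_ancova(followup, baseline, trt))
```

With several centers, the same residual comparison would be carried out within each center and combined, which is what the stratified CMH step does.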
- Aligned Rank Stratified Wilcoxon:
SAS's PROC NPAR1WAY supports both stratified Wilcoxon and aligned-rank tests. For a van Elteren stratified Wilcoxon, use the STRATA statement without alignment. For example:
proc npar1way data=trial wilcoxon;
   class Trt;
   strata Center;   /* stratification variable */
   var Change;
run;
This computes the stratified Wilcoxon (van Elteren) test and provides both one‐sided and two‐sided p-values. To perform an aligned‐rank test, use the ALIGN=STRATA option (with STRATA) in PROC NPAR1WAY. For example:
proc npar1way data=trial wilcoxon align=strata(hl);
   class Trt;
   strata Center;
   var Change;
run;
This subtracts the stratum median (or Hodges–Lehmann shift if (HL) is specified) from each response before ranking, then conducts the Wilcoxon test. The output will include the Hodges‐Lehmann estimate and CI for the location shift. Thus, PROC NPAR1WAY with the STRATA and ALIGN=STRATA options directly implements the aligned‐rank stratified Wilcoxon test. (By default, RANKS=STRATUM is used and weights are by stratum, yielding van Elteren.)
Note that both rank ANCOVA and the aligned rank stratified Wilcoxon test are primarily hypothesis tests and yield a p-value. The treatment difference (the location shift, often reported as a difference in medians) needs to be estimated separately using the Hodges–Lehmann estimator.
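As a reminder of what the Hodges–Lehmann estimator computes in the two-sample case, here is a minimal Python sketch: the estimate is the median of all pairwise differences between a treated and a control observation. The function name `hodges_lehmann_shift` and the data are illustrative only, and the sketch omits the confidence interval and any stratification.

```python
from statistics import median


def hodges_lehmann_shift(treated_values, control_values):
    """Two-sample Hodges-Lehmann estimate of the location shift.

    Forms every treated-minus-control difference and returns their median.
    """
    diffs = [t - c for t in treated_values for c in control_values]
    return median(diffs)


# Hypothetical change-from-baseline values in each arm.
print(hodges_lehmann_shift([7, 9, 12], [5, 6, 8]))
```

Note that this is generally not the same number as the difference between the two sample medians, even though both are described loosely as a "median difference".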
Side-by-Side Comparison of RANK ANCOVA and Aligned Rank Stratified Wilcoxon
The table below summarizes these points:
| | Rank ANCOVA (Rank-transform ANCOVA) | Aligned-Rank Stratified Wilcoxon |
|---|---|---|
| Null hypothesis | H₀: no treatment effect on the ranked outcome (treatment coefficient = 0 in the rank-scale model). | H₀: no treatment effect (no location shift) in the stratified model ($\tau_1 = \tau_2 = 0$). |
| Model / Assumptions | Linear model on ranks: assume covariate effects are additive and linear on the rank scale, with equal slopes across groups. (No distributional form assumed beyond this.) | Additive strata model: $Y_{ij} = \mu + \beta_i + \tau_j$. Assume observations are exchangeable within strata after alignment. Does not assume a specific distribution shape. |
| Test statistic | ANCOVA F or t on the ranked outcome (i.e. the usual parametric test applied to ranks). | Wilcoxon rank-sum on aligned data. (Equivalently, stratified Wilcoxon/van Elteren statistic.) |
| Distribution | Uses large-sample normal/chi-square approximations (from GLM on ranks). No exact test available in SAS for this. | Asymptotic normal (z) or exact (via permutation) available. SAS PROC NPAR1WAY can compute exact Wilcoxon p-values within strata. |
| Outcome type | Continuous or ordinal outcomes. Can include multiple continuous covariates in the model. | Continuous or ordinal outcomes. Requires (or creates) strata: typically categorical covariates (e.g. randomization strata) or aligned by regression. |
| Robustness / Outliers | Not fully nonparametric. Robust to outliers and non-normality (rank-based). However, if covariate–treatment interactions exist or slope equality fails, Type I error can inflate. | Fully nonparametric. Robust to outliers (rank-based) and handles non-normal/heteroscedastic data well. Alignment removes nuisance location shifts. |
| Small sample | Relies on asymptotic ANCOVA on ranks; no built-in exact test. May be liberal if the sample is very small or the distribution very discrete. | PROC NPAR1WAY can use exact Wilcoxon (by strata) for small N. Alignment in tiny samples may have less stable Type I error. |
| Advantages | Easy to implement via standard ANCOVA tools. Uses full continuous covariate information. Retains high power under model correctness. Covariate adjustment usually reduces variance (narrower CIs). | Nonparametric (distribution-free) test. Directly yields Hodges–Lehmann shift estimate and CI. Naturally incorporates stratification/blocks (van Elteren). Valid under mild assumptions. |
| Disadvantages | Tests on ranks, not raw scale, so interpretation of effect size is not straightforward. Not a test of medians in original units. Can mislead if model assumptions fail. No simple SAS proc: must manually rank or regress. | Must predefine strata or alignment model. Less flexible for multiple continuous covariates (usually one strata factor). Alignment choice (median vs mean) can affect results. In small samples, alignment may behave poorly. |
| SAS implementation | No single procedure. Typically use PROC RANK to create ranked Y, then PROC GLM (or PROC REG) with covariates on ranks. Alternatively, regress Y on covariate (PROC REG/ROBUSTREG), rank the residuals, and test group difference. (All manual steps.) | Use PROC NPAR1WAY. For stratified Wilcoxon: STRATA statement (no ALIGN) yields the van Elteren test. For aligned-rank: add ALIGN=STRATA (and optionally the (HL) or (MEAN) option) in the PROC statement. E.g. proc npar1way data=… wilcoxon align=strata(hl); class Trt; strata Covar; var Y; run; |
Examples of these methods in clinical trials and regulatory submissions:
- Hoeper et al. (2023), "Phase 3 Trial of Sotatercept for Treatment of Pulmonary Arterial Hypertension", with details in the attached protocol and SAP: the aligned rank stratified Wilcoxon test was used.
- Statistical Analysis Plan for "A Phase 2b, Dose-Ranging, Randomized, Double-Blind, Placebo Controlled, Multicenter Study of Rodatristat ethyl in Patients with Pulmonary Arterial Hypertension": the aligned rank stratified Wilcoxon test was used.
- Van Dyck et al. (2022), "Lecanemab in Early Alzheimer’s Disease", with details in the attached protocol and SAP: rank analysis of covariance was used as a sensitivity analysis to assess the robustness of the primary efficacy analysis using MMRM.
- NDA 22,535/0045, Pirfenidone in the treatment of IPF: rank ANCOVA was used.
- NDA 207533/ O-1, ARISTADA for Treatment of Schizophrenia: rank ANCOVA was used.