Rank-based methods and U-statistics
Learning objectives
- State the WILCOXON RANK-SUM (Mann-Whitney U) test and its connection to permutation tests
- Compute and interpret the test statistic + Normal approximation p-value
- Recognise the WILCOXON SIGNED-RANK test for paired data
- Understand U-STATISTICS as the general framework: estimating expectations over symmetric kernels
- Compare rank-based methods to t-tests under Normal, skewed, and heavy-tailed data
Rank-based methods replace observations with their RANKS in the combined sample, then compute test statistics on the ranks. Result: methods that are DISTRIBUTION-FREE (the null distribution doesn't depend on the underlying data distribution) and ROBUST to outliers and heavy tails. The price is a small loss of efficiency under Normal data (~5%) — well worth paying for distribution-free coverage.
Wilcoxon rank-sum (Mann-Whitney U)
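Two-sample data: $X_1, \dots, X_m$ from group A, $Y_1, \dots, Y_n$ from group B, with $N = m + n$. Pool the observations and rank them from 1 to $N$. Let $R_i$ be the rank of $X_i$ in the pooled sample. The Wilcoxon rank-sum statistic is the sum of A's ranks:
$$W = \sum_{i=1}^{m} R_i.$$
Under H0 (same distribution), $E[W] = m(N+1)/2$ and $\mathrm{Var}(W) = mn(N+1)/12$. For large N, the Normal approximation gives a two-sided p-value via $Z = (W - E[W]) / \sqrt{\mathrm{Var}(W)}$ and $p = 2\,(1 - \Phi(|Z|))$. The equivalent Mann-Whitney U statistic is $U = W - m(m+1)/2$: same test, different scaling.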
Crucially, the test is EXACT in small samples (enumerate the rank assignments under H0) and asymptotically Normal in large samples. Tie correction: when there are tied observations, midranks are used, and the variance formula gets a tie-correction factor.
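The computation above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function names `midranks` and `rank_sum_test` are made up for this example, the statistic is taken to be the sum of group A's ranks, the tie correction is the usual sum of t^3 - t over tie groups, and no continuity correction is applied.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Wilcoxon rank-sum test via the Normal approximation (hypothetical helper).

    Returns (W, U, z, p): W is the sum of group A's ranks, U = W - m(m+1)/2,
    z is the standardized statistic, p is the two-sided p-value.
    """
    m, n = len(a), len(b)
    N = m + n
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    W = sum(ranks[:m])
    U = W - m * (m + 1) / 2
    mean_W = m * (N + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of t tied observations.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_W = m * n / 12 * ((N + 1) - tie_term / (N * (N - 1)))
    z = (W - mean_W) / math.sqrt(var_W)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return W, U, z, p


a = [1.1, 2.3, 2.9, 4.0]
b = [3.1, 4.5, 5.2, 6.8, 7.0]
print(rank_sum_test(a, b))
```

For this toy data group A holds pooled ranks 1, 2, 3 and 5, so W = 11 and U = 1; the Normal-approximation p-value comes out at roughly 0.03. With samples this small the exact null distribution is the better choice.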
The link to permutation tests
The Wilcoxon test is EXACTLY a permutation test using the rank-sum statistic. Under exchangeability of group labels, the rank distribution is determined by the $\binom{N}{m}$ equally likely ways of assigning the A-labels to the N ranks, where $m$ is the size of group A. The Normal approximation just summarizes this discrete null distribution. So Wilcoxon is in the permutation-test family (§8.2), using a clever distribution-free test statistic.
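To make the permutation view concrete, the sketch below enumerates every way of assigning the A-labels to the pooled ranks and counts how many give a rank sum at least as far from its null mean as the observed one. It assumes no ties and small samples (the enumeration grows combinatorially), and `exact_rank_sum_pvalue` is a name invented for this illustration.

```python
from itertools import combinations
from math import comb


def exact_rank_sum_pvalue(a, b):
    """Exact two-sided rank-sum p-value by full enumeration (assumes no ties)."""
    pooled = sorted(list(a) + list(b))
    rank_of = {v: r for r, v in enumerate(pooled, start=1)}
    m = len(a)
    N = len(pooled)
    W_obs = sum(rank_of[v] for v in a)
    center = m * (N + 1) / 2  # null mean of the rank sum
    extreme = 0
    # Under H0 every size-m subset of the ranks 1..N is equally likely for group A.
    for subset in combinations(range(1, N + 1), m):
        if abs(sum(subset) - center) >= abs(W_obs - center):
            extreme += 1
    return W_obs, extreme / comb(N, m)


a = [1.1, 2.3, 2.9, 4.0]
b = [3.1, 4.5, 5.2, 6.8, 7.0]
print(exact_rank_sum_pvalue(a, b))
```

Here there are 126 possible label assignments, and only 4 of them (rank sums 10, 11, 29 and 30) are as extreme as the observed W = 11, so the exact two-sided p-value is 4/126, about 0.032.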
Wilcoxon signed-rank for paired data
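For paired observations $(X_i, Y_i)$, $i = 1, \dots, n$, compute differences $D_i = X_i - Y_i$. Rank $|D_i|$ from smallest to largest, giving ranks $R_i$. The signed-rank statistic is the sum of the ranks belonging to positive differences:
$$W^+ = \sum_{i:\, D_i > 0} R_i.$$
Under H0 (differences symmetric around zero), $E[W^+] = n(n+1)/4$ and $\mathrm{Var}(W^+) = n(n+1)(2n+1)/24$. Asymptotically Normal. The signed-rank test is the rank-based analogue of the paired t-test; preferred when the differences are not Normal-distributed.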
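A minimal sketch of the signed-rank computation follows, under some stated assumptions: the statistic is the sum of the ranks of the positive differences, zero differences are dropped (one common convention), differences are taken as x minus y, and the Normal approximation uses the no-ties variance n(n+1)(2n+1)/24. The names `midranks` and `signed_rank_test` are illustrative only.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_test(x, y):
    """Wilcoxon signed-rank test via the Normal approximation (hypothetical helper).

    Returns (W_plus, z, p) where W_plus sums the ranks of |d| over positive d.
    """
    # Paired differences; exact zeros carry no sign information and are dropped.
    d = [xi - yi for xi, yi in zip(x, y) if xi != yi]
    n = len(d)
    ranks = midranks([abs(v) for v in d])
    W_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24  # no tie correction in this sketch
    z = (W_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return W_plus, z, p


x = [125, 130, 118, 140, 135, 128, 122, 131]
y = [120, 132, 111, 131, 132, 129, 112, 120]
print(signed_rank_test(x, y))
```

In this made-up example the differences are 5, -2, 7, 9, 3, -1, 10, 11, so the positive differences hold ranks 4, 5, 6, 3, 7, 8 and W+ = 33 against a null mean of 18.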
U-statistics: the general framework
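Hoeffding (1948) introduced U-statistics as a general framework for unbiased estimators of population functionals. A U-statistic of degree m is
$$U_n = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \dots < i_m \le n} h(X_{i_1}, \dots, X_{i_m}),$$
where $h$ is a symmetric kernel, so that $U_n$ is unbiased for $\theta = E[h(X_1, \dots, X_m)]$. Examples: the sample variance is a U-statistic of degree 2 with kernel $h(x, y) = (x - y)^2 / 2$; Kendall's tau is a U-statistic of degree 2 on pairs of bivariate observations, and Spearman's rho is a combination of U-statistics that is asymptotically equivalent to one; the Mann-Whitney U is a two-sample U-statistic with kernel $h(x, y) = \mathbf{1}\{x > y\}$, taking one argument from each group, so that U divided by the number of cross-group pairs estimates $P(X > Y)$. Hoeffding showed that non-degenerate U-statistics whose kernels have finite variance are asymptotically Normal, with closed-form variances derivable from kernel projection.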
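The definition can be checked numerically: averaging a kernel over all subsets of the data reproduces familiar estimators. The sketch below assumes the variance kernel (x - y)^2 / 2 and, for Mann-Whitney, the two-sample indicator kernel 1{x > y}; the helper names are invented for this example.

```python
import statistics
from itertools import combinations
from math import comb


def u_statistic(data, kernel, degree):
    """Average a symmetric kernel over all size-`degree` subsets of the data."""
    total = sum(kernel(*subset) for subset in combinations(data, degree))
    return total / comb(len(data), degree)


def variance_kernel(x, y):
    """Degree-2 kernel whose U-statistic is the unbiased sample variance."""
    return (x - y) ** 2 / 2


def mann_whitney_count(a, b):
    """Two-sample U: number of cross-group pairs with the A value larger."""
    return sum(1 for x in a for y in b if x > y)


data = [2.0, 4.0, 4.0, 5.0, 7.0]
print(u_statistic(data, variance_kernel, 2), statistics.variance(data))

a = [1.1, 2.3, 2.9, 4.0]
b = [3.1, 4.5, 5.2, 6.8, 7.0]
print(mann_whitney_count(a, b))
```

The first line prints the same number twice, since the kernel average and the usual n - 1 denominator formula are algebraically identical. The pair count for the two-sample data is 1 (only 4.0 exceeds 3.1), matching the rank-sum route: group A's ranks sum to 11 and 11 - 4(5)/2 = 1.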
Pitman ARE: rank vs t under Normal
The Pitman Asymptotic Relative Efficiency (ARE) of Wilcoxon vs t under Normal data is $3/\pi \approx 0.955$. Meaning: Wilcoxon needs about 5% more sample size to achieve the same power as t under Normal data. Normal data is not the worst case, though. Hodges-Lehmann (1956) proved that, over all continuous distributions with finite variance, Wilcoxon's ARE vs t never falls below $108/125 = 0.864$ (the Hodges-Lehmann lower bound), and it is frequently MUCH higher: above 1 for many skewed or heavy-tailed distributions, where Wilcoxon dominates.
Implication: rank-based methods are a reasonable default. They lose about 5% efficiency under Normal data and at most about 14% under any distribution, and gain dramatic robustness elsewhere.
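For readers who want to see where the Normal-data figure comes from, here is a short worked derivation. It uses the standard Pitman efficiency formula for the Wilcoxon test against the t-test under a location shift, with $f$ the density of the data and $\sigma^2$ its variance; the formula is quoted here as an assumption rather than derived.

```latex
\mathrm{ARE}(W, t) = 12\,\sigma^2 \left( \int_{-\infty}^{\infty} f(x)^2 \, dx \right)^{2}

% For the Normal density with variance \sigma^2:
\int_{-\infty}^{\infty} f(x)^2 \, dx
  = \frac{1}{2\pi\sigma^2} \int_{-\infty}^{\infty} e^{-x^2/\sigma^2} \, dx
  = \frac{\sigma\sqrt{\pi}}{2\pi\sigma^2}
  = \frac{1}{2\sigma\sqrt{\pi}}

\mathrm{ARE}(W, t) = 12\,\sigma^2 \cdot \frac{1}{4\sigma^2\pi} = \frac{3}{\pi} \approx 0.955
```

The variance cancels, so the efficiency does not depend on the scale of the data.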
Kruskal-Wallis: multi-group analogue
For comparing K > 2 groups (multi-group ANOVA analogue), the Kruskal-Wallis test pools all data, ranks them, and computes a chi-squared-like statistic from the sums of ranks per group. Under H0, the statistic follows χ²(K-1) asymptotically. Post-hoc rank tests (Dunn's test with a Šidák or Bonferroni adjustment, Conover-Iman) handle pairwise comparisons after a significant K-W test.
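As a rough sketch of that statistic, the code below uses the textbook form H = 12 / (N(N+1)) * sum(R_k^2 / n_k) - 3(N+1), where R_k is the rank sum and n_k the size of group k. It omits the tie correction, and the function names are invented for this example.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic without tie correction (hypothetical helper)."""
    pooled = [v for g in groups for v in g]
    N = len(pooled)
    ranks = midranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        # Rank sum of this group, read off the pooled ranking.
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(kruskal_wallis_h(groups))
```

For these perfectly separated groups the rank sums are 6, 15 and 24, giving H = 7.2. With K = 3 the reference distribution is χ²(2), whose upper tail probability is exp(-H/2), about 0.027 here; with only three observations per group that asymptotic figure is a rough guide at best.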
Hodges-Lehmann estimator
The rank-test counterpart to the t-test's estimator (mean difference): the median of all pairwise differences $Y_j - X_i$ across all pairs $(i, j)$, with $X_i$ from group A and $Y_j$ from group B. This is the Hodges-Lehmann location-shift estimator, ROBUST to outliers and naturally paired with the Wilcoxon test. Used to report point estimates alongside Wilcoxon p-values.
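A minimal sketch of the estimator, assuming the shift is oriented as group B minus group A; `hodges_lehmann_shift` is a name chosen for this illustration.

```python
import statistics


def hodges_lehmann_shift(a, b):
    """Median of all pairwise differences y - x, with x from a and y from b."""
    return statistics.median(y - x for x in a for y in b)


a = [1, 2, 3]
b = [4, 6, 100]  # 100 is a gross outlier
print(hodges_lehmann_shift(a, b))
print(statistics.mean(b) - statistics.mean(a))
```

The nine pairwise differences are 1, 2, 3, 3, 4, 5, 97, 98, 99, so the estimate is 4, while the difference in means is pulled to about 34.7 by the single outlier.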
When NOT to use rank-based methods
- Hypothesis is about means specifically: e.g., regulatory thresholds based on means. Wilcoxon tests whether one group tends to produce larger values than the other (stochastic ordering, P(X > Y) ≠ 1/2), not whether the means differ.
- Subgroup analyses: rank tests within tiny subgroups have low power; t-test under Normal assumption is fine if the assumption holds.
- Complex regression with covariates: rank-based extensions (van der Waerden, quantile regression) exist but are more involved; OLS often preferred unless residuals are clearly non-Normal.
Try it
- Start with Normal data, shift 0.70, N = 30. Both t-test and Wilcoxon give similar p-values (around 0.005-0.01). Normal data is the t-test's home turf; Wilcoxon is competitive.
- Switch to Lognormal (skewed). Same shift, same N. Compare the p-values. Wilcoxon's p-value is typically smaller (more power) because the rank ordering separates the groups more cleanly than the noisy mean comparison.
- Switch to Cauchy (heavy-tailed). Re-sample several times. t-test p-values are erratic — a single outlier can dominate the sample variance estimate. Wilcoxon p-values are much more stable.
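- Set shift = 0 (true null). Re-sample many times under each distribution. Under Normal data, t-test p-values are uniform on [0, 1], and Wilcoxon p-values are also uniform (up to discreteness): both have correct Type-I error. Under Cauchy, Wilcoxon keeps its nominal Type-I error because its null distribution is distribution-free, while the t-test's error rate drifts away from nominal (typically toward the conservative side, at a heavy cost in power). The Cauchy has no finite variance, so the CLT does not rescue the t-test at any N.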
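- Crank N to 200. Under Normal and Lognormal data both tests gain power and reach decisive p-values for moderate shifts. Under Cauchy only Wilcoxon benefits reliably: the mean of Cauchy data is no more concentrated at N = 200 than at N = 30, so the t-test stays erratic.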
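The true-null experiment can also be reproduced offline with a small Monte Carlo sketch. It is not the code behind the interactive demo: it assumes continuous data with no ties, uses the Normal approximation to the rank-sum null, and every function name, the seed, and the replication count are choices made for this illustration.

```python
import math
import random


def rank_sum_p(a, b):
    """Two-sided rank-sum p-value, Normal approximation, assuming no ties."""
    m, n = len(a), len(b)
    N = m + n
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    # Sum the 1-based pooled ranks of the observations labelled as group A (0).
    W = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    z = (W - m * (N + 1) / 2) / math.sqrt(m * n * (N + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_cauchy(rng):
    # Inverse-CDF sampling of a standard Cauchy variate.
    return math.tan(math.pi * (rng.random() - 0.5))


def null_rejection_rate(draw, n=30, reps=2000, alpha=0.05, seed=0):
    """Fraction of true-null replications rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [draw(rng) for _ in range(n)]
        b = [draw(rng) for _ in range(n)]  # same distribution: H0 holds
        if rank_sum_p(a, b) < alpha:
            rejections += 1
    return rejections / reps


print("Normal:", null_rejection_rate(draw_normal))
print("Cauchy:", null_rejection_rate(draw_cauchy))
```

Both rejection rates should land near the nominal 0.05, up to Monte Carlo noise, which is what distribution-free means in practice: the shape of the data does not enter the null distribution of the ranks.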
A scientist tests a small clinical trial (N = 15 per arm) with outcomes that visibly skew right (lognormal-like). Why is Wilcoxon a better default than a t-test for this scenario?
What you now know
Rank-based methods (Wilcoxon rank-sum, signed-rank, Kruskal-Wallis) replace observations with ranks, making them distribution-free and robust. Pitman ARE of Wilcoxon vs t under Normal is 0.955; under any other distribution, Wilcoxon is at least 0.864 and often dominates. Hoeffding's (1948) U-statistic framework gives the general theory. The Hodges-Lehmann estimator pairs robust point estimation with Wilcoxon testing. §8.5 next: kernel density estimation, the nonparametric continuous-distribution estimator.
References
- Wilcoxon, F. (1945). "Individual comparisons by ranking methods." Biometrics Bulletin 1(6), 80–83. (Original.)
- Mann, H.B., Whitney, D.R. (1947). "On a test of whether one of two random variables is stochastically larger than the other." Annals of Mathematical Statistics 18(1), 50–60.
- Hoeffding, W. (1948). "A class of statistics with asymptotically normal distribution." Annals of Mathematical Statistics 19(3), 293–325. (U-statistics.)
- Hodges, J.L., Lehmann, E.L. (1956). "The efficiency of some nonparametric competitors of the t-test." Annals of Mathematical Statistics 27(2), 324–335.
- Lehmann, E.L. (2006). Nonparametrics: Statistical Methods Based on Ranks (revised). Springer.