Kendall Tau Rank Correlation Coefficient

By Allan Roberts

Statistical correlation may or may not be an easy concept to grasp. A typical stats textbook might show you clouds of data points, attach numbers to them, and suggest that you should try to get the general idea; it’s either that or work through the algebra. You can also resort to wordy explanations such as this: If X and Y are positively correlated, X will tend to increase when Y increases, and vice versa. Note that this last statement sounds a lot like a definition based on probabilities. Wouldn’t it be nice if a statistical correlation coefficient had a simple probabilistic interpretation? Such a coefficient exists. It’s called the Kendall Tau Rank Correlation Coefficient.

Figure 1. Lines with a positive slope are indicated in red; lines with a negative slope are indicated in blue. The Kendall Tau Rank Correlation Coefficient can be computed from the number of upward sloping versus downward sloping lines, as indicated by the equation in the figure. A figure like this one can be produced using the R Script given below (the long version).

Dividing the number of line segments with positive slope in Figure 1 by the total number of line segments would yield a number between 0 and 1; this proportion can be interpreted as the probability that a randomly selected line segment will have positive slope. Re-scaling this proportion, with the equation given in Figure 1, yields a number between -1 and 1 that is the correlation coefficient tau.
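The re-scaling can be written out explicitly. The following is a sketch under the assumption that no two points share an X or a Y value (true for the random permutations used in the scripts here), so that every line segment slopes either up or down. The symbols U, D, N and p are introduced only for this illustration; the formula for tau is the one drawn on the figure by the long script.

```latex
% n = number of data points, U = number of upward segments, D = number of downward segments
\begin{align*}
N    &= \binom{n}{2} = \frac{n^2 - n}{2} = U + D  && \text{(total number of segments, no ties)} \\
p    &= \frac{U}{N}                               && \text{(probability of a positive slope)} \\
\tau &= \frac{2\,(U - D)}{n^2 - n} = \frac{U - D}{N} = \frac{U - (N - U)}{N} = 2p - 1
\end{align*}
```

So p = 0 gives tau = -1, p = 1/2 gives tau = 0, and p = 1 gives tau = 1.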

Reference

Kendall Tau Rank Correlation Coefficient. Wikipedia:
wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient

R Script (Short Version)

#X and Y are independent random permutations of 1..20.
#cor() with method = "kendall" returns the Kendall tau coefficient.

X <- sample(20,20); Y <- sample(20,20); cor(X, Y, method = "kendall");
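
As a rough check on the probabilistic interpretation, the proportion of upward-sloping segments can be computed directly and compared with the value returned by cor(). This is a minimal sketch rather than part of the original scripts: the variable names (pairs, dx, dy, p_up, tau_from_p, tau_builtin) and the seed are illustrative choices, and the comparison assumes there are no ties in X or Y.

```r
# Sketch: check that tau = 2p - 1 when there are no ties.
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 20
X <- sample(n, n)                  # random permutation of 1..n (no ties)
Y <- sample(n, n)

pairs <- combn(n, 2)               # each column holds one pair of point indices
dx <- X[pairs[2, ]] - X[pairs[1, ]]
dy <- Y[pairs[2, ]] - Y[pairs[1, ]]

p_up <- mean(dx * dy > 0)          # proportion of segments with positive slope
tau_from_p  <- 2 * p_up - 1        # re-scale from [0, 1] to [-1, 1]
tau_builtin <- cor(X, Y, method = "kendall")

all.equal(tau_from_p, tau_builtin) # the two values should agree
```

The sign of dx * dy is the sign of the slope dy / dx, which avoids a division. If ties were present, some products would be zero and cor() would apply a tie correction, so the simple 2p - 1 relationship would no longer be expected to hold exactly.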

R Script (Long Version)

#written by Allan Roberts, Feb 2013.
KendallExample <- function(n=20){
  #X and Y are independent random permutations of 1..n, so there are no ties.
  X <- sample(n,n);
  Y <- sample(n,n);
  plot(X,Y, las=1,xlim=c(0,n+4),ylim=c(0,n+4));

  #A[i,j] is 1 if the segment joining points i and j slopes upward, -1 if downward.
  A <- matrix(0,n,n);
  for (i in 1:n) for (j in (1:n)[-i]) if ( ((Y[j]-Y[i])/(X[j]-X[i]))>0) A[i,j] <-  1;
  for (i in 1:n) for (j in (1:n)[-i]) if ( ((Y[j]-Y[i])/(X[j]-X[i]))<0) A[i,j] <- -1;

  #Draw upward segments in solid red (col=2) and downward segments in dotted blue (col=4).
  for (i in 1:n) for (j in (1:n)[-i]){
    if (A[i,j]== 1){col= 2; lty=1};
    if (A[i,j]==-1){col= 4; lty=3};
    if (A[i,j] != 0) lines(c(X[i],X[j]),c(Y[i],Y[j]),col=col,lty=lty);
  }

  #Each segment appears twice in A (as [i,j] and as [j,i]), hence the division by 2.
  up <- (sum(A>0)/2);
  down <- (sum(A<0)/2);
  tau <- 2*(up-down)/(n*n-n);

  #Annotate the plot with the two counts, the formula for tau, and its value.
  text(n-4,n+4, paste("Upward", up ));
  text(n-4,n+3, paste("Downward", down ));
  text(n-4,n+1, expression( frac( 2*(up-down), n^2-n )) );
  text(n-2,n+1, adj=c(0,0.5), paste("=",round(tau,digits=3)) );
}

KendallExample(n=10);