Introduction XII -Order Statistics

We’ve used order statistics already when we covered auction theory but some of you might not have known what order statistics are. I hope I can enlighten those today.

Order Statistics

What are Order Statistics?

Order statistics are what the name says. They are the probability that a given value of i.i.d random variables is the nth smallest value in all our “draws”. Therefore the maximum, the minimum and the median of i.i.d random variables are all order statistics.  Or more general:

Suppose $X_{ 1 }, X_{ 2 }, ... X_{ n }$ are idependent, identically-distributed (i.i.d) random variables with PDF f(x) and CDF F(x). The order statistics then says:

$X_{ (1) } \le X_{ (2) } \le ... \le X_{ (n) }$

where sub (n) means it is the order statistic.

An important note here is that the random variables are independent of each another but their order statistics aren’t, as the 1st order statistic will always be less or equal to the 2nd order statistic. The value of the 2nd order statistic gives us therefore some limitation in what range the 1st order statitic can be.

CDF

We start with finding the CDF of order statistics as it is easier than finding the pdf.

We want to find the probability that at least j of $X_{ i }$‘s are $\le$ x. In mathematical terms it would look like this:

$P(X_{ j } \le x)=P(at\;least\;j\;of\;X_{ i }\;are\le x)$

If we reformulate it to exact j of $X_{ i }$ are $\le$ x, it is binomial distributed as there are just two outcomes. Either it is smaller or it is bigger than x.

We than get the following equation:

$P(X_{ j }\le x)=\sum_{ k=j }^{ n }{ (\begin{matrix} n \\ k \end{matrix})F(x)^{ k }(1-F(x))^{ n-k } }$

Example: In the auction theory post we used the nth order statistic cdf for our posted price model. We had a Uniform[0,1] distribution.

$P(X_{ n }\le x)=\sum_{ k=n }^{ n }{ (\begin{matrix} n \\ n \end{matrix})F(x)^{ n }(1-F(x))^{ n-n } }=F(x)^{ n }=x^{ n }$

The nth order statistic is by the way the maximum.

PDF

We could for sure just differentiate the CDF but the sum makes it quite difficult so we go an easier way.

The trick here is to remember that the probability is given by f(x)dx where dx is a tiny interval around x. If we have n i.i.d random variables, we have n possibilities for the jth order statistic each with probability f(x)dx, because every value could be the jth order statistic. We therefore have:

$f_{ X_{ (j) }}(x)dx=nf(x)dx...$

The ramaining n-1 can then be either bigger or smaller than x. We therefore have again a binomial distribution. We know that j-1 have to be smaller because otherwise the jth order statistic wouldn’t be the jth order statistic (it just is how it is). n-j can therefore be bigger than x. Plugging all of this in the equation:

$f_{ X_{ (j) }}(x)dx=nf(x)dx(\begin{matrix} n-1 \\ j-1 \end{matrix})F(x)^{ j-1 }(1-F(x))^{ n-j }$

and therefore:

$f_{ X_{ (j) }}(x)=nf(x)(\begin{matrix} n-1 \\ j-1 \end{matrix})F(x)^{ j-1 }(1-F(x))^{ n-j }$

Example: In the auction theory post we used the n-1st order statistic for the expected profit of auctions. The random variables were i.i.d Uniform[0,1] distributed.

$f_{ X_{ (n-1) }}(x)=nf(x)(\begin{matrix} n-1 \\ (n-1)-1 \end{matrix})F(x)^{ (n-1)-1 }(1-F(x))^{ n-(n-1) }=nf(x)(n-1)F(x)^{ n-2 }(1-F(x))$

As f(x)=1 and F(x)=x for Uniform[0,1] we get:

$f_{ X_{ (n-1) }}(x)=n(n-1)x^{ n-2 }(1-x)$

What is exactly the same as in the auction theory post.