---
title: "Order Statistics — Quarto Notebook"
format:
  html:
    code-fold: true
    toc: true
    toc-depth: 3
    embed-resources: true
  pdf:
    pdf-engine: pdflatex
    toc: true
    toc-depth: 3
    include-in-header:
      text: |
        \usepackage{amsmath}
        \usepackage{amsfonts}
        \usepackage{amssymb}
---

# Setup

```{r}
#| label: setup
#| message: false
#| warning: false
library(tidyverse)
```

# Order statistics

Let $X_1, X_2, \ldots, X_n \stackrel{iid}{\sim} F$ be independent continuous random variables with common cdf $F$ and pdf $f$, where $f(x) > 0$ for $x \in (a,b)$ and $f(x) = 0$ otherwise. We define the extreme order statistics

$$
X_{(1)} = \min\{X_1, \ldots, X_n\}, \qquad
X_{(n)} = \max\{X_1, \ldots, X_n\},
$$

and, more generally, $X_{(k)}$ denotes the $k$th smallest value in the sample.

## CDFs and PDFs of the minimum and maximum

For the minimum $X_{(1)}$, with $x_1 \in (a,b)$,

$$
\begin{aligned}
F_{X_{(1)}}(x_1) &= \mathbb{P}(X_{(1)} \le x_1) = 1 - \mathbb{P}(X_{(1)} > x_1) \\
&= 1 - \mathbb{P}(\text{all } X_i > x_1) \\
&= 1 - \mathbb{P}(X_1 > x_1)\,\mathbb{P}(X_2 > x_1)\cdots \mathbb{P}(X_n > x_1) && \text{(independence)} \\
&= 1 - \prod_{i=1}^{n}\mathbb{P}(X_i > x_1) \\
&= 1 - \prod_{i=1}^{n}\big(1 - F(x_1)\big) \\
&= 1 - \big[1 - F(x_1)\big]^n.
\end{aligned}
$$

Differentiating,

$$
f_{X_{(1)}}(x_1) = n f(x_1)\,[1 - F(x_1)]^{n-1}\,\mathbb{I}_{(a,b)}(x_1).
$$

For the maximum $X_{(n)}$, with $x_n \in (a,b)$,

$$
\begin{aligned}
F_{X_{(n)}}(x_n) &= \mathbb{P}(X_{(n)} \le x_n) = \mathbb{P}(\text{all } X_i \le x_n) \\
&= \prod_{i=1}^{n}\mathbb{P}(X_i \le x_n) && \text{(independence)} \\
&= \big[F(x_n)\big]^n,
\end{aligned}
$$

so

$$
f_{X_{(n)}}(x_n) = n\,[F(x_n)]^{n-1} f(x_n)\,\mathbb{I}_{(a,b)}(x_n).
$$

## Example: Uniform$(0,1)$

Suppose we have a random sample of $n$ Uniform values, $X_1, X_2, \ldots, X_n \sim U(0,1)$, with

$$
f(x) = \mathbb{I}_{(0,1)}(x), \qquad
F(x) = \begin{cases} 0, & x < 0, \\ x, & 0 \le x \le 1, \\ 1, & x > 1. \end{cases}
$$

Then

$$
f_{X_{(1)}}(x_1) = n(1 - x_1)^{n-1}\,\mathbb{I}_{(0,1)}(x_1), \qquad
f_{X_{(n)}}(x_n) = n x_n^{\,n-1}\,\mathbb{I}_{(0,1)}(x_n).
$$

## Example: Exponential$(\lambda)$

Suppose we have a random sample of $n$ Exponential random variables, $X_1, X_2, \ldots, X_n \sim \text{Exp}(\lambda)$, with

$$
f(x) = \lambda e^{-\lambda x}\,\mathbb{I}_{(0,\infty)}(x), \qquad
F(x) = \big(1 - e^{-\lambda x}\big)\,\mathbb{I}_{(0,\infty)}(x).
$$

So

$$
f_{X_{(1)}}(x_1) = n\lambda e^{-n\lambda x_1}\,\mathbb{I}_{(0,\infty)}(x_1)
\;\Rightarrow\; X_{(1)} \sim \mathrm{Exp}(n\lambda),
$$

$$
f_{X_{(n)}}(x_n) = n\,\big[1 - e^{-\lambda x_n}\big]^{n-1}\,\lambda e^{-\lambda x_n}\,\mathbb{I}_{(0,\infty)}(x_n).
$$

An immediate application: for $t > 0$,

$$
\mathbb{P}(X_{(1)} > t) = e^{-n\lambda t}.
$$

## General $k$th order statistic

$$
f_{X_{(k)}}(x_k) = \dfrac{n!}{(k-1)!\,(n-k)!}\,[F(x_k)]^{k-1}\,f(x_k)\,[1 - F(x_k)]^{n-k}\,\mathbb{I}_{(a,b)}(x_k).
$$

### Joint density of the $j$th and $k$th order statistics $(j < k)$

$$
f_{X_{(j)},\,X_{(k)}}(x_j, x_k) =
\dfrac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!}\,
[F(x_j)]^{j-1}\,f(x_j)\,
[F(x_k) - F(x_j)]^{k-j-1}\,f(x_k)\,
[1 - F(x_k)]^{n-k},
$$

for $a < x_j < x_k < b$.

# Simulation checks

We check the Uniform and Exponential results by simulation.

```{r}
#| label: simulate-unif
set.seed(1)
n <- 5
m <- 100000

# Uniform(0,1): simulate m samples of size n and record the min and max of each
df <- tibble(id = 1:m) %>%
  mutate(samples = map(id, ~ runif(n))) %>%
  mutate(mins = map_dbl(samples, min),
         maxs = map_dbl(samples, max))

df %>%
  summarise(mean_min = mean(mins),
            mean_max = mean(maxs))
```

```{r}
#| label: simulate-exp
set.seed(1)
n <- 5
lambda <- 2
m <- 100000

# Exponential(lambda): the sample minimum is Exp(n * lambda), so its mean is 1 / (n * lambda)
df <- tibble(id = 1:m) %>%
  mutate(samples = map(id, ~ rexp(n, rate = lambda))) %>%
  mutate(mins = map_dbl(samples, min))

df %>%
  summarise(mean_min = mean(mins),
            theoretical = 1 / (n * lambda))
```
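
As a quick sanity check of the general $k$th order statistic formula (a minimal sketch; the values of `n`, `k`, and `m` below are arbitrary choices): plugging $F(x) = x$ and $f(x) = 1$ into the formula shows that for a $U(0,1)$ sample $X_{(k)} \sim \text{Beta}(k,\, n - k + 1)$, whose mean is $k/(n+1)$.

```{r}
#| label: check-kth-unif
set.seed(1)
n <- 5
k <- 3
m <- 100000

# Simulate the kth order statistic of n Uniform(0,1) draws, m times
kth <- replicate(m, sort(runif(n))[k])

tibble(sim_mean  = mean(kth),
       beta_mean = k / (n + 1))
```

With $n = 5$ and $k = 3$ the Beta mean is $3/6 = 0.5$, so `sim_mean` should land close to `beta_mean`.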
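
To visualize the density of the Exponential maximum derived above, the sketch below overlays the pdf $n\lambda\,(1 - e^{-\lambda x})^{n-1} e^{-\lambda x}$ on a histogram of simulated maxima. It reuses the same (arbitrary) `n`, `lambda`, and `m` as the chunk above and relies on ggplot2, which is attached with the tidyverse in Setup.

```{r}
#| label: check-exp-max
set.seed(1)
n <- 5
lambda <- 2
m <- 100000

# Simulated maxima of n Exp(lambda) draws
maxs <- replicate(m, max(rexp(n, rate = lambda)))

# Theoretical density of the maximum, from the formula above
pdf_max <- function(x) n * lambda * (1 - exp(-lambda * x))^(n - 1) * exp(-lambda * x)

ggplot(tibble(x = maxs), aes(x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 60, alpha = 0.4) +
  stat_function(fun = pdf_max, colour = "red") +
  labs(x = "sample maximum", y = "density")
```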
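
Finally, a rough sketch of how one might check the joint density numerically (the choices of `j`, `k`, `s`, and `t` below are arbitrary): for a $U(0,1)$ sample, compare a Monte Carlo estimate of $\mathbb{P}(X_{(j)} \le s,\ X_{(k)} \le t)$ with the value obtained by integrating the joint pdf over $\{0 < x_j < x_k :\ x_j \le s,\ x_k \le t\}$.

```{r}
#| label: check-joint-unif
set.seed(1)
n <- 5
j <- 2
k <- 4
m <- 100000
s <- 0.4
t <- 0.7

# Monte Carlo estimate of P(X_(j) <= s, X_(k) <= t)
hits <- replicate(m, {
  x <- sort(runif(n))
  x[j] <= s && x[k] <= t
})

# Joint pdf of (X_(j), X_(k)) for Uniform(0,1): F(x) = x, f(x) = 1
joint_pdf <- function(xj, xk) {
  factorial(n) / (factorial(j - 1) * factorial(k - j - 1) * factorial(n - k)) *
    xj^(j - 1) * (xk - xj)^(k - j - 1) * (1 - xk)^(n - k)
}

# Integrate the joint pdf over {0 < x_j < x_k, x_j <= s, x_k <= t}
prob_int <- integrate(function(xk) {
  sapply(xk, function(u) integrate(function(xj) joint_pdf(xj, u), 0, min(s, u))$value)
}, lower = 0, upper = t)$value

tibble(simulated = mean(hits), integrated = prob_int)
```

The two numbers should agree to within Monte Carlo error.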