---
title: "Order Statistics — Quarto Notebook"
format:
  html:
    code-fold: true
    toc: true
    toc-depth: 3
    embed-resources: true
  pdf:
    pdf-engine: pdflatex
    toc: true
    toc-depth: 3
    include-in-header:
      text: |
        \usepackage{amsmath}
        \usepackage{amsfonts}
        \usepackage{amssymb}
---

# Setup

```{r}
#| label: setup
#| message: false
#| warning: false
library(tidyverse)
```

# Order statistics

Let $X_1, X_2, \ldots, X_n \stackrel{iid}{\sim} F$ be independent continuous random variables with common cdf $F$ and pdf $f$, where $f(x) > 0$ for $x \in (a,b)$ and $f(x) = 0$ otherwise. We define the extreme order statistics

$$
X_{(1)} = \min\{X_1, \ldots, X_n\}, \qquad
X_{(n)} = \max\{X_1, \ldots, X_n\},
$$

and, more generally, $X_{(k)}$ denotes the $k$th smallest value in the sample.

## CDFs and PDFs of the minimum and maximum

For the minimum $X_{(1)}$, with $x_1 \in (a,b)$,

$$
\begin{aligned}
F_{X_{(1)}}(x_1) &= \mathbb{P}(X_{(1)} \le x_1) = 1 - \mathbb{P}(X_{(1)} > x_1) \\
&= 1 - \mathbb{P}(\text{all } X_i > x_1) \\
&= 1 - \mathbb{P}(X_1 > x_1)\,\mathbb{P}(X_2 > x_1)\cdots \mathbb{P}(X_n > x_1) && \text{(independence)} \\
&= 1 - \prod_{i=1}^{n}\mathbb{P}(X_i > x_1) \\
&= 1 - \prod_{i=1}^{n}\big(1 - F(x_1)\big) \\
&= 1 - \big[1 - F(x_1)\big]^n.
\end{aligned}
$$

Differentiating,

$$
f_{X_{(1)}}(x_1) = n f(x_1)\,[1 - F(x_1)]^{n-1}\,\mathbb{I}_{(a,b)}(x_1).
$$

For the maximum $X_{(n)}$, with $x_n \in (a,b)$,

$$
\begin{aligned}
F_{X_{(n)}}(x_n) &= \mathbb{P}(X_{(n)} \le x_n) = \mathbb{P}(\text{all } X_i \le x_n) \\
&= \prod_{i=1}^{n}\mathbb{P}(X_i \le x_n) && \text{(independence)} \\
&= \big[F(x_n)\big]^n,
\end{aligned}
$$

so

$$
f_{X_{(n)}}(x_n) = n\,[F(x_n)]^{n-1} f(x_n)\,\mathbb{I}_{(a,b)}(x_n).
$$

## Example: Uniform$(0,1)$

Suppose we have a random sample of $n$ Uniform values, $X_1, X_2, \ldots, X_n \sim U(0,1)$, with

$$
f(x) = \mathbb{I}_{(0,1)}(x), \qquad
F(x) = \begin{cases} 0, & x < 0, \\ x, & 0 \le x \le 1, \\ 1, & x > 1. \end{cases}
$$

Then

$$
f_{X_{(1)}}(x_1) = n(1 - x_1)^{n-1}\,\mathbb{I}_{(0,1)}(x_1), \qquad
f_{X_{(n)}}(x_n) = n x_n^{\,n-1}\,\mathbb{I}_{(0,1)}(x_n).
$$

## Example: Exponential$(\lambda)$

Suppose we have a random sample of $n$ Exponential random variables, $X_1, X_2, \ldots, X_n \sim \text{Exp}(\lambda)$, with

$$
f(x) = \lambda e^{-\lambda x}\,\mathbb{I}_{(0,\infty)}(x), \qquad
F(x) = \big(1 - e^{-\lambda x}\big)\,\mathbb{I}_{(0,\infty)}(x).
$$

So

$$
f_{X_{(1)}}(x_1) = n\lambda e^{-n\lambda x_1}\,\mathbb{I}_{(0,\infty)}(x_1)
\;\Rightarrow\; X_{(1)} \sim \mathrm{Exp}(n\lambda),
$$

$$
f_{X_{(n)}}(x_n) = n\,\big[1 - e^{-\lambda x_n}\big]^{n-1}\,\lambda e^{-\lambda x_n}\,\mathbb{I}_{(0,\infty)}(x_n).
$$

An immediate application: for $t > 0$,

$$
\mathbb{P}(X_{(1)} > t) = e^{-n\lambda t}.
$$

## General $k$th order statistic

$$
f_{X_{(k)}}(x_k) = \dfrac{n!}{(k-1)!\,(n-k)!}\,[F(x_k)]^{k-1}\,f(x_k)\,[1 - F(x_k)]^{n-k}\,\mathbb{I}_{(a,b)}(x_k).
$$

### Joint density of the $j$th and $k$th order statistics $(j < k)$

$$
f_{X_{(j)},\,X_{(k)}}(x_j, x_k) =
\dfrac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!}\,
[F(x_j)]^{j-1}\,f(x_j)\,
[F(x_k) - F(x_j)]^{k-j-1}\,f(x_k)\,
[1 - F(x_k)]^{n-k},
$$

for $a < x_j < x_k < b$.

# Simulation checks

We check the Uniform and Exponential results by simulation.

```{r}
#| label: simulate-unif
set.seed(1)
n <- 5
m <- 100000

# Uniform(0,1): simulate m samples of size n and record the min and max of each
df <- tibble(id = 1:m) %>%
  mutate(samples = map(id, ~ runif(n))) %>%
  mutate(mins = map_dbl(samples, min),
         maxs = map_dbl(samples, max))

df %>%
  summarise(mean_min = mean(mins),
            mean_max = mean(maxs))
```

```{r}
#| label: simulate-exp
set.seed(1)
n <- 5
lambda <- 2
m <- 100000

# Exponential(lambda): the sample minimum is Exp(n * lambda), so its mean is 1 / (n * lambda)
df <- tibble(id = 1:m) %>%
  mutate(samples = map(id, ~ rexp(n, rate = lambda))) %>%
  mutate(mins = map_dbl(samples, min))

df %>%
  summarise(mean_min = mean(mins),
            theoretical = 1 / (n * lambda))
```
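
As a quick sanity check of the general $k$th order statistic formula (a minimal sketch; the values of `n`, `k`, and `m` below are arbitrary choices): plugging $F(x) = x$ and $f(x) = 1$ into the formula shows that for a $U(0,1)$ sample $X_{(k)} \sim \text{Beta}(k,\, n - k + 1)$, whose mean is $k/(n+1)$.

```{r}
#| label: check-kth-unif
set.seed(1)
n <- 5
k <- 3
m <- 100000

# Simulate the kth order statistic of n Uniform(0,1) draws, m times
kth <- replicate(m, sort(runif(n))[k])

tibble(sim_mean  = mean(kth),
       beta_mean = k / (n + 1))
```

With $n = 5$ and $k = 3$ the Beta mean is $3/6 = 0.5$, so `sim_mean` should land close to `beta_mean`.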
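
To visualize the density of the Exponential maximum derived above, the sketch below overlays the pdf $n\lambda\,(1 - e^{-\lambda x})^{n-1} e^{-\lambda x}$ on a histogram of simulated maxima. It reuses the same (arbitrary) `n`, `lambda`, and `m` as the chunk above and relies on ggplot2, which is attached with the tidyverse in Setup.

```{r}
#| label: check-exp-max
set.seed(1)
n <- 5
lambda <- 2
m <- 100000

# Simulated maxima of n Exp(lambda) draws
maxs <- replicate(m, max(rexp(n, rate = lambda)))

# Theoretical density of the maximum, from the formula above
pdf_max <- function(x) n * lambda * (1 - exp(-lambda * x))^(n - 1) * exp(-lambda * x)

ggplot(tibble(x = maxs), aes(x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 60, alpha = 0.4) +
  stat_function(fun = pdf_max, colour = "red") +
  labs(x = "sample maximum", y = "density")
```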
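
Finally, a rough sketch of how one might check the joint density numerically (the choices of `j`, `k`, `s`, and `t` below are arbitrary): for a $U(0,1)$ sample, compare a Monte Carlo estimate of $\mathbb{P}(X_{(j)} \le s,\ X_{(k)} \le t)$ with the value obtained by integrating the joint pdf over $\{0 < x_j < x_k :\ x_j \le s,\ x_k \le t\}$.

```{r}
#| label: check-joint-unif
set.seed(1)
n <- 5
j <- 2
k <- 4
m <- 100000
s <- 0.4
t <- 0.7

# Monte Carlo estimate of P(X_(j) <= s, X_(k) <= t)
hits <- replicate(m, {
  x <- sort(runif(n))
  x[j] <= s && x[k] <= t
})

# Joint pdf of (X_(j), X_(k)) for Uniform(0,1): F(x) = x, f(x) = 1
joint_pdf <- function(xj, xk) {
  factorial(n) / (factorial(j - 1) * factorial(k - j - 1) * factorial(n - k)) *
    xj^(j - 1) * (xk - xj)^(k - j - 1) * (1 - xk)^(n - k)
}

# Integrate the joint pdf over {0 < x_j < x_k, x_j <= s, x_k <= t}
prob_int <- integrate(function(xk) {
  sapply(xk, function(u) integrate(function(xj) joint_pdf(xj, u), 0, min(s, u))$value)
}, lower = 0, upper = t)$value

tibble(simulated = mean(hits), integrated = prob_int)
```

The two numbers should agree to within Monte Carlo error.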