---
title: "Stat. 316: Sampling"
author: "Prof. Eric A. Suess"
date: "3/20/2024"
format:
  html:
    self-contained: true
---

# Central Limit Theorem (CLT)

The CLT states that when taking a random sample $X_1, X_2, \ldots, X_n$ from any population with population mean $\mu$ and population standard deviation $\sigma$, the sample mean $\bar{X}$ is approximately Normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The approximation improves as $n$ grows. When the population is itself Normal, this distribution is exact for every $n$.

Equivalently, in repeated sampling the z-score is (approximately) distributed standard normal:

$$
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
$$

Simulation: Suppose we repeatedly take samples of size $n = 12$ from a Normal population with mean $\mu = 1$ and standard deviation $\sigma = 3$.

```{r}
B <- 10000   # number of simulated samples
n <- 12      # sample size
mu <- 1      # population mean
sigma <- 3   # population standard deviation

Z <- replicate(B, {
  x <- rnorm(n, mu, sigma)         # mu and sigma are needed to simulate the data
  Xbar <- mean(x)                  # Xbar is computed from the data alone, as if mu were unknown
  (Xbar - mu) / (sigma / sqrt(n))  # standardize using the known mu and sigma
})

hist(Z)

plot(density(Z),
     main = "Standardized mean of 12 normal rvs",
     xlab = "Z")
curve(dnorm(x), add = TRUE, col = "red")  # standard normal density for comparison
```

# T distribution, Sampling Distribution

Substitute the sample standard deviation $S$ for the population standard deviation $\sigma$. In repeated sampling from a Normal population, the resulting statistic follows a $t$ distribution with $n - 1$ degrees of freedom:

$$
T = \frac{\bar{X} - \mu}{S/\sqrt{n}}
$$

Simulation: Suppose we repeatedly take samples of size $n = 12$ from a Normal population with mean $\mu = 1$ and standard deviation $\sigma = 3$.

```{r}
B <- 10000
n <- 12
mu <- 1
sigma <- 3

# Named Tstat rather than T, because R uses T as shorthand for TRUE
Tstat <- replicate(B, {
  x <- rnorm(n, mu, sigma)
  Xbar <- mean(x)
  Xsd <- sd(x)         # sample standard deviation S replaces sigma
  SE <- Xsd / sqrt(n)  # estimated standard error of Xbar
  (Xbar - mu) / SE
})

hist(Tstat)

plot(density(Tstat),
     main = "Studentized mean of 12 normal rvs",
     xlab = "T")
curve(dt(x, df = n - 1), add = TRUE, col = "red")  # t density with n - 1 df
```
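# CLT with a non-Normal population

The CLT is stated for any population, but both simulations above draw from a Normal population, where $\bar{X}$ is exactly Normal. The chunk below is a minimal sketch of the non-Normal case, assuming (this is not part of the original notes) a skewed Exponential population with rate 1, so that $\mu = \sigma = 1$, and an assumed sample size of $n = 30$. The seed value is arbitrary. It follows the same recipe as the CLT simulation: standardize $\bar{X}$ with the true $\mu$ and $\sigma$ and overlay the standard normal density.

```{r}
# Sketch only: Exponential(rate = 1) population is an assumption, chosen because it is skewed
set.seed(316)  # arbitrary seed, for reproducibility
B <- 10000     # number of simulated samples
n <- 30        # assumed sample size; the Normal approximation improves as n grows
rate <- 1
mu <- 1 / rate     # mean of an Exponential(rate) population
sigma <- 1 / rate  # standard deviation of an Exponential(rate) population

Z_exp <- replicate(B, {
  x <- rexp(n, rate = rate)           # draw a sample from the skewed population
  (mean(x) - mu) / (sigma / sqrt(n))  # standardize Xbar with the true mu and sigma
})

plot(density(Z_exp),
     main = "Standardized mean of 30 exponential rvs",
     xlab = "Z")
curve(dnorm(x), add = TRUE, col = "red")  # standard normal density for comparison

# Proportion of |Z| beyond 1.96; under exact normality this would be 0.05
mean(abs(Z_exp) > 1.96)
```

Rerunning with a smaller `n` (for example 5) should show a visibly right-skewed density, which illustrates why the CLT is an approximation whose quality depends on both $n$ and the shape of the population.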