---
title: "p-value"
author: "Prof. Eric A. Suess"
format:
  html:
    self-contained: true
---

## The distribution of the *p-value* under the null hypothesis

Suppose we are sampling from a Normal population with mean $\mu_0 = 50$ and standard deviation $\sigma = 10$. We will sample 100 observations from this population and test the null hypothesis that $\mu = 50$ against the alternative hypothesis that $\mu \neq 50$. We will use a significance level of $\alpha = 0.05$.

```{r}
set.seed(1234)
x <- rnorm(100, 50, 10)  # n = 100, mean = 50, sd = 10
t.test(x, mu = 50)       # two-sided one-sample t-test of H0: mu = 50
```

We note that the *p-value* for the test can be accessed from the output directly.

```{r}
t.test(x, mu = 50)$p.value
```

Now we will repeat this experiment 10000 times and record the *p-value* for each experiment. We examine the distribution of the *p-value* when the null hypothesis, $H_0: \mu = 50$, is true.

**Answer:** The *p-value* is uniformly distributed on $(0, 1)$ under the null hypothesis.

```{r}
p <- replicate(10000, {
  x <- rnorm(100, 50, 10)  # data generated under H0
  t.test(x, mu = 50)$p.value
})

hist(p, prob = TRUE, breaks = 20)
lines(density(p))
curve(dunif(x), add = TRUE, col = "red")  # Uniform(0, 1) density for reference
```

Now we will repeat this experiment 10000 times, changing the true mean to $\mu_1 = 53$, and record the *p-value* for each experiment. We still test the null hypothesis $H_0: \mu = 50$, but the data are now generated under the alternative hypothesis, so we examine the distribution of the *p-value* when $H_0$ is false.

For values of $\mu_1$ different from 50, the *p-value* is not uniformly distributed; its distribution concentrates near 0, and more strongly so the farther $\mu_1$ is from 50.

**Answer:** The *p-value* is **not** uniformly distributed under the alternative hypothesis.

```{r}
p <- replicate(10000, {
  x <- rnorm(100, 53, 10)  # mu_1 = 53, data generated under the alternative
  t.test(x, mu = 50)$p.value
})

hist(p, prob = TRUE, breaks = 20)
lines(density(p))
curve(dunif(x), add = TRUE, col = "red")  # Uniform(0, 1) density for reference
```
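
The two simulations can also be summarized numerically by the proportion of *p-values* that fall below $\alpha = 0.05$. The following is a minimal sketch, not part of the original analysis: the helper `sim_p()` and the objects `p_null` and `p_alt` are hypothetical names introduced here for illustration. If the *p-value* is uniform under $H_0$, the rejection rate at $\mu = 50$ should be close to $\alpha$; at $\mu_1 = 53$ the rejection rate estimates the power of the test, which can be compared with the value from `power.t.test()`.

```{r}
set.seed(1234)  # seed chosen only to make this sketch reproducible
alpha <- 0.05

# Hypothetical helper: simulate p-values of the one-sample t-test of
# H0: mu = mu0 when the data actually come from Normal(mu, sigma).
sim_p <- function(mu, n_sim = 10000, n = 100, sigma = 10, mu0 = 50) {
  replicate(n_sim, t.test(rnorm(n, mu, sigma), mu = mu0)$p.value)
}

p_null <- sim_p(50)  # H0 true
p_alt  <- sim_p(53)  # H0 false, mu_1 = 53

mean(p_null < alpha)  # estimated Type I error rate, expected to be near alpha
mean(p_alt < alpha)   # estimated power at mu_1 = 53

# Theoretical power of the two-sided one-sample t-test, for comparison
power.t.test(n = 100, delta = 3, sd = 10, sig.level = alpha,
             type = "one.sample")$power
```

The first proportion being close to 0.05 is the uniform distribution at work: under $H_0$, $P(\text{p-value} \le \alpha) = \alpha$ for every $\alpha$. The second proportion is much larger because the *p-values* pile up near 0 when $H_0$ is false.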