p-value

Author

Prof. Eric A. Suess

The distribution of the p-value under the null hypothesis.

Suppose we are sampling from a Normal population with mean \(\mu_0 = 50\) and standard deviation \(\sigma = 10\). We will draw a sample of 100 observations from this population and use a one-sample t-test of the null hypothesis \(H_0: \mu = 50\) against the two-sided alternative \(H_1: \mu \neq 50\), at significance level \(\alpha = 0.05\).

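set.seed(1234)             # make the simulation reproducible
x <- rnorm(100, 50, 10)    # n = 100 draws from N(mu = 50, sigma = 10)
t.test(x, mu = 50)         # two-sided one-sample t-test of H0: mu = 50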

    One Sample t-test

data:  x
t = -1.5607, df = 99, p-value = 0.1218
alternative hypothesis: true mean is not equal to 50
95 percent confidence interval:
 46.43942 50.42534
sample estimates:
mean of x 
 48.43238 

The p-value can also be extracted directly from the object returned by t.test(), through its p.value component.

t.test(x, mu = 50)$p.value
[1] 0.1217758
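To see where this number comes from, here is a minimal sketch that recomputes the statistic \(t = (\bar{x} - 50)/(s/\sqrt{n})\) and the two-sided p-value \(2\Pr(T_{99} \ge |t|)\) by hand with pt(), then compares them with the values stored by t.test(). The object names tt, t_stat and p_manual are illustrative only.

```r
set.seed(1234)
x <- rnorm(100, 50, 10)
tt <- t.test(x, mu = 50)

# Test statistic by hand: t = (xbar - mu0) / (s / sqrt(n)), with df = n - 1
n <- length(x)
t_stat <- (mean(x) - 50) / (sd(x) / sqrt(n))

# Two-sided p-value: probability, under t_{n-1}, of a statistic at least as extreme as |t|
p_manual <- 2 * pt(-abs(t_stat), df = n - 1)

c(t_stat = t_stat, p_manual = p_manual, p_ttest = tt$p.value)
```

The hand-computed and reported values should agree up to floating-point rounding, since for a two-sided test t.test() relies on the same tail-probability formula.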

Now we will repeat this experiment 10000 times and record the p-value for each experiment. Because the data are generated with \(\mu = 50\), the null hypothesis \(H_0: \mu = 50\) is true, so this shows the distribution of the p-value under the null hypothesis.

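Answer: Under the null hypothesis the p-value is uniformly distributed on \([0, 1]\). In particular, \(P(\text{p-value} \le \alpha) = \alpha\), which is exactly why rejecting when the p-value is at most \(\alpha\) gives a Type I error rate of \(\alpha\).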
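A short derivation sketch of why this holds, assuming the population is exactly Normal so that \(T = (\bar X - \mu_0)/(S/\sqrt{n})\) follows a \(t_{99}\) distribution under \(H_0\). Here \(F\) denotes the \(t_{99}\) CDF, \(G\) the CDF of \(|T|\), and \(P = 2\{1 - F(|T|)\}\) the (random) two-sided p-value; the argument is the probability integral transform applied to \(|T|\), using the symmetry of the t distribution.

```latex
\begin{aligned}
G(y) &= \Pr(|T| \le y) = F(y) - F(-y) = 2F(y) - 1, \qquad y \ge 0, \\
P &= 2\{1 - F(|T|)\} = 1 - G(|T|), \\
\Pr(P \le u) &= \Pr\big(G(|T|) \ge 1 - u\big)
              = \Pr\big(|T| \ge G^{-1}(1 - u)\big) \\
             &= 1 - G\big(G^{-1}(1 - u)\big) = u, \qquad 0 < u < 1.
\end{aligned}
```

Since \(\Pr(P \le u) = u\) is the Uniform(0, 1) CDF, the p-value is uniform under \(H_0\). The same argument applies, more generally, to tests whose statistic has a continuous and exactly known null distribution.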

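p <- replicate(10000, {
  x <- rnorm(100, 50, 10)       # a fresh sample with H0 true (mu = 50)
  t.test(x, mu = 50)$p.value    # keep only the p-value
})

hist(p, prob = TRUE, breaks = 20)          # histogram on the density scale
lines(density(p))                          # kernel density estimate of the p-values
curve(dunif(x), add = TRUE, col = "red")   # Uniform(0, 1) density for reference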
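The histogram is a visual check. A minimal numerical companion, assuming the same simulation settings, is to compare the rejection rate at \(\alpha = 0.05\) with 0.05 (the Type I error rate), look at the empirical deciles, and run a Kolmogorov-Smirnov test against Uniform(0, 1) with ks.test(). The seed 2024 and the name type1 are arbitrary choices for illustration and are not part of the original example.

```r
set.seed(2024)  # hypothetical seed, chosen only for reproducibility
p <- replicate(10000, {
  x <- rnorm(100, 50, 10)
  t.test(x, mu = 50)$p.value
})

# Under H0, P(p <= alpha) = alpha, so the rejection rate estimates the Type I error rate
type1 <- mean(p <= 0.05)
type1

# Empirical deciles of p; for Uniform(0, 1) these should be close to 0.1, 0.2, ..., 0.9
quantile(p, probs = seq(0.1, 0.9, by = 0.1))

# Kolmogorov-Smirnov test of the simulated p-values against Uniform(0, 1)
ks.test(p, "punif")
```

With 10000 replications, the Monte Carlo standard error of the rejection rate is about \(\sqrt{0.05 \cdot 0.95 / 10000} \approx 0.002\), so type1 should land close to 0.05.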

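Now we will repeat this experiment 10000 times, but draw the samples from a Normal population whose true mean is \(\mu_1 = 53\) instead of 50, and again record the p-value for each experiment. We still test \(H_0: \mu = 50\), so the null hypothesis is now false and we are examining the distribution of the p-value under the alternative hypothesis.

When the true mean \(\mu_1\) differs from 50, the p-value is no longer uniformly distributed: small p-values become more likely, and the farther \(\mu_1\) is from 50, the more the distribution piles up near 0.

Answer: The p-value is not uniformly distributed under the alternative hypothesis; its distribution is concentrated near 0.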

p <- replicate(10000, {
  x <- rnorm(100, 53, 10)   # mu_1 = 53
  t.test(x, mu = 50)$p.value
})

hist(p, prob = TRUE, breaks = 20)
lines(density(p))
curve(dunif(x), add = TRUE, col = "red")
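Under the alternative, the proportion of p-values at or below \(\alpha\) is the power of the test. A minimal sketch, assuming the same settings (\(n = 100\), \(\sigma = 10\), \(\mu_1 = 53\)), compares the simulated power with the value from power.t.test() in base R; the seed and the names mu1, power_sim and power_theory are illustrative.

```r
set.seed(2024)  # hypothetical seed, chosen only for reproducibility
mu1 <- 53
p <- replicate(10000, {
  x <- rnorm(100, mu1, 10)
  t.test(x, mu = 50)$p.value
})

# Under H1, the proportion of p-values <= alpha estimates the power of the test
power_sim <- mean(p <= 0.05)

# Theoretical power of the two-sided one-sample t-test in the same setting
power_theory <- power.t.test(n = 100, delta = mu1 - 50, sd = 10,
                             sig.level = 0.05, type = "one.sample")$power

c(simulated = power_sim, theoretical = power_theory)
```

The two values should agree to within Monte Carlo error. Increasing \(\mu_1\) (or the sample size) raises the power and pushes even more of the p-value distribution toward 0.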