p-value

Author

Prof. Eric A. Suess

The distribution of the p-value under the null hypothesis.

Suppose we are sampling from a Normal population with mean μ0=50 and standard deviation σ=10. We will sample 100 observations from this population and test the null hypothesis that μ=50 against the alternative hypothesis that μ≠50. We will use a significance level of α=0.05.

set.seed(1234)
x <- rnorm(100, 50, 10)
t.test(x, mu = 50)

    One Sample t-test

data:  x
t = -1.5607, df = 99, p-value = 0.1218
alternative hypothesis: true mean is not equal to 50
95 percent confidence interval:
 46.43942 50.42534
sample estimates:
mean of x 
 48.43238 

We note that the p-value for the test can be accessed from the output directly.

t.test(x, mu = 50)$p.value
[1] 0.1217758
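
As a rough check on where this number comes from, the two-sided p-value can be recomputed by hand from the t statistic. This is a minimal sketch, not part of the original analysis; the names n, t_stat, and p_manual are illustrative, and the sample is regenerated with the same seed so the block runs on its own.

```r
# Recompute the two-sided p-value by hand (illustrative names).
set.seed(1234)
x <- rnorm(100, 50, 10)                        # same sample as above

n        <- length(x)
t_stat   <- (mean(x) - 50) / (sd(x) / sqrt(n)) # one-sample t statistic
p_manual <- 2 * pt(-abs(t_stat), df = n - 1)   # area in both tails

p_manual
all.equal(p_manual, t.test(x, mu = 50)$p.value)
```

The p-value is the probability, computed under H0, of a t statistic at least as extreme as the one observed, which is why it is doubled for the two-sided alternative.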

Now we will repeat this experiment 10000 times and record the p-value for each experiment. We examine the distribution of the p-value under the null hypothesis, H0:μ=50.

Answer: The p-value is uniformly distributed under the null hypothesis.

p <- replicate(10000, {
  x <- rnorm(100, 50, 10)
  t.test(x, mu = 50)$p.value
})

hist(p, prob = TRUE, breaks = 20)
lines(density(p))
curve(dunif(x), add = TRUE, col = "red")
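
Besides the histogram, uniformity can be checked numerically. The sketch below is an addition, not part of the original analysis: it reruns the simulation under an arbitrary seed, stores the p-values in p_null (an illustrative name), and compares the rejection rate and quartiles with what a Uniform(0,1) distribution would give.

```r
# Numerical checks of uniformity under H0 (a sketch; the seed is arbitrary).
set.seed(1234)
p_null <- replicate(10000, {
  x <- rnorm(100, 50, 10)          # data generated with the null value mu = 50
  t.test(x, mu = 50)$p.value
})

mean(p_null < 0.05)                     # Type I error rate, should be near 0.05
quantile(p_null, c(0.25, 0.50, 0.75))   # should be near 0.25, 0.50, 0.75
ks.test(p_null, "punif")                # Kolmogorov-Smirnov test against Uniform(0,1)
```

If the p-value is uniform under H0, then rejecting when p < α produces a false rejection in about a fraction α of the experiments, which is what the significance level promises.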

Now we will repeat this experiment 10000 times, changing the true population mean to μ1=53, and record the p-value for each experiment. We still test the null hypothesis H0:μ=50, so we are now examining the distribution of the p-value under the alternative hypothesis.

When the true mean μ1 differs from 50, the p-value is no longer uniformly distributed. Its distribution is concentrated near 0, and more so the further μ1 is from 50.

Answer: The p-value is not uniformly distributed under the alternative hypothesis.

p <- replicate(10000, {
  x <- rnorm(100, 53, 10)   # mu_1 = 53
  t.test(x, mu = 50)$p.value
})

hist(p, prob = TRUE, breaks = 20)
lines(density(p))
curve(dunif(x), add = TRUE, col = "red")
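
The proportion of p-values below α in this simulation estimates the power of the test at μ1=53. The sketch below is an addition, not part of the original analysis: it compares that proportion with the value from power.t.test() in the stats package. The seed and the names p_alt, empirical_power, and theoretical_power are illustrative.

```r
# Empirical versus theoretical power at mu_1 = 53 (a sketch; the seed is arbitrary).
set.seed(1234)
p_alt <- replicate(10000, {
  x <- rnorm(100, 53, 10)          # data generated under the alternative
  t.test(x, mu = 50)$p.value
})

empirical_power <- mean(p_alt < 0.05)   # share of experiments that reject H0

theoretical_power <- power.t.test(
  n = 100, delta = 3, sd = 10, sig.level = 0.05,
  type = "one.sample", alternative = "two.sided"
)$power

c(empirical = empirical_power, theoretical = theoretical_power)
```

The two values should agree closely. The further μ1 is from 50, the more the p-values pile up near 0 and the higher the power.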