p-value

Author

Prof. Eric A. Suess

Simulation of the Confidence level of a Confidence Interval.

Suppose we are sampling from a normal distribution with mean \(\mu\) and standard deviation \(\sigma\). We want to estimate \(\mu\) with a confidence interval. We will use the sample mean \(\bar{X}\) as our estimator. We will use the sample standard deviation \(s\) as our estimator of \(\sigma\). We will use the t-distribution to construct a confidence interval. We will use the following formula to construct a confidence interval:

\[\bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}\]

where \(t_{\alpha/2, n-1}\) is the \(\alpha/2\) quantile of the t-distribution with \(n-1\) degrees of freedom.

We will simulate the Confidence level of this Confidence Interval. We will do this by simulating many samples from a normal distribution with mean \(\mu\) and standard deviation \(\sigma\). We will then construct a confidence interval for each sample. We will then count the number of confidence intervals that contain \(\mu\). This will give us an estimate of the confidence level of the confidence interval.

We will use the following parameters:

  • \(\mu = 0\)
  • \(\sigma = 1\)
  • \(\alpha = 0.05\)
  • \(n = 10\)

We will simulate 10000 samples.

We will use the following R code to simulate the confidence level of the confidence interval.

# Confidence level of a confidence interval
# Simulate many samples from a normal distribution
# Construct a confidence interval for each sample
# Count the number of confidence intervals that contain the mean
# This will give us an estimate of the confidence level of the confidence interval

# Parameters
mu <- 0
sigma <- 1
alpha <- 0.05
n <- 10
n.sim <- 10000

count <- 0
for (i in 1:n.sim) {
  # Simulate a sample from a normal distribution
  x <- rnorm(n, mu, sigma)
  # Construct a confidence interval
  ci <- mean(x) + qt(1 - alpha/2, n - 1) * sd(x) / sqrt(n) * c(-1, 1)
  # Count the number of confidence intervals that contain the mean
  if (ci[1] <= mu && mu <= ci[2]) {
    count <- count + 1
  }
}
  
# Estimate the confidence level of the confidence interval
count / n.sim
[1] 0.9528
# The confidence level of the confidence interval is approximately 0.95

We rewrite the code using the replicate function.

# Confidence level of a confidence interval
# Simulate many samples from a normal distribution
# Construct a confidence interval for each sample
# Count the number of confidence intervals that contain the mean
# This will give us an estimate of the confidence level of the confidence interval

# Parameters
mu <- 0
sigma <- 1
alpha <- 0.05
n <- 10
n.sim <- 10000

count <- sum(replicate(n.sim, {
  # Simulate a sample from a normal distribution
  x <- rnorm(n, mu, sigma)
  # Construct a confidence interval
  ci <- mean(x) + qt(1 - alpha/2, n - 1) * sd(x) / sqrt(n) * c(-1, 1)
  # Count the number of confidence intervals that contain the mean
  if (ci[1] <= mu && mu <= ci[2]) 1 else 0
}))

# Estimate the confidence level of the confidence interval
count / n.sim
[1] 0.9483
# The confidence level of the confidence interval is approximately 0.95

Rewrite the code using the Tidyverse functions and nesting the samples in a list column of a tibble.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Confidence level of a confidence interval
# Simulate many samples from a normal distribution
# Construct a confidence interval for each sample
# Count the number of confidence intervals that contain the mean
# This will give us an estimate of the confidence level of the confidence interval

# Parameters
mu <- 0
sigma <- 1
alpha <- 0.05
n <- 10
n.sim <- 10000

# Simulate many samples from a normal distribution
# Construct a confidence interval for each sample
# Count the number of confidence intervals that contain the mean
# This will give us an estimate of the confidence level of the confidence interval

count <- tibble(sim = 1:n.sim) |> 
  mutate(x = map(sim, ~rnorm(n, mu, sigma))) |> 
  mutate(ci = map(x, ~mean(.) + qt(1 - alpha/2, n - 1) * sd(.) / sqrt(n) * c(-1, 1))) |> 
  mutate(count = map_dbl(ci, ~if (.x[1] <= mu && mu <= .x[2]) 1 else 0)) |> 
  summarize(count = sum(count)) |> 
  pull(count)

# Estimate the confidence level of the confidence interval
count / n.sim
[1] 0.9479
# The confidence level of the confidence interval is approximately 0.95

Now we use ggplot to visualize the simulation of the confidence intervals.

library(tidyverse)

# Confidence level of a confidence interval
# Simulate many samples from a normal distribution
# Construct a confidence interval for each sample
# Count the number of confidence intervals that contain the mean
# This will give us an estimate of the confidence level of the confidence interval

# Parameters
mu <- 0
sigma <- 1
alpha <- 0.05
n <- 10
n.sim <- 100

# Simulate many samples from a normal distribution
# Construct a confidence interval for each sample
# Plot the confidence intervals on a plot, connecting the high and low values of the confidence interval

tibble(sim = 1:n.sim) |> 
  mutate(x = map(sim, ~rnorm(n, mu, sigma))) |> 
  mutate(ci = map(x, ~mean(.) + qt(1 - alpha/2, n - 1) * sd(.) / sqrt(n) * c(-1, 1))) |> 
  unnest(ci) |> 
  mutate(sim = factor(sim)) |> 
  ggplot(aes(x = sim, y = ci, group = sim)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = mu, color = "red") +
  labs(x = "Simulation", y = "Confidence Interval", title = "Confidence Intervals for the Mean") +
  theme_bw()

The confidence level of the confidence interval is approximately 0.95

CI Simulation

library(BSDA)
Loading required package: lattice

Attaching package: 'BSDA'
The following object is masked from 'package:datasets':

    Orange
CIsim(100, 30, 100, 10)

7 % of the random confidence intervals do not contain Mu = 100 . 
    # Simulates 100 samples of size 30 from 
    # a normal distribution with mean 100
    # and standard deviation 10.  From the
    # 100 simulated samples, 95% confidence
    # intervals for the Mean are constructed 
    # and depicted in the graph. 

CIsim(100, 30, 100, 10, type="Var")

6 % of the random confidence intervals do not contain Var = 100 . 
    # Simulates 100 samples of size 30 from 
    # a normal distribution with mean 100
    # and standard deviation 10.  From the
    # 100 simulated samples, 95% confidence
    # intervals for the variance are constructed 
    # and depicted in the graph.
    
CIsim(100, 50, .5, type="Pi", conf.level=.90)     

11 % of the random confidence intervals do not contain Pi = 0.5 . 
    # Simulates 100 samples of size 50 from 
    # a binomial distribution where the population
    # proportion of successes is 0.5.  From the
    # 100 simulated samples, 90% confidence
    # intervals for Pi are constructed 
    # and depicted in the graph.