---
title: "p-value"
author: "Prof. Eric A. Suess"
format:
  html:
    self-contained: true
---

## Simulation of the Confidence Level of a Confidence Interval

Suppose we are sampling from a normal distribution with mean $\mu$ and standard deviation $\sigma$. We want to estimate $\mu$ with a confidence interval, using the sample mean $\bar{X}$ as the estimator of $\mu$ and the sample standard deviation $s$ as the estimator of $\sigma$. Because $\sigma$ is estimated from the sample, the interval is based on the t-distribution:

$$\bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}$$

where $t_{\alpha/2, n-1}$ is the upper $\alpha/2$ critical value, that is, the $1 - \alpha/2$ quantile, of the t-distribution with $n-1$ degrees of freedom.

We will estimate the confidence level of this confidence interval by simulation. We simulate many samples from a normal distribution with mean $\mu$ and standard deviation $\sigma$, construct a confidence interval for each sample, and count the number of confidence intervals that contain $\mu$. The proportion of intervals that contain $\mu$ is an estimate of the confidence level of the confidence interval.

We will use the following parameters:

* $\mu = 0$
* $\sigma = 1$
* $\alpha = 0.05$
* $n = 10$

We will simulate 10000 samples. We will use the following R code to simulate the confidence level of the confidence interval.

```{r}
# Confidence level of a confidence interval
# Simulate many samples from a normal distribution
# Construct a confidence interval for each sample
# Count the number of confidence intervals that contain the mean
# This will give us an estimate of the confidence level of the confidence interval

# Parameters
mu <- 0
sigma <- 1
alpha <- 0.05
n <- 10
n.sim <- 10000

count <- 0

for (i in 1:n.sim) {
  # Simulate a sample from a normal distribution
  x <- rnorm(n, mu, sigma)

  # Construct a confidence interval
  ci <- mean(x) + qt(1 - alpha/2, n - 1) * sd(x) / sqrt(n) * c(-1, 1)

  # Count the number of confidence intervals that contain the mean
  if (ci[1] <= mu && mu <= ci[2]) {
    count <- count + 1
  }
}

# Estimate the confidence level of the confidence interval
count / n.sim

# The confidence level of the confidence interval is approximately 0.95
```

We rewrite the code using the `replicate` function.

```{r}
# Confidence level of a confidence interval
# Simulate many samples from a normal distribution
# Construct a confidence interval for each sample
# Count the number of confidence intervals that contain the mean
# This will give us an estimate of the confidence level of the confidence interval

# Parameters
mu <- 0
sigma <- 1
alpha <- 0.05
n <- 10
n.sim <- 10000

count <- sum(replicate(n.sim, {
  # Simulate a sample from a normal distribution
  x <- rnorm(n, mu, sigma)

  # Construct a confidence interval
  ci <- mean(x) + qt(1 - alpha/2, n - 1) * sd(x) / sqrt(n) * c(-1, 1)

  # Return 1 if the confidence interval contains the mean, 0 otherwise
  if (ci[1] <= mu && mu <= ci[2]) 1 else 0
}))

# Estimate the confidence level of the confidence interval
count / n.sim

# The confidence level of the confidence interval is approximately 0.95
```

Next, we rewrite the code using Tidyverse functions, nesting the samples in a list column of a tibble.

```{r}
library(tidyverse)

# Confidence level of a confidence interval
# Simulate many samples from a normal distribution
# Construct a confidence interval for each sample
# Count the number of confidence intervals that contain the mean
# This will give us an estimate of the confidence level of the confidence interval

# Parameters
mu <- 0
sigma <- 1
alpha <- 0.05
n <- 10
n.sim <- 10000

# x is a list column of samples, ci is a list column of (lower, upper) limits,
# and count is 1 when the interval contains the mean and 0 otherwise
count <- tibble(sim = 1:n.sim) |>
  mutate(x = map(sim, ~rnorm(n, mu, sigma))) |>
  mutate(ci = map(x, ~mean(.) + qt(1 - alpha/2, n - 1) * sd(.) / sqrt(n) * c(-1, 1))) |>
  mutate(count = map_dbl(ci, ~if (.x[1] <= mu && mu <= .x[2]) 1 else 0)) |>
  summarize(count = sum(count)) |>
  pull(count)

# Estimate the confidence level of the confidence interval
count / n.sim

# The confidence level of the confidence interval is approximately 0.95
```

Now we use ggplot to visualize the simulation of the confidence intervals.

```{r}
library(tidyverse)

# Confidence level of a confidence interval
# Simulate many samples from a normal distribution
# Construct a confidence interval for each sample
# Plot the confidence intervals, connecting the low and high values of each interval

# Parameters
mu <- 0
sigma <- 1
alpha <- 0.05
n <- 10
n.sim <- 100

# unnest(ci) gives two rows per simulation: the lower and upper limits
tibble(sim = 1:n.sim) |>
  mutate(x = map(sim, ~rnorm(n, mu, sigma))) |>
  mutate(ci = map(x, ~mean(.) + qt(1 - alpha/2, n - 1) * sd(.) / sqrt(n) * c(-1, 1))) |>
  unnest(ci) |>
  mutate(sim = factor(sim)) |>
  ggplot(aes(x = sim, y = ci, group = sim)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = mu, color = "red") +
  labs(x = "Simulation", y = "Confidence Interval", title = "Confidence Intervals for the Mean") +
  theme_bw()
```

The confidence level of the confidence interval is approximately 0.95, so about 95 of the 100 plotted intervals are expected to cross the red line at $\mu$.

## CI Simulation

```{r}
library(BSDA)
```

```{r}
CIsim(100, 30, 100, 10)
# Simulates 100 samples of size 30 from
# a normal distribution with mean 100
# and standard deviation 10. From the
# 100 simulated samples, 95% confidence
# intervals for the Mean are constructed
# and depicted in the graph.

CIsim(100, 30, 100, 10, type="Var")
# Simulates 100 samples of size 30 from
# a normal distribution with mean 100
# and standard deviation 10. From the
# 100 simulated samples, 95% confidence
# intervals for the variance are constructed
# and depicted in the graph.

CIsim(100, 50, .5, type="Pi", conf.level=.90)
# Simulates 100 samples of size 50 from
# a binomial distribution where the population
# proportion of successes is 0.5. From the
# 100 simulated samples, 90% confidence
# intervals for Pi are constructed
# and depicted in the graph.
```
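
The simulated estimates of the confidence level earlier in this document change from run to run. As a rough sketch of how large that variation is, the block below recomputes the `replicate` estimate for the t-interval and attaches a Monte Carlo standard error, treating each simulated interval as an independent hit-or-miss trial. The names `covers`, `coverage`, and `mc.se`, and the seed value, are illustrative choices and are not part of the code above.

```{r}
# Monte Carlo error of a simulated coverage estimate (illustrative sketch)
set.seed(1)  # arbitrary seed, used only so the run can be repeated

# Same parameters as in the first section
mu <- 0
sigma <- 1
alpha <- 0.05
n <- 10
n.sim <- 10000

# TRUE when the t-interval from one simulated sample contains mu
covers <- replicate(n.sim, {
  x <- rnorm(n, mu, sigma)
  half.width <- qt(1 - alpha/2, n - 1) * sd(x) / sqrt(n)
  abs(mean(x) - mu) <= half.width
})

# Estimated confidence level: proportion of intervals that contain mu
coverage <- mean(covers)

# Binomial standard error of that proportion
mc.se <- sqrt(coverage * (1 - coverage) / n.sim)

# Approximate 95% range for the true coverage, given the simulation noise
coverage + c(-1, 1) * qnorm(0.975) * mc.se
```

With 10000 simulations and a true coverage of 0.95, the standard error is $\sqrt{0.95 \times 0.05 / 10000} \approx 0.0022$, so estimates a few thousandths above or below 0.95 are consistent with the nominal level.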