---
title: "Understanding Confidence Intervals"
author: "Gemini"
format: pdf
---

## Introduction

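This notebook demonstrates what the "confidence" in a confidence interval means: if we repeatedly sample from a population and build a 95% interval from each sample, about 95% of those intervals should contain the true population mean. We generate many samples from a known normal population, compute a 95% confidence interval for the population mean from each sample, and then visualize these intervals.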
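
For reference, the interval built from each sample in the simulation code is the standard t-interval for a mean. The notation here is an assumption introduced for illustration (it does not appear in the original text): $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $1 - \alpha$ the confidence level.

$$
\bar{x} \pm t_{1-\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}}
$$

With $n = 30$ and $1 - \alpha = 0.95$, the multiplier is the 0.975 quantile of a t distribution with 29 degrees of freedom, roughly 2.045.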

## Simulation

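The following R code draws 100 samples of size 30 from a normal population with mean 100 and standard deviation 15, builds a 95% t-interval for the mean from each sample, plots the intervals, and counts how many contain the true mean.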

````{r}
#| label: confidence-simulation
#| echo: true
#| warning: false
#| message: false

library(tidyverse)

# --- 1. Set Parameters ---
set.seed(123)
population_mean <- 100
population_sd <- 15
sample_size <- 30
n_samples <- 100
confidence_level <- 0.95

# --- 2. Generate Samples and Compute Confidence Intervals ---
samples_data <- replicate(n_samples, rnorm(sample_size, mean = population_mean, sd = population_sd), simplify = FALSE)

ci_data <- samples_data |>
  map_dfr(~{
    sample_mean <- mean(.x)
    se <- sd(.x) / sqrt(sample_size)
    margin_error <- qt(1 - (1 - confidence_level) / 2, df = sample_size - 1) * se
    tibble(
      lower = sample_mean - margin_error,
      upper = sample_mean + margin_error,
      sample_mean = sample_mean
    )
  }) |>
  mutate(
    sample_num = 1:n_samples,
    contains_mean = lower <= population_mean & upper >= population_mean
  )

# --- 3. Plot the Confidence Intervals ---
ci_plot <- ggplot(ci_data, aes(x = factor(sample_num), ymin = lower, ymax = upper, color = contains_mean)) +
  geom_errorbar(width = 0.5) +
  geom_hline(yintercept = population_mean, color = "red", linetype = "dashed") +
  coord_flip() +
  labs(
    title = paste(n_samples, "Confidence Intervals for the Population Mean"),
    x = "Sample Number",
    y = "Confidence Interval",
    color = "Contains Population Mean"
  ) +
  theme_minimal() +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())

print(ci_plot)

# --- 4. Count Intervals Containing the Mean ---
X <- sum(ci_data$contains_mean)
cat("Number of confidence intervals containing the population mean (X):", X, "\n")

# --- 5. Estimate Observed Confidence Level ---
observed_confidence_level <- X / n_samples
cat("Observed level of confidence:", observed_confidence_level, "\n")

# --- 6. Compute 95% CI for the Confidence Level ---
confidence_interval_for_level <- binom.test(X, n_samples, p = confidence_level)
cat("95% confidence interval for the level of confidence:\n")
print(confidence_interval_for_level$conf.int)
````
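
As a sanity check on the manual interval formula used in the simulation, the sketch below compares it with the interval reported by base R's `t.test()` for a single sample. The names `check_sample`, `manual_ci`, and `builtin_ci`, as well as the seed, are illustrative assumptions and not part of the original analysis.

```{r}
#| label: ci-cross-check
#| echo: true

# Illustrative check only: one sample drawn with the same settings as above
set.seed(42)
check_sample <- rnorm(30, mean = 100, sd = 15)

# Manual t-interval, using the same formula as the simulation chunk
check_n <- length(check_sample)
check_se <- sd(check_sample) / sqrt(check_n)
check_margin <- qt(0.975, df = check_n - 1) * check_se
manual_ci <- c(mean(check_sample) - check_margin,
               mean(check_sample) + check_margin)

# Interval from the built-in one-sample t-test (attributes dropped)
builtin_ci <- as.numeric(t.test(check_sample, conf.level = 0.95)$conf.int)

# Compare the two intervals up to floating-point tolerance
all.equal(manual_ci, builtin_ci)
```

If the two intervals agree, the hand-rolled computation in the simulation can be trusted to match the textbook t-interval.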

## Law of Large Numbers for Confidence Levels

To see the Law of Large Numbers in action, we can repeat the experiment for a much larger number of trials. The following code simulates 1,000 intervals and plots the running proportion that contain the true mean, which should settle near the theoretical level of 95%. The shaded blue band is the 95% exact binomial (Clopper-Pearson) confidence interval for that proportion, as returned by `binom.test()`; it narrows as the number of simulated intervals increases.

```{r}
#| label: lln-confidence-level
#| echo: true

set.seed(789)
n_sims_lln <- 1000

# Run the simulation n_sims_lln times
lln_results <- map_lgl(1:n_sims_lln, ~{
  sample_data <- rnorm(sample_size, mean = population_mean, sd = population_sd)
  sample_mean <- mean(sample_data)
  se <- sd(sample_data) / sqrt(sample_size)
  margin_error <- qt(1 - (1 - confidence_level) / 2, df = sample_size - 1) * se
  lower_bound <- sample_mean - margin_error
  upper_bound <- sample_mean + margin_error
  
  # Check if the true mean is in the interval
  lower_bound <= population_mean & upper_bound >= population_mean
})

# Calculate cumulative confidence level and its confidence interval.
# binom.test() is called once per row; both bounds are read from the same result.
lln_convergence_data <- tibble(
  simulation_num = 1:n_sims_lln,
  success = cumsum(lln_results),
  cumulative_confidence_level = success / simulation_num
) |>
  mutate(
    ci = map2(success, simulation_num, ~ binom.test(.x, .y)$conf.int),
    ci_lower = map_dbl(ci, 1),
    ci_upper = map_dbl(ci, 2)
  ) |>
  select(-ci)


# Plot the convergence
ggplot(lln_convergence_data, aes(x = simulation_num, y = cumulative_confidence_level)) +
  geom_line(color = "blue") +
  geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.2, fill = "blue") +
  geom_hline(yintercept = confidence_level, color = "red", linetype = "dashed") +
  labs(
    title = "Convergence of Observed Confidence Level to 95%",
    subtitle = paste("Based on", n_sims_lln, "Simulations"),
    x = "Number of Simulated Confidence Intervals",
    y = "Observed Confidence Level (Cumulative)"
  ) +
  coord_cartesian(ylim = c(0.85, 1.0)) + # Zoom in without dropping out-of-range points or ribbon segments
  theme_minimal()
```
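
To make the narrowing of the band concrete, the following sketch computes the width of the exact binomial interval for a few run lengths, assuming (hypothetically) that exactly 95% of the simulated intervals cover the mean. The names `n_values`, `exact_width`, and `approx_width` are illustrative, and the normal-approximation column is included only to show the roughly $1/\sqrt{n}$ rate.

```{r}
#| label: band-width-sketch
#| echo: true

# Hypothetical run lengths; not taken from the simulation above
n_values <- c(100, 1000, 10000)

# Width of the Clopper-Pearson interval when 95% of n intervals cover the mean
exact_width <- sapply(n_values, function(n) {
  ci <- binom.test(round(0.95 * n), n)$conf.int
  ci[2] - ci[1]
})

# Normal-approximation width, which shrinks in proportion to 1 / sqrt(n)
approx_width <- 2 * qnorm(0.975) * sqrt(0.95 * 0.05 / n_values)

data.frame(n = n_values, exact_width = exact_width, approx_width = approx_width)
```

Under these assumptions, each tenfold increase in the number of simulated intervals shrinks the band by roughly a factor of $\sqrt{10} \approx 3.2$.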