--- title: "Discrete Distributions" author: "Prof. Eric A. Suess" date: "3/20/2024" format: html: self-contained: true --- # Discrete Distributions - Computations and Simulations ## Flip a biased coin Simulate flipping a biased coin 1000 times. The probability of heads is 0.75. ```{r} n <- 1000 pr <- c(.25, .75) flips <- sample(c(0,1), size = n, prob = pr, replace = TRUE) head(flips) ``` Estimate the probability of heads. ```{r} mean(flips == 1) ``` Plot the convergence of the sampling to the probability of heads. ```{r} plot(cumsum(flips)/1:n, type = "l", xlab = "n", ylab = "P(X)") abline(h = .75, col = "red") ``` Plot the estimated probability mass function. ```{r} barplot(table(flips)/n, xlab = "X", ylab = "P(X)", col = "lightblue") ``` ## Binomial Distribution Now use the binomial distribution to simulate flipping a biased fair coin 1000 times. ### Probability Mass Function (PMF) #### Simulation ```{r} n <- 1000 p <- .75 flips <- rbinom(n, size = 1, prob = p) head(flips) ``` Estimate the probability of heads. ```{r} mean(flips == 1) ``` Plot the convergence of the sampling to the probability of heads. ```{r} plot(cumsum(flips)/1:n, type = "l", xlab = "n", ylab = "P(X)") abline(h = .75, col = "red") ``` Plot the estimated probability mass function. ```{r} barplot(table(flips)/n, xlab = "X", ylab = "P(X)", col = "lightblue") ``` True distribution from the binomial distribution. ```{r} x <- 0:1 y <- dbinom(x, size = 1, prob = p) barplot(y, names.arg = x, xlab = "X", ylab = "P(X)", col = "lightblue") ``` Now use the binomial distribution to simulate flipping a biased fair coin 1000 times. Use the random variable $X$ to represent the number of heads in 1000 flips. ```{r} n <- 1000 p <- .75 flips <- rbinom(1, size = 1000, prob = p) flips ``` Now use the binomial distribution to simulate flipping a biased fair coin 1000 times and repeat 10000 times. ```{r} n <- 1000 p <- .75 B <- 10000 flips <- rbinom(B, size = n, prob = p) head(flips) ``` Estimate the probability of 750 or more heads. ```{r} mean(flips <= 750) ``` Estimate the probability mass function. ```{r} hist(flips, breaks = 101, col = "lightblue") ``` True distribution from the binomial distribution. Nice and smooth! ```{r} x <- 700:800 y <- dbinom(x, size = 1000, prob = p) barplot(y, names.arg = x, xlab = "X", ylab = "P(X)", col = "lightblue") ``` Estimated probability of 750 or more heads using the simulated values. ```{r} mean(flips <= 750) ``` ### Cumulative Distribution Function (CDF) Compute the exact probability of 750 or more heads using the Bionomial model. ```{r} pbinom(750, size = 1000, prob = p) ``` ## Poisson Distribution X is the number of accidents in a day. The average number of accidents is 3.9. ### Probability Mass Function (PMF) #### Simulation Simulate the number of accidents in a day 1000 times. ```{r} n <- 1000 lambda <- 3.9 x <- rpois(n, lambda) head(x) ``` Estimate the probability of . ```{r} mean(x <= 3) ``` Plot the convergence to the expected value or mean number of accidents per day. ```{r} plot(cumsum(x)/1:n, type = "l", xlab = "n", ylab = "P(X)") abline(h = ppois(3, lambda), col = "red") ``` Plot the estimated probability mass function. ```{r} barplot(table(x), col = "lightblue") ``` True distribution from the Poisson distribution. Nice and smooth! ```{r} x <- 0:10 y <- dpois(x, lambda) barplot(y, names.arg = x, xlab = "X", ylab = "P(X)", col = "lightblue") ``` ### Cumulative Distribution Function (CDF) Compute the exact probability of 3 or fewer accidents using the Poisson model. ```{r} ppois(3, lambda) ``` Note that the sum of the probabilities of 0, 1, 2, and 3 accidents is the same as the probability of 3 or fewer accidents. ```{r} sum(dpois(0:3, lambda)) ``` ### Poisson Approximation to the Binomial The Poisson distribution is often used to approximate the binomial distribution when the number of trials is large and the probability of success is small. The Poisson distribution is a limit of the binomial distribution as the number of trials goes to infinity and the probability of success goes to zero. The Poisson distribution has a single parameter, the mean, which is the product of the number of trials and the probability of success. The number of trials is 1000 and the probability of success is 0.0039. The mean is 3.9. ```{r} n <- 1000 p <- .0039 lambda <- n * p lambda ``` Simulate the number of accidents in a day 1000 times using the Poisson distribution. ```{r} x <- rpois(1000, lambda) head(x) ``` Estimate the probability of 3 or fewer accidents. ```{r} mean(x <= 3) ``` Binomial approximation to the Poisson probability. ```{r} ppois(3, lambda) pbinom(3, size = 1000, prob = .0039) ``` ## Hypergeometric Distribution ### Probability Mass Function (PMF) #### Simulation Old CA Lotto, 6 numbers from 1 to 51. - m is the number of successes in the population. - n is the number of failures in the population. - k is the number of draws. The number of successes in the sample is the random variable $X$. ```{r} B <- 1000000 x <- rhyper(B, m = 6, n = 45, k = 6) head(x) ``` Estimate the probability of 5 successes. ```{r} mean(x == 0) mean(x == 1) mean(x == 2) mean(x == 3) mean(x == 4) mean(x == 5) mean(x == 6) ``` Plot the estimated probability mass function. ```{r} dhyper(0, m = 6, n = 45, k = 6) dhyper(1, m = 6, n = 45, k = 6) dhyper(2, m = 6, n = 45, k = 6) dhyper(3, m = 6, n = 45, k = 6) dhyper(4, m = 6, n = 45, k = 6) dhyper(5, m = 6, n = 45, k = 6) dhyper(6, m = 6, n = 45, k = 6) ``` Direct Calculation ```{r} choose(6, 0) * choose(45, 6) / choose(51, 6) choose(6, 1) * choose(45, 5) / choose(51, 6) choose(6, 2) * choose(45, 4) / choose(51, 6) choose(6, 3) * choose(45, 3) / choose(51, 6) choose(6, 4) * choose(45, 2) / choose(51, 6) choose(6, 5) * choose(45, 1) / choose(51, 6) choose(6, 6) * choose(45, 0) / choose(51, 6) ``` To win the jackpot, you need to match all 6 numbers. So 1 in 18,009,460. ```{r} choose(51, 6) ``` ## Geometric ### Probability Mass Function (PMF) Simulate number of free throws until the first miss. Curry is 90% from the line. So the probability of missing a free throw is 0.1. ```{r} n <- 1000 q <- 0.1 x <- rgeom(n, q) head(x) ``` ```{r} mean(x) ``` Estimate the probability of missing a free throw before 10 in a row. ```{r} mean(x <= 10) ``` ```{r} barplot(table(x), col = "lightblue") ``` ```{r} plot(cumsum(x)/1:n, type = "l", xlab = "n", ylab = "P(X)") ``` True distribution from the Geometric distribution. Nice and smooth! ```{r} x <- 0:55 y <- dgeom(x, q) barplot(y, names.arg = x, xlab = "X", ylab = "P(X)", col = "lightblue") ``` ### Cumulative Distribution Function (CDF) Compute the exact probability of missing a free throw before 10 in a row using the Geometric model. ```{r} pgeom(10, q) ```