Discrete Distributions

Author

Prof. Eric A. Suess

Published

March 20, 2024

Discrete Distributions - Computations and Simulations

Flip a biased coin

Simulate flipping a biased coin 1000 times. The probability of heads is 0.75.

n <- 1000

pr <- c(.25, .75)

flips <- sample(c(0,1), size = n, prob = pr, replace = TRUE)
head(flips)
[1] 1 1 1 1 1 0

Estimate the probability of heads.

mean(flips == 1)
[1] 0.739

Plot the convergence of the sampling to the probability of heads.

plot(cumsum(flips)/1:n, type = "l", xlab = "n", ylab = "P(X)")
abline(h = .75, col = "red")

Plot the estimated probability mass function.

barplot(table(flips)/n, xlab = "X", ylab = "P(X)", col = "lightblue")

Binomial Distribution

Now use the binomial distribution to simulate flipping a biased fair coin 1000 times.

Probability Mass Function (PMF)

Simulation

n <- 1000
p <- .75

flips <- rbinom(n, size = 1, prob = p)
head(flips)
[1] 0 0 1 0 1 1

Estimate the probability of heads.

mean(flips == 1)
[1] 0.774

Plot the convergence of the sampling to the probability of heads.

plot(cumsum(flips)/1:n, type = "l", xlab = "n", ylab = "P(X)")
abline(h = .75, col = "red")

Plot the estimated probability mass function.

barplot(table(flips)/n, xlab = "X", ylab = "P(X)", col = "lightblue")

True distribution from the binomial distribution.

x <- 0:1
y <- dbinom(x, size = 1, prob = p)

barplot(y, names.arg = x, xlab = "X", ylab = "P(X)", col = "lightblue")

Now use the binomial distribution to simulate flipping a biased fair coin 1000 times. Use the random variable \(X\) to represent the number of heads in 1000 flips.

n <- 1000
p <- .75

flips <- rbinom(1, size = 1000, prob = p)
flips
[1] 749

Now use the binomial distribution to simulate flipping a biased fair coin 1000 times and repeat 10000 times.

n <- 1000
p <- .75
B <- 10000

flips <- rbinom(B, size = n, prob = p)
head(flips)
[1] 785 748 728 755 733 761

Estimate the probability of 750 or more heads.

mean(flips <= 750)
[1] 0.5163

Estimate the probability mass function.

hist(flips, breaks = 101, col = "lightblue")

True distribution from the binomial distribution. Nice and smooth!

x <- 700:800
y <- dbinom(x, size = 1000, prob = p)

barplot(y, names.arg = x, xlab = "X", ylab = "P(X)", col = "lightblue")

Estimated probability of 750 or more heads using the simulated values.

mean(flips <= 750)
[1] 0.5163

Cumulative Distribution Function (CDF)

Compute the exact probability of 750 or more heads using the Bionomial model.

pbinom(750, size = 1000, prob = p)
[1] 0.5121376

Poisson Distribution

X is the number of accidents in a day. The average number of accidents is 3.9.

Probability Mass Function (PMF)

Simulation

Simulate the number of accidents in a day 1000 times.

n <- 1000
lambda <- 3.9

x <- rpois(n, lambda)
head(x)
[1] 3 6 2 2 4 6

Estimate the probability of .

mean(x <= 3)
[1] 0.429

Plot the convergence to the expected value or mean number of accidents per day.

plot(cumsum(x)/1:n, type = "l", xlab = "n", ylab = "P(X)")
abline(h = ppois(3, lambda), col = "red")

Plot the estimated probability mass function.

barplot(table(x), col = "lightblue")

True distribution from the Poisson distribution. Nice and smooth!

x <- 0:10
y <- dpois(x, lambda)

barplot(y, names.arg = x, xlab = "X", ylab = "P(X)", col = "lightblue")

Cumulative Distribution Function (CDF)

Compute the exact probability of 3 or fewer accidents using the Poisson model.

ppois(3, lambda)
[1] 0.4532468

Note that the sum of the probabilities of 0, 1, 2, and 3 accidents is the same as the probability of 3 or fewer accidents.

sum(dpois(0:3, lambda))
[1] 0.4532468

Poisson Approximation to the Binomial

The Poisson distribution is often used to approximate the binomial distribution when the number of trials is large and the probability of success is small. The Poisson distribution is a limit of the binomial distribution as the number of trials goes to infinity and the probability of success goes to zero. The Poisson distribution has a single parameter, the mean, which is the product of the number of trials and the probability of success.

The number of trials is 1000 and the probability of success is 0.0039. The mean is 3.9.

n <- 1000
p <- .0039
lambda <- n * p
lambda
[1] 3.9

Simulate the number of accidents in a day 1000 times using the Poisson distribution.

x <- rpois(1000, lambda)
head(x)
[1] 7 4 2 3 7 2

Estimate the probability of 3 or fewer accidents.

mean(x <= 3)
[1] 0.475

Binomial approximation to the Poisson probability.

ppois(3, lambda)
[1] 0.4532468
pbinom(3, size = 1000, prob = .0039)
[1] 0.4528948

Hypergeometric Distribution

Probability Mass Function (PMF)

Simulation

Old CA Lotto, 6 numbers from 1 to 51.

  • m is the number of successes in the population.
  • n is the number of failures in the population.
  • k is the number of draws.

The number of successes in the sample is the random variable \(X\).

B <- 1000000

x <- rhyper(B, m = 6, n = 45, k = 6)
head(x)
[1] 1 0 1 0 0 0

Estimate the probability of 5 successes.

mean(x == 0)
[1] 0.452052
mean(x == 1)
[1] 0.407171
mean(x == 2)
[1] 0.12423
mean(x == 3)
[1] 0.015697
mean(x == 4)
[1] 0.000839
mean(x == 5)
[1] 1.1e-05
mean(x == 6)
[1] 0

Plot the estimated probability mass function.

dhyper(0, m = 6, n = 45, k = 6)
[1] 0.4522656
dhyper(1, m = 6, n = 45, k = 6)
[1] 0.4070391
dhyper(2, m = 6, n = 45, k = 6)
[1] 0.1240973
dhyper(3, m = 6, n = 45, k = 6)
[1] 0.01575838
dhyper(4, m = 6, n = 45, k = 6)
[1] 0.0008245666
dhyper(5, m = 6, n = 45, k = 6)
[1] 1.499212e-05
dhyper(6, m = 6, n = 45, k = 6)
[1] 5.552637e-08

Direct Calculation

choose(6, 0) * choose(45, 6) / choose(51, 6)
[1] 0.4522656
choose(6, 1) * choose(45, 5) / choose(51, 6)
[1] 0.4070391
choose(6, 2) * choose(45, 4) / choose(51, 6)
[1] 0.1240973
choose(6, 3) * choose(45, 3) / choose(51, 6)
[1] 0.01575838
choose(6, 4) * choose(45, 2) / choose(51, 6)
[1] 0.0008245666
choose(6, 5) * choose(45, 1) / choose(51, 6)
[1] 1.499212e-05
choose(6, 6) * choose(45, 0) / choose(51, 6)
[1] 5.552637e-08

To win the jackpot, you need to match all 6 numbers. So 1 in 18,009,460.

choose(51, 6)
[1] 18009460

Geometric

Probability Mass Function (PMF)

Simulate number of free throws until the first miss. Curry is 90% from the line. So the probability of missing a free throw is 0.1.

n <- 1000
q <- 0.1

x <- rgeom(n, q)
head(x)
[1] 10 13  5  7  9  0
mean(x)
[1] 9.502

Estimate the probability of missing a free throw before 10 in a row.

mean(x <= 10)
[1] 0.664
barplot(table(x), col = "lightblue")

plot(cumsum(x)/1:n, type = "l", xlab = "n", ylab = "P(X)")

True distribution from the Geometric distribution. Nice and smooth!

x <- 0:55
y <- dgeom(x, q)

barplot(y, names.arg = x, xlab = "X", ylab = "P(X)", col = "lightblue")

Cumulative Distribution Function (CDF)

Compute the exact probability of missing a free throw before 10 in a row using the Geometric model.

pgeom(10, q)
[1] 0.6861894