continuous_models

Author

Prof. Eric A. Suess

Published

March 20, 2024

Continuous Models

Uniform Distribution

Roll a fair die, the probability of each face is 1/6. The probability distribution is discrete uniform.

x <- 1:6
p <- rep(1/6, 6)

y <- sample(x, 1000, replace = TRUE, prob = p)
head(y)
[1] 6 1 5 4 4 1
barplot(table(y))

Roll a fair die, the probability of each face is 1/6. Simulate probability distribution using the continuous uniform and truncating the simulated values

a <- 0
b <- 6

x <- runif(1000, a, b)
y <- trunc(x) + 1
head(y)
[1] 4 3 5 3 2 5
barplot(table(y))

mean(y)
[1] 3.467
sd(y)
[1] 1.723334

True mean and standard deviation of the continuous uniform distribution

(a+b)/2
[1] 3
sqrt((a+b+1)^2/12)
[1] 2.020726

Simulate uniform values between 0 and 1 for x and y.

x <- runif(1000, -1, 1)
y <- runif(1000, -1, 1)

plot(x, y, xlim=c(-1,1), ylim=c(-1,1), asp=1)

library(plotrix)

draw.circle(x=0, y=0, radius=1)

Estimate \(\pi\) using the uniform distribution. Simulate 1000 values for x and y between -1 and 1.

A nice post about Estimating the value of Pi using Monte Carlo

\[ \frac{\textrm{area of the circle}}{\textrm{area of the square}} = \frac{\pi r^{2}}{4r^{2}} = \frac{\pi}{4} \] Use

\[ \frac{\pi}{4} = \frac{\textrm{no. of points generated inside the circle}}{\textrm{total no. of points generated or no. of points generated inside the square}} \]

So

\[ \pi = 4 \ast \frac{\textrm{no. of points generated inside the circle}}{\textrm{no. of points generated inside the square}} \]

What proportion of the values fall in the unit circle?

4*mean(x^2 + y^2 < 1)
[1] 3.132

Exponential Distribution

Simulate the lifetime of a computer chip, the probability of failure is constant over time. The probability distribution is exponential.

theta <- 50

x <- rexp(1000, rate = 1/theta)  # rate = lambda = 1/theta

h <- hist(x, breaks = 30)

clr <- ifelse(h$breaks <= 100, "black", "grey")[-length(h$breaks)]

plot(h, col = clr, border = "black")

Estimate the probability of failure in the first 100 hours.

mean(x <= 100)
[1] 0.852

Plot the true density function of the exponential distribution.

x <- seq(0, 200, length = 1000)

y <- dexp(x, rate = 1/theta)

plot(x, y, type = "l", lwd = 2)

Compute the probability of failure in the first 100 hours.

pexp(100, rate = 1/theta)
[1] 0.8646647

Integrate the density function to find the probability of failure in the first 100 hours.

integrate(dexp, 0, 100, rate = 1/theta)
0.8646647 with absolute error < 9.6e-15

Normal Distribution

Standard Normal Distribution

Simulate 1000 values from the standard normal distribution.

z <- rnorm(1000)

hist(z, breaks = 30)

Compute the mean and standard deviation of the simulated values.

mean(z)
[1] 0.0236639
sd(z)
[1] 1.013391

Plot the histogram with and shade the area between -1 and 1.

hist(z, breaks = 30)

abline(v = -1, col = "red")

abline(v = 1, col = "red")

h <- hist(z, breaks = 30)

clr <- ifelse(h$breaks <= 1 & h$breaks >= -1, "black", "grey")[-length(h$breaks)]

plot(h, col = clr, border = "black")

Estimate the probability that z is between -1 and 1.

mean(z > -1 & z < 1)
[1] 0.674

Plot the true density function of the standard normal distribution.

x <- seq(-4, 4, length = 1000)

y <- dnorm(x)

plot(x, y, type = "l", lwd = 2)

Plot the density with and shade the area between -1 and 1.

plot(x, y, type = "l", lwd = 2)
x2 <- c(-1, x[x > -1 & x < 1], 1)
y2 <- c(0, y[x > -1 & x < 1], 0)
polygon(x2, y2, col = "grey")
abline(v = -1, col = "red")
abline(v = 1, col = "red")

Compute the probability that z is between -1 and 1.

pnorm(1) - pnorm(-1)
[1] 0.6826895

Integrate the density function to find the probability that z is between -1 and 1.

integrate(dnorm, -1, 1)
0.6826895 with absolute error < 7.6e-15

Normal Distribution

Suppose the height of CSUEB women have mean of 65 inches and standard deviation of 3 inches. Simulate 1000 values from the normal distribution.

mu <- 65
sigma <- 3

x <- rnorm(1000, mean = mu, sd = sigma)

hist(x, breaks = 30)

Estimate the mean and standard deviation of the simulated values.

mean(x)
[1] 64.99791
sd(x)
[1] 3.134057

Estimate the probability that a randomly selected CSUEB woman has height between 63 and 67 inches.

mean(x > 63 & x < 67)
[1] 0.467

Plot the true density function of the normal distribution. Shade the area between 63 and 67 inches.

x <- seq(50, 80, length = 1000)

y <- dnorm(x, mean = mu, sd = sigma)

plot(x, y, type = "l", lwd = 2)
x2 <- c(63, x[x > 63 & x < 67], 67)
y2 <- c(0, y[x > 63 & x < 67], 0)
polygon(x2, y2, col = "grey")
abline(v = 63, col = "red")
abline(v = 67, col = "red")

Compute the probability that a randomly selected CSUEB woman is between 63 and 67 inches.

pnorm(67, mean = mu, sd = sigma) - pnorm(63, mean = mu, sd = sigma)
[1] 0.4950149

Central Limit Theorem (CLT)

The Central Limit Theorem states that the sampling distribution of the sample mean is approximately normally distributed, regardless of the distribution of the population from which the samples are drawn, if the sample size is large enough.

Suppose the lifetime of a computer chip is exponentially distributed with mean 50 hours. Simulate 1000 sample means of 100 computer chips.

theta <- 50
n <- 100

B <- 1000

x <- rexp(B*n, rate = 1/theta)

x <- matrix(x, ncol = n)

xbar <- colMeans(x)

hist(xbar, breaks = 30)

Estimate the mean and standard deviation of the sample means.

mean(xbar)
[1] 50.01451
sd(xbar)
[1] 1.670134

Estinate the probability density function using a histogram. Shade the area between 48 and 52 hours.

h <- hist(xbar, breaks = 30)

clr <- ifelse(h$breaks <= 52 & h$breaks >= 48, "black", "grey")[-length(h$breaks)]

plot(h, col = clr, border = "black")

Estimate the probability that the sample mean is between 48 and 52 hours.

mean(xbar > 48 & xbar < 52)
[1] 0.72

Plot the true density function of the sample means. Shade the area between 48 and 52 hours.

x <- seq(48, 56, length = 1000)

y <- dnorm(x, mean = theta, sd = sqrt(theta/n))

plot(x, y, type = "l", lwd = 2)

x2 <- c(48, x[x > 48 & x < 52], 52)
y2 <- c(0, y[x > 48 & x < 52], 0)
polygon(x2, y2, col = "grey")
abline(v = 48, col = "red")
abline(v = 52, col = "red")

Compute the probability that the sample mean is between 48 and 52 hours.

pnorm(52, mean = theta, sd = sqrt(theta/n)) - pnorm(48, mean = theta, sd = sqrt(theta/n))
[1] 0.9953223