--- title: "continuous_models" author: "Prof. Eric A. Suess" date: "3/20/2024" format: html: self-contained: true --- # Continuous Models ## Uniform Distribution Roll a fair die, the probability of each face is 1/6. The probability distribution is discrete uniform. ```{r} x <- 1:6 p <- rep(1/6, 6) y <- sample(x, 1000, replace = TRUE, prob = p) head(y) ``` ```{r} barplot(table(y)) ``` Roll a fair die, the probability of each face is 1/6. Simulate probability distribution using the continuous uniform and truncating the simulated values ```{r} a <- 0 b <- 6 x <- runif(1000, a, b) y <- trunc(x) + 1 head(y) ``` ```{r} barplot(table(y)) ``` ```{r} mean(y) sd(y) ``` True mean and standard deviation of the continuous uniform distribution ```{r} (a+b)/2 sqrt((a+b+1)^2/12) ``` Simulate uniform values between 0 and 1 for x and y. ```{r} x <- runif(1000, -1, 1) y <- runif(1000, -1, 1) plot(x, y, xlim=c(-1,1), ylim=c(-1,1), asp=1) library(plotrix) draw.circle(x=0, y=0, radius=1) ``` Estimate $\pi$ using the uniform distribution. Simulate 1000 values for x and y between -1 and 1. A nice post about [Estimating the value of Pi using Monte Carlo ](https://www.geeksforgeeks.org/estimating-value-pi-using-monte-carlo/) $$ \frac{\textrm{area of the circle}}{\textrm{area of the square}} = \frac{\pi r^{2}}{4r^{2}} = \frac{\pi}{4} $$ Use $$ \frac{\pi}{4} = \frac{\textrm{no. of points generated inside the circle}}{\textrm{total no. of points generated or no. of points generated inside the square}} $$ So $$ \pi = 4 \ast \frac{\textrm{no. of points generated inside the circle}}{\textrm{no. of points generated inside the square}} $$ What proportion of the values fall in the unit circle? ```{r} 4*mean(x^2 + y^2 < 1) ``` ## Exponential Distribution Simulate the lifetime of a computer chip, the probability of failure is constant over time. The probability distribution is exponential. ```{r} theta <- 50 x <- rexp(1000, rate = 1/theta) # rate = lambda = 1/theta h <- hist(x, breaks = 30) clr <- ifelse(h$breaks <= 100, "black", "grey")[-length(h$breaks)] plot(h, col = clr, border = "black") ``` Estimate the probability of failure in the first 100 hours. ```{r} mean(x <= 100) ``` Plot the true density function of the exponential distribution. ```{r} x <- seq(0, 200, length = 1000) y <- dexp(x, rate = 1/theta) plot(x, y, type = "l", lwd = 2) ``` Compute the probability of failure in the first 100 hours. ```{r} pexp(100, rate = 1/theta) ``` Integrate the density function to find the probability of failure in the first 100 hours. ```{r} integrate(dexp, 0, 100, rate = 1/theta) ``` ## Normal Distribution ### Standard Normal Distribution Simulate 1000 values from the standard normal distribution. ```{r} z <- rnorm(1000) hist(z, breaks = 30) ``` Compute the mean and standard deviation of the simulated values. ```{r} mean(z) sd(z) ``` Plot the histogram with and shade the area between -1 and 1. ```{r} hist(z, breaks = 30) abline(v = -1, col = "red") abline(v = 1, col = "red") ``` ```{r} h <- hist(z, breaks = 30) clr <- ifelse(h$breaks <= 1 & h$breaks >= -1, "black", "grey")[-length(h$breaks)] plot(h, col = clr, border = "black") ``` Estimate the probability that z is between -1 and 1. ```{r} mean(z > -1 & z < 1) ``` Plot the true density function of the standard normal distribution. ```{r} x <- seq(-4, 4, length = 1000) y <- dnorm(x) plot(x, y, type = "l", lwd = 2) ``` Plot the density with and shade the area between -1 and 1. ```{r} plot(x, y, type = "l", lwd = 2) x2 <- c(-1, x[x > -1 & x < 1], 1) y2 <- c(0, y[x > -1 & x < 1], 0) polygon(x2, y2, col = "grey") abline(v = -1, col = "red") abline(v = 1, col = "red") ``` Compute the probability that z is between -1 and 1. ```{r} pnorm(1) - pnorm(-1) ``` Integrate the density function to find the probability that z is between -1 and 1. ```{r} integrate(dnorm, -1, 1) ``` ### Normal Distribution Suppose the height of CSUEB women have mean of 65 inches and standard deviation of 3 inches. Simulate 1000 values from the normal distribution. ```{r} mu <- 65 sigma <- 3 x <- rnorm(1000, mean = mu, sd = sigma) hist(x, breaks = 30) ``` Estimate the mean and standard deviation of the simulated values. ```{r} mean(x) sd(x) ``` Estimate the probability that a randomly selected CSUEB woman has height between 63 and 67 inches. ```{r} mean(x > 63 & x < 67) ``` Plot the true density function of the normal distribution. Shade the area between 63 and 67 inches. ```{r} x <- seq(50, 80, length = 1000) y <- dnorm(x, mean = mu, sd = sigma) plot(x, y, type = "l", lwd = 2) x2 <- c(63, x[x > 63 & x < 67], 67) y2 <- c(0, y[x > 63 & x < 67], 0) polygon(x2, y2, col = "grey") abline(v = 63, col = "red") abline(v = 67, col = "red") ``` Compute the probability that a randomly selected CSUEB woman is between 63 and 67 inches. ```{r} pnorm(67, mean = mu, sd = sigma) - pnorm(63, mean = mu, sd = sigma) ``` ## Central Limit Theorem (CLT) The Central Limit Theorem states that the sampling distribution of the sample mean is approximately normally distributed, regardless of the distribution of the population from which the samples are drawn, if the sample size is large enough. Suppose the lifetime of a computer chip is exponentially distributed with mean 50 hours. Simulate 1000 sample means of 100 computer chips. ```{r} theta <- 50 n <- 100 B <- 1000 x <- rexp(B*n, rate = 1/theta) x <- matrix(x, ncol = n) xbar <- colMeans(x) hist(xbar, breaks = 30) ``` Estimate the mean and standard deviation of the sample means. ```{r} mean(xbar) sd(xbar) ``` Estinate the probability density function using a histogram. Shade the area between 48 and 52 hours. ```{r} h <- hist(xbar, breaks = 30) clr <- ifelse(h$breaks <= 52 & h$breaks >= 48, "black", "grey")[-length(h$breaks)] plot(h, col = clr, border = "black") ``` Estimate the probability that the sample mean is between 48 and 52 hours. ```{r} mean(xbar > 48 & xbar < 52) ``` Plot the true density function of the sample means. Shade the area between 48 and 52 hours. ```{r} x <- seq(48, 56, length = 1000) y <- dnorm(x, mean = theta, sd = sqrt(theta/n)) plot(x, y, type = "l", lwd = 2) x2 <- c(48, x[x > 48 & x < 52], 52) y2 <- c(0, y[x > 48 & x < 52], 0) polygon(x2, y2, col = "grey") abline(v = 48, col = "red") abline(v = 52, col = "red") ``` Compute the probability that the sample mean is between 48 and 52 hours. ```{r} pnorm(52, mean = theta, sd = sqrt(theta/n)) - pnorm(48, mean = theta, sd = sqrt(theta/n)) ```