Box–Muller Method and Change of Variables

Author

Reconstructed from handwritten notes

Functions of Several Variables

Suppose \((x,y)\) are jointly distributed and mapped onto \((u,v)\), jointly distributed.

\(u = g_1(x,y)\)

\(v = g_2(x,y)\)

and that the transformation can be inverted:

\(x = h_1(u,v)\)

\(y = h_2(u,v)\)

Assume \(h_1\) and \(h_2\) have continuous partial derivatives and that the Jacobian

\(J = \left| \begin{matrix} \dfrac{\partial}{\partial u} h_1 & \dfrac{\partial}{\partial v} h_1 \\ \dfrac{\partial}{\partial u} h_2 & \dfrac{\partial}{\partial v} h_2 \end{matrix} \right| \ne 0\)

Then

\(f_{UV}(u,v) = f_{XY}\big(h_1(u,v),\,h_2(u,v)\big)\,\lvert J \rvert.\)

Example: Polar Coordinates from Independent Normals

Let \(X\) and \(Y\) be independent \(\mathcal{N}(0,1)\) random variables. Find the joint density of the polar coordinates \(R\) and \(\Theta\). Recall polar coordinates:

\(r = \sqrt{x^2 + y^2}\)

\(\theta = \tan^{-1}\!\left(\dfrac{y}{x}\right)\)

and the inverse transformation

\(x = r\cos\theta\)

\(y = r\sin\theta\)

\(f_{R\Theta}(r,\theta) = f_{XY}(x,y)\,\lvert J \rvert.\)

Find \(J\):

\(J = \det\! \begin{bmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\ \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{bmatrix} = \det\! \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & \;r\cos\theta \end{bmatrix}\)

\(=\; r\cos^2\theta + r\sin^2\theta\)

\(=\; r\,\big(\cos^2\theta + \sin^2\theta\big)\)

\(=\; r.\)

Therefore

\(f_{R\Theta}(r,\theta) = r\; f_{XY}\big(r\cos\theta,\, r\sin\theta\big).\)

Because \(X\) and \(Y\) are independent standard normals,

\(f_X(x) = \dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad f_Y(y) = \dfrac{1}{\sqrt{2\pi}}\,e^{-y^2/2}.\)

Hence

\(f_{R\Theta}(r,\theta) = r\, f_X\big(r\cos\theta\big)\, f_Y\big(r\sin\theta\big)\)

\(= r\; \dfrac{1}{\sqrt{2\pi}} e^{-r^2\cos^2\theta/2} \; \dfrac{1}{\sqrt{2\pi}} e^{-r^2\sin^2\theta/2}\)

\(= r\; \dfrac{1}{2\pi}\; e^{-\tfrac{1}{2} r^2\,(\cos^2\theta+\sin^2\theta)}\)

\(= r\; \dfrac{1}{2\pi}\; e^{-r^2/2}.\)

Thus \(R\) and \(\Theta\) are independent with

\(f_R(r) = r\,e^{-r^2/2},\; r\ge 0 \quad \text{(Rayleigh)}\)

\(f_{\Theta}(\theta) = \dfrac{1}{2\pi},\; \theta\in(0,2\pi) \quad \text{(Uniform)}.\)

Visualizing the Polar Transformation in R

We can illustrate the mapping of a random Euclidean point \((X,Y)\) to polar coordinates \((R,\Theta)\), with \(\Theta\) measured counter-clockwise from the positive \(X\)-axis.

Code

set.seed(123)

# Generate a single random point from N(0,1) x N(0,1)
x <- rnorm(1)
y <- rnorm(1)

# Polar conversion
r <- sqrt(x^2 + y^2)
theta <- atan2(y, x)                # in (-pi, pi]
theta_ccw <- ifelse(theta < 0, theta + 2*pi, theta)  # map to [0, 2*pi)

# Plot base
plot(0, 0, type = "n", xlim = c(-3,3), ylim = c(-3,3),
     xlab = "X", ylab = "Y", main = "Euclidean to Polar Coordinates (CCW angle)")
abline(h = 0, v = 0, lty = 3)  # dashed axes for reference

# Point and radius vector
points(x, y, col = "red", pch = 19, cex = 1.5)
arrows(0, 0, x, y, col = "blue", lwd = 2)

# Dashed circle at radius R
symbols(0, 0, circles = r, inches = FALSE, add = TRUE, lty = 2)

# Dashed line through (0,0) and (x,y) using angle (avoids slope issues when x ~ 0)
L <- 3
lines(c(-L, L) * cos(theta), c(-L, L) * sin(theta), lty = 3)

# Annotate R
text(x, y, labels = "(X,Y)", pos = 4)
text(x/2, y/2, labels = expression(R), pos = 2)

# Counter-clockwise arc from 0 to theta_ccw
arc_r <- max(0.8, min(1.2, 0.8 * r))  # keep the arc visible
arc_t <- seq(0, theta_ccw, length.out = 200)
lines(arc_r * cos(arc_t), arc_r * sin(arc_t), col = "darkgreen", lwd = 2)

# Angle label at midpoint of the arc
mid_t <- theta_ccw / 2
text(arc_r * cos(mid_t), arc_r * sin(mid_t), labels = expression(theta),
     col = "darkgreen", pos = 3)

Euclidean point and its polar representation

This plot shows:

\((X,Y)\) as the red dot in Euclidean space.
\(R\) as the radius of the dashed circle.
\(\Theta\) measured counter-clockwise from the positive \(X\)-axis, indicated by the green arc.
Dashed lines through the origin along the axes and through \((X,Y)\).

This plot shows:

\((X,Y)\) as the red dot in Euclidean space.
\(R\) as the radius of the dashed circle.
\(\Theta\) measured counter-clockwise from the positive \(X\)-axis, indicated by the green arc.
Dashed lines through the origin along the \(X\)-axis and through \((X,Y)\).

Box–Muller Method

Generate standard normal random values \((X,Y)\) from independent uniforms \((U_1,U_2)\).

Generate \(U_1, U_2 \overset{\text{iid}}\sim \operatorname{Unif}(0,1)\).
\(-2\,\log U_1 \sim \operatorname{Exp}(\tfrac{1}{2})\) and \(2\pi U_2 \sim \operatorname{Unif}(0,2\pi)\).
Set \(X = \sqrt{-2\,\log U_1}\,\cos(2\pi U_2)\) and \(Y = \sqrt{-2\,\log U_1}\,\sin(2\pi U_2)\), which yields \(X\sim\mathcal{N}(0,1)\) and \(Y\sim\mathcal{N}(0,1)\).

R Implementation

Code

set.seed(42)

# Box–Muller generator: returns n pairs (X, Y)
box_muller <- function(n) {
  u1 <- runif(n)
  u2 <- runif(n)
  r <- sqrt(-2 * log(u1))
  theta <- 2 * pi * u2
  x <- r * cos(theta)
  y <- r * sin(theta)
  data.frame(x = x, y = y)
}

# Example: draw 10,000 samples and check moments
samples <- box_muller(10000)
c(mean_x = mean(samples$x), var_x = var(samples$x),
  mean_y = mean(samples$y), var_y = var(samples$y))

      mean_x        var_x       mean_y        var_y 
0.0005101192 1.0244204655 0.0006820139 0.9999651650

Code

# Quick visual checks
par(mfrow = c(1, 3))
hist(samples$x, breaks = 40, main = "X ~ N(0,1)")
hist(samples$y, breaks = 40, main = "Y ~ N(0,1)")
plot(samples$x, samples$y, pch = 16, cex = 0.4,
     main = "Scatter of (X,Y)")

Code

par(mfrow = c(1, 1))

Notes: The derivations above follow the change-of-variables formula with the polar transformation, showing that \(R\) is Rayleigh and \(\Theta\) is Uniform, which leads directly to the Box–Muller algorithm.

Derivation of the Rayleigh Distribution

1. From Uniform to Exponential via the Inverse CDF Method

Let \(U \sim \operatorname{Unif}(0,1)\). To generate an exponential random variable \(T\) with rate \(\lambda = \tfrac{1}{2}\) (mean \(2\)), we use the Inverse CDF Method.

The CDF of \(\operatorname{Exp}(\tfrac{1}{2})\) is

\(F_T(t) = 1 - e^{-t/2}, \quad t \ge 0.\)

Set \(F_T(t) = U\) and solve for \(t\):

\(U = 1 - e^{-t/2}\)

\(e^{-t/2} = 1 - U\)

\(-\tfrac{t}{2} = \ln(1-U)\)

\(t = -2\,\ln(1-U).\)

Since \(1-U \sim \operatorname{Unif}(0,1)\) as well, we can equivalently write

\(t = -2\,\ln U.\)

Thus

\(T = -2\ln U \sim \operatorname{Exp}(1/2).\)

2. From Exponential to Rayleigh

We want \(R\) such that \(R^2 \sim \operatorname{Exp}(\tfrac{1}{2})\).
That is, if \(T \sim \operatorname{Exp}(1/2)\) then \(R = \sqrt{T}\) is Rayleigh distributed.

The CDF of \(R\) is

\(F_R(r) = P(R \le r) = P(\sqrt{T} \le r) = P(T \le r^2).\)

Because \(T \sim \operatorname{Exp}(1/2)\),

\(F_T(t) = 1 - e^{-t/2}.\)

\(F_R(r) = F_T(r^2) = 1 - e^{-r^2/2}, \quad r \ge 0.\)

Differentiating gives the Rayleigh PDF:

\(f_R(r) = \dfrac{d}{dr}F_R(r) = r e^{-r^2/2}, \quad r \ge 0.\)

3. Inverse CDF Check for Rayleigh

To generate \(R\) directly from \(U \sim \operatorname{Unif}(0,1)\):

Set \(F_R(r) = U\):

\(U = 1 - e^{-r^2/2}\)

\(e^{-r^2/2} = 1 - U\)

\(r^2 = -2\,\ln(1-U)\)

\(r = \sqrt{-2\,\ln(1-U)}.\)

Since \(1-U \sim \operatorname{Unif}(0,1)\), we can write

\(r = \sqrt{-2\,\ln U},\)

which matches the Box–Muller derivation.

R Implementation

Code

# Generate Rayleigh samples using inverse transform
rayleigh_inverse <- function(n) {
  u <- runif(n)
  r <- sqrt(-2 * log(u))
  r
}

# Check empirical vs theoretical mean
set.seed(123)
samples_r <- rayleigh_inverse(10000)
c(emp_mean = mean(samples_r), emp_var = var(samples_r),
  th_mean = sqrt(pi/2), th_var = (4 - pi)/2)

 emp_mean   emp_var   th_mean    th_var 
1.2583035 0.4248272 1.2533141 0.4292037

Code

hist(samples_r, breaks = 40, probability = TRUE,
     main = "Rayleigh(σ=1)", xlab = "r")
curve(x * exp(-x^2/2), from = 0, to = 5, add = TRUE, col = "red", lwd = 2)

Thus, the Rayleigh distribution naturally arises as the distribution of \(R = \sqrt{-2\ln U}\), completing the connection between uniforms, exponentials, and normals via the Box–Muller construction.

Deriving the Rayleigh via the Inverse CDF Method

1) From Uniform to \(\operatorname{Exp}\!(\tfrac{1}{2})\)

Let \(U \sim \operatorname{Unif}(0,1)\). The target CDF for \(T \sim \operatorname{Exp}(\tfrac{1}{2})\) is \(F_T(t) = 1 - e^{-t/2},\; t \ge 0.\)

Set \(U = F_T(T)\) and solve for \(T\): \(U = 1 - e^{-T/2}\) \(e^{-T/2} = 1 - U\) \(-\dfrac{T}{2} = \log(1-U)\) \(T = -2\,\log(1-U).\)

Since \(1-U \overset{d}{=} U\) for \(U\sim\operatorname{Unif}(0,1)\), an equivalent generator is \(T = -2\,\log U,\) which yields \(T \sim \operatorname{Exp}(\tfrac{1}{2}).\)

2) From \(\operatorname{Exp}\!(\tfrac{1}{2})\) to Rayleigh

Define the transformation \(R = \sqrt{T}.\)

For \(r \ge 0\), \(\mathbb{P}(R \le r) = \mathbb{P}(\sqrt{T} \le r) = \mathbb{P}(T \le r^2) = F_T(r^2) = 1 - e^{-r^2/2}.\)

Thus the CDF of \(R\) is \(F_R(r) = 1 - e^{-r^2/2},\; r \ge 0,\) and the PDF is \(f_R(r) = \dfrac{d}{dr} F_R(r) = r\,e^{-r^2/2},\; r \ge 0,\) which is the Rayleigh distribution with parameter \(\sigma = 1.\)

3) Verifying via the Inverse CDF for Rayleigh

Start from \(U \sim \operatorname{Unif}(0,1)\) and set \(U = F_R(R)\) with \(F_R(r) = 1 - e^{-r^2/2},\; r \ge 0.\)

Solve for \(R\): \(U = 1 - e^{-R^2/2}\) \(e^{-R^2/2} = 1 - U\) \(-\dfrac{R^2}{2} = \log(1-U)\) \(R = \sqrt{-2\,\log(1-U)}.\)

Again using \(1-U \overset{d}{=} U\), an equivalent generator is \(R = \sqrt{-2\,\log U}.\)

Combining parts (1) and (2), if \(T = -2\,\log U \sim \operatorname{Exp}(\tfrac{1}{2})\) and \(R = \sqrt{T}\), then \(F_R(r) = 1 - e^{-r^2/2}\) and \(f_R(r) = r\,e^{-r^2/2},\) confirming that the transformation produces a Rayleigh random variable.

The Bivariate Normal Distribution

We introduce the general bivariate normal distribution with different means, variances, and correlation.
Notation: \(BVN(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)\).

1) Adding Means and Variances

Let the random vector be \[ \mathbf{Z} = \begin{bmatrix} X \\ Y \end{bmatrix}. \]

We specify its mean vector as \[ \boldsymbol{\mu} = \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \]

and its covariance matrix without correlation as \[ \Sigma_0 = \begin{bmatrix} \sigma_X^2 & 0 \\ 0 & \sigma_Y^2 \end{bmatrix}. \]

Thus if \(X \sim N(\mu_X, \sigma_X^2)\) and \(Y \sim N(\mu_Y, \sigma_Y^2)\) independently, then \[ \mathbf{Z} \sim BVN(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, 0). \]

2) Adding Correlation \(\rho\)

To introduce correlation, the covariance matrix becomes \[ \Sigma = \begin{bmatrix} \sigma_X^2 & \rho\,\sigma_X\sigma_Y \\ \rho\,\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}. \]

Thus the full distribution is \[ \mathbf{Z} \sim BVN(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho). \]

Its density is \[ f_{X,Y}(x,y) = \frac{1}{2\pi\,\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\!\left( -\frac{1}{2(1-\rho^2)} \Bigg[ \frac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \Bigg] \right). \]

In compact matrix notation, with \[ \Sigma^{-1} = \begin{bmatrix} 1/\sigma_X^2 & -\rho/(\sigma_X\sigma_Y) \\ -\rho/(\sigma_X\sigma_Y) & 1/\sigma_Y^2 \end{bmatrix} \cdot \frac{1}{1-\rho^2}, \]

the density can be written as \[ f_{X,Y}(\mathbf{z}) = \frac{1}{2\pi\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{z}-\boldsymbol{\mu})^\top \Sigma^{-1}(\mathbf{z}-\boldsymbol{\mu})\right). \]

This general form compactly encodes the means, variances, and correlation structure of the bivariate normal distribution.

Bivariate Normal \(\operatorname{BVN}(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)\)

We derive the joint density in two steps: first with arbitrary means and variances (no correlation), then add correlation \(\rho\) using matrix notation.

1) Add means and variances (no correlation)

Let the random vector be \(\mathbf{Z} = \begin{bmatrix} X \\ Y \end{bmatrix}\) with mean \(\boldsymbol{\mu} = \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}\) and covariance matrix \(\boldsymbol{\Sigma}_0 = \begin{bmatrix} \sigma_X^2 & 0 \\ 0 & \sigma_Y^2 \end{bmatrix}.\)

Equivalently, \(X \sim \mathcal{N}(\mu_X,\sigma_X^2)\) and \(Y \sim \mathcal{N}(\mu_Y,\sigma_Y^2)\) independently. The joint PDF factors: \(f_{X,Y}(x,y) = \dfrac{1}{2\pi\,\sigma_X\,\sigma_Y}\exp\!\left\{-\dfrac{1}{2}\left[\dfrac{(x-\mu_X)^2}{\sigma_X^2} + \dfrac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right\}.\)

In matrix notation, with \(\mathbf{z} = \begin{bmatrix} x \\ y \end{bmatrix}\), \(f_{X,Y}(\mathbf{z}) = \dfrac{1}{(2\pi)^{1}}\,\dfrac{1}{\sqrt{\lvert \boldsymbol{\Sigma}_0 \rvert}}\,\exp\!\left\{-\dfrac{1}{2}(\mathbf{z}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}_0^{-1} (\mathbf{z}-\boldsymbol{\mu})\right\},\) where \(\lvert \boldsymbol{\Sigma}_0 \rvert = \sigma_X^2\,\sigma_Y^2\) and \(\boldsymbol{\Sigma}_0^{-1} = \begin{bmatrix} 1/\sigma_X^2 & 0 \\ 0 & 1/\sigma_Y^2 \end{bmatrix}.\)

Thus \(\mathbf{Z} \sim \operatorname{BVN}(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, 0).\)

2) Add correlation \(\rho\)

Introduce covariance \(\operatorname{Cov}(X,Y) = \rho\,\sigma_X\sigma_Y\) with \(\rho \in (-1,1)\). The covariance matrix becomes \(\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_X^2 & \rho\,\sigma_X\sigma_Y \\ \rho\,\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}.\)

Its determinant and inverse are \(\lvert \boldsymbol{\Sigma} \rvert = \sigma_X^2\,\sigma_Y^2\,(1-\rho^2)\)

\(\boldsymbol{\Sigma}^{-1} = \dfrac{1}{1-\rho^2}\begin{bmatrix} 1/\sigma_X^2 & -\rho/(\sigma_X\sigma_Y) \\ -\rho/(\sigma_X\sigma_Y) & 1/\sigma_Y^2 \end{bmatrix}.\)

The bivariate normal density with correlation \(\rho\) is

\(f_{X,Y}(x,y) = \dfrac{1}{2\pi\,\sigma_X\,\sigma_Y\,\sqrt{1-\rho^2}}\,\exp\!\left\{-\dfrac{1}{2(1-\rho^2)}\left[ \dfrac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\,\dfrac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \dfrac{(y-\mu_Y)^2}{\sigma_Y^2} \right]\right\}.\)

In compact matrix form, \(f_{X,Y}(\mathbf{z}) = \dfrac{1}{(2\pi)^{1}}\,\dfrac{1}{\sqrt{\lvert \boldsymbol{\Sigma} \rvert}}\,\exp\!\left\{-\dfrac{1}{2}(\mathbf{z}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{z}-\boldsymbol{\mu})\right\}.\)

Notes: 1) When \(\rho = 0\), the off-diagonal terms vanish and the density reduces to the independent case above.
2) Existence requires \(\sigma_X>0\), \(\sigma_Y>0\), and \(\lvert\rho\rvert<1\) so that \(\boldsymbol{\Sigma}\) is positive definite.
3) One can obtain \(\mathbf{Z}\) by an affine transform of a standard normal \(\mathbf{N} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_2)\) using a matrix square root (e.g., Cholesky): \(\mathbf{Z} = \boldsymbol{\mu} + \mathbf{L}\,\mathbf{N}\) with \(\mathbf{L}\,\mathbf{L}^\top = \boldsymbol{\Sigma}.\)

Deriving the Bivariate Normal \(BVN(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)\) via PDF Transformations

We use matrix notation. Let \(\mathbf{z} = (z_1, z_2)^\top\) with \(z_1, z_2 \overset{\text{iid}}\sim \mathcal{N}(0,1)\) and joint PDF \(f_{\mathbf{Z}}(\mathbf{z}) = \dfrac{1}{2\pi} \exp\!\left\{-\dfrac{1}{2}\,\mathbf{z}^\top\mathbf{z}\right\}.\)

Define \(\boldsymbol{\mu} = (\mu_X, \mu_Y)^\top\) and \(\mathbf{D} = \operatorname{diag}(\sigma_X, \sigma_Y)\).

1) Add Means and Variances (Independent Case)

Consider the affine map \(\mathbf{x}_0 = \boldsymbol{\mu} + \mathbf{D}\,\mathbf{z}.\) Then \(\mathbf{z} = \mathbf{D}^{-1}(\mathbf{x}_0 - \boldsymbol{\mu})\) and the Jacobian is \(\left|\det \dfrac{\partial \mathbf{z}}{\partial \mathbf{x}_0} \right| = \left|\det \mathbf{D}^{-1}\right| = \dfrac{1}{\sigma_X\sigma_Y}.\)

By the PDF method, \(f_{\mathbf{X}_0}(\mathbf{x}_0) = f_{\mathbf{Z}}\big(\mathbf{D}^{-1}(\mathbf{x}_0 - \boldsymbol{\mu})\big)\,\left|\det \mathbf{D}^{-1}\right|.\)

Therefore \(f_{\mathbf{X}_0}(x_0,y_0) = \dfrac{1}{2\pi\,\sigma_X\sigma_Y}\, \exp\!\left\{-\dfrac{1}{2}\left[\left(\dfrac{x_0-\mu_X}{\sigma_X}\right)^2 + \left(\dfrac{y_0-\mu_Y}{\sigma_Y}\right)^2\right]\right\},\)

which is the independent bivariate normal with means \(\mu_X,\mu_Y\) and variances \(\sigma_X^2,\sigma_Y^2\).

2) Add Correlation \(\rho\) (Keep Means and Variances)

Work with centered variables \(\tilde{\mathbf{x}}_0 = \mathbf{x}_0 - \boldsymbol{\mu}\) and construct \(\tilde{\mathbf{x}} = \mathbf{C}\,\tilde{\mathbf{x}}_0,\) where \(\mathbf{C} = \begin{bmatrix} 1 & 0 \\ \rho\,\dfrac{\sigma_Y}{\sigma_X} & \sqrt{1-\rho^2} \end{bmatrix}.\)

Set the final variables as \(\mathbf{x} = \boldsymbol{\mu} + \tilde{\mathbf{x}} = \boldsymbol{\mu} + \mathbf{C}(\mathbf{x}_0 - \boldsymbol{\mu}).\)

The Jacobian of the map \(\mathbf{x}_0 \mapsto \mathbf{x}\) equals \(\left|\det \dfrac{\partial \mathbf{x}_0}{\partial \mathbf{x}}\right| = \left|\det \mathbf{C}^{-1}\right| = \dfrac{1}{\sqrt{1-\rho^2}}.\)

Using the PDF method again with translation invariance of the Jacobian, we obtain \(f_{\mathbf{X}}(\mathbf{x}) = f_{\mathbf{X}_0}\big(\mathbf{C}^{-1}(\mathbf{x}-\boldsymbol{\mu}) + \boldsymbol{\mu}\big)\,\left|\det \mathbf{C}^{-1}\right|.\)

After algebra (or by composing the two linear maps on \(\mathbf{z}\)), the covariance of \(\mathbf{x}\) is \(\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_X^2 & \rho\,\sigma_X\sigma_Y \\ \rho\,\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix},\) and the joint PDF simplifies to the standard \(BVN\) form \(f_{X,Y}(x,y) = \dfrac{1}{2\pi\,\sigma_X\sigma_Y\,\sqrt{1-\rho^2}}\,\exp\!\left\{-\dfrac{1}{2(1-\rho^2)}\left[\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\,\left(\dfrac{x-\mu_X}{\sigma_X}\right)\left(\dfrac{y-\mu_Y}{\sigma_Y}\right) + \left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\}.\)

Matrix Notation Summary

Let \(\boldsymbol{\mu} = (\mu_X,\mu_Y)^\top\) and \(\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_X^2 & \rho\,\sigma_X\sigma_Y \\ \rho\,\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}.\) Then the density can be written compactly as \(f_{\mathbf{X}}(\mathbf{x}) = \dfrac{1}{2\pi\sqrt{\det \boldsymbol{\Sigma}}}\,\exp\!\left\{-\dfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\},\) where \(\det\boldsymbol{\Sigma} = \sigma_X^2\sigma_Y^2(1-\rho^2)\) and \(\boldsymbol{\Sigma}^{-1} = \dfrac{1}{(1-\rho^2)\,\sigma_X^2\sigma_Y^2} \begin{bmatrix} \sigma_Y^2 & -\rho\,\sigma_X\sigma_Y \\ -\rho\,\sigma_X\sigma_Y & \sigma_X^2 \end{bmatrix}.\)

The two-step construction explicitly shows how translation and diagonal scaling introduce means and variances (with Jacobian \(|\det\mathbf{D}^{-1}|\)), and how a subsequent shear/rotation \(\mathbf{C}\) introduces correlation (with Jacobian \(|\det\mathbf{C}^{-1}|\)), yielding the full \(BVN(\mu_X,\mu_Y,\sigma_X^2,\sigma_Y^2,\rho)\) density.