likelihood_ratio

Author

Prof. Eric A. Suess

Examples of using the generalized likelihood ratio test.

Using the mtcars dataset, we will compare the full model to the reduced model. The full model includes four of the variables in the dataset. The reduced model includes only the variables disp and carb. The null model includes only the intercept. We will use the lrtest() function from the lmtest package to perform the likelihood ratio test.

Likelihood Ratio Test for Linear Regression

library(lmtest)

# fit full model
model_full <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)
model_full


Call:
lm(formula = mpg ~ disp + carb + hp + cyl, data = mtcars)

Coefficients:
(Intercept)         disp         carb           hp          cyl  
  34.021595    -0.026906    -0.926863     0.009349    -1.048523

# fit reduced model
model_reduced <- lm(mpg ~ disp + carb, data = mtcars)
model_reduced


Call:
lm(formula = mpg ~ disp + carb, data = mtcars)

Coefficients:
(Intercept)         disp         carb  
    31.1527      -0.0363      -0.9557

# fit null model
model_null <- lm(mpg ~ 1, data = mtcars)
model_null


Call:
lm(formula = mpg ~ 1, data = mtcars)

Coefficients:
(Intercept)  
      20.09

# perform likelihood ratio test for differences in models

lrtest(model_reduced, model_full)

Likelihood ratio test

Model 1: mpg ~ disp + carb
Model 2: mpg ~ disp + carb + hp + cyl
  #Df  LogLik Df  Chisq Pr(>Chisq)
1   4 -78.603                     
2   6 -77.558  2 2.0902     0.3517

lrtest(model_null, model_full)

Likelihood ratio test

Model 1: mpg ~ 1
Model 2: mpg ~ disp + carb + hp + cyl
  #Df   LogLik Df Chisq Pr(>Chisq)    
1   2 -102.378                        
2   6  -77.558  4 49.64  4.294e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lrtest(model_null, model_reduced)

Likelihood ratio test

Model 1: mpg ~ 1
Model 2: mpg ~ disp + carb
  #Df   LogLik Df Chisq Pr(>Chisq)    
1   2 -102.378                        
2   4  -78.603  2 47.55  4.729e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check the calculations. Compare the reduced and full models.

-2*log( exp(-78.60301) / exp(-77.55790) )

[1] 2.09022

-2*( - 78.60301 - (- 77.55790))

[1] 2.09022

2*(  78.60301 + (- 77.55790))

[1] 2.09022

pchisq(2.04522, df = 2, lower.tail = FALSE)

[1] 0.359655

Change the Reduced Model to include the other variables in the model.

For the Likelihood Ratio Test for Linear Regression, change the Reduced model to mpg ~ hp + cyl. Rerun the code and interpret the results. What is the best model?

Likelihood ratio test for logistic regression

Using the infert dataset, we will compare the full model to the reduced model. The full model includes two of the variables in the dataset. The reduced model includes only the variable induced. The null model includes only the intercept. We will use the lrtest() function from the lmtest package to perform the likelihood ratio test.

head(infert)

  education age parity induced case spontaneous stratum pooled.stratum
1    0-5yrs  26      6       1    1           2       1              3
2    0-5yrs  42      1       1    1           0       2              1
3    0-5yrs  39      6       2    1           0       3              4
4    0-5yrs  34      4       2    1           0       4              2
5   6-11yrs  35      3       1    1           1       5             32
6   6-11yrs  36      4       2    1           1       6             36

model_full <- glm(case ~ induced + spontaneous, family=binomial, data=infert)
model_full


Call:  glm(formula = case ~ induced + spontaneous, family = binomial, 
    data = infert)

Coefficients:
(Intercept)      induced  spontaneous  
    -1.7079       0.4181       1.1972  

Degrees of Freedom: 247 Total (i.e. Null);  245 Residual
Null Deviance:      316.2 
Residual Deviance: 279.6    AIC: 285.6

model_reduced <- glm(case ~ induced, family=binomial, data=infert)
model_reduced


Call:  glm(formula = case ~ induced, family = binomial, data = infert)

Coefficients:
(Intercept)      induced  
   -0.71535      0.04897  

Degrees of Freedom: 247 Total (i.e. Null);  246 Residual
Null Deviance:      316.2 
Residual Deviance: 316.1    AIC: 320.1

model_null <- glm(case ~ 1, family=binomial, data=infert)
model_null


Call:  glm(formula = case ~ 1, family = binomial, data = infert)

Coefficients:
(Intercept)  
    -0.6871  

Degrees of Freedom: 247 Total (i.e. Null);  247 Residual
Null Deviance:      316.2 
Residual Deviance: 316.2    AIC: 318.2

lrtest(model_reduced, model_full)

Likelihood ratio test

Model 1: case ~ induced
Model 2: case ~ induced + spontaneous
  #Df  LogLik Df  Chisq Pr(>Chisq)    
1   2 -158.05                         
2   3 -139.81  1 36.487  1.537e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lrtest(model_null, model_full)

Likelihood ratio test

Model 1: case ~ 1
Model 2: case ~ induced + spontaneous
  #Df  LogLik Df  Chisq Pr(>Chisq)    
1   1 -158.09                         
2   3 -139.81  2 36.559  1.152e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lrtest(model_null, model_reduced)

Likelihood ratio test

Model 1: case ~ 1
Model 2: case ~ induced
  #Df  LogLik Df  Chisq Pr(>Chisq)
1   1 -158.09                     
2   2 -158.05  1 0.0724     0.7879

Change the Reduced Model to include the other variables in the model.

For the Likelihood ratio test for logistic regression, change the Reduced model to case ~ spontaneous. Rerun the code and interpret the results. What is the best model?