time <- c(1, 4, 6, 10, 8, 5)
score <- c(3, 6, 7, 9, 6, 5)
plot(time, score, xlab = "Time Studying", ylab = "Test Score", main = "Scatterplot of Test Scores vs. Time Studying")
Stat. 316: Linear Regression
Relationships between two quantitative variables
Here we have measurements on two quantitative variables for the same group of individuals.
Response variable: A variable that measures the outcome of a study.
Explanatory variable: A variable that explains changes in the response variable.
Scatterplot: A plot that shows the relationship between two quantitative variables measured on the same individuals.
Scatterplot
For six students, we have the time spent studying (x) and the test score (y). Is there a relationship between the two?
time studying (x) | test score (y) |
---|---|
1 | 3 |
4 | 6 |
6 | 7 |
10 | 9 |
8 | 6 |
5 | 5 |
Make a scatterplot of the data. Do the data look linear?
Correlation
The correlation coefficient, \(r\), measures the strength and direction of a linear relationship between two quantitative variables.
**Remark:** The correlation does not distinguish between explanatory and response variables, and it is not affected by changes in the unit of measurement of either or both variables.
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]
where \(\bar{x}\) and \(\bar{y}\) are the sample means of \(x\) and \(y\).
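As a minimal sketch of how the formula is applied, the R code below evaluates each piece for the six data points in the table. The names `dx`, `dy`, `sxy`, `sxx`, and `syy` are illustrative and do not come from the original notes.

```r
# Data from the table: time studying (x) and test score (y)
time  <- c(1, 4, 6, 10, 8, 5)
score <- c(3, 6, 7, 9, 6, 5)

# Deviations of each observation from its sample mean
dx <- time - mean(time)
dy <- score - mean(score)

# Pieces of the formula (illustrative names)
sxy <- sum(dx * dy)  # numerator: sum of cross-products
sxx <- sum(dx^2)     # sum of squared x deviations
syy <- sum(dy^2)     # sum of squared y deviations

r <- sxy / sqrt(sxx * syy)
r
```

For these data the numerator is 28 and the result rounds to 0.8914, which is the value `cor(time, score)` reports.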
Correlation
For the test score and time studying data, the correlation is
cor(time, score)
[1] 0.8914004
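The two properties in the remark can be checked numerically. The sketch below swaps the roles of the variables and rescales them. The unit conversions are assumptions made only for illustration, since the notes do not state the units of either variable.

```r
time  <- c(1, 4, 6, 10, 8, 5)
score <- c(3, 6, 7, 9, 6, 5)

r_xy <- cor(time, score)

# Swapping explanatory and response variables leaves r unchanged
r_yx <- cor(score, time)

# Changing units leaves r unchanged
# (assumption: time in hours converted to minutes, score rescaled to a fraction)
r_units <- cor(time * 60, score / 100)

all.equal(r_xy, r_yx)
all.equal(r_xy, r_units)
```

Both comparisons should return `TRUE`, because multiplying a variable by a positive constant scales the numerator and denominator of the formula by the same factor.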
Least Squares Regression
The least squares regression line is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible. In other words, it is the “best fitting line.”
The equation of the least squares regression line is
\[ \hat{y} = b_0 + b_1 x \]
where \(b_0\) is the y-intercept and \(b_1\) is the slope.
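To make "as small as possible" concrete, here is a rough sketch that searches numerically for the intercept and slope that minimize the sum of squared vertical distances. The helper `sse` and the starting values are hypothetical choices, and `optim` is a general-purpose optimizer used here only for illustration, not the way regression lines are normally computed.

```r
time  <- c(1, 4, 6, 10, 8, 5)
score <- c(3, 6, 7, 9, 6, 5)

# Sum of squared vertical distances from the line y = b[1] + b[2] * x
sse <- function(b) sum((score - (b[1] + b[2] * time))^2)

# Numerical search starting from an arbitrary line (intercept 0, slope 0)
fit <- optim(c(0, 0), sse)
fit$par    # approximate intercept and slope of the best-fitting line
fit$value  # smallest sum of squares found

# Any other candidate line gives a larger sum of squares, for example:
sse(c(3, 0.5))
```

The search is approximate, so its answer agrees with the exact least squares coefficients only to a few decimal places.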
Least Squares Regression
The slope of the least squares regression line is
\[ b_1 = r \frac{s_y}{s_x} \]
where \(s_x\) and \(s_y\) are the sample standard deviations of \(x\) and \(y\).
The y-intercept of the least squares regression line is
\[ b_0 = \bar{y} - b_1 \bar{x} \]
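The two formulas can be applied directly in R. This is a small sketch using the base functions `cor`, `sd`, and `mean`, with the variable names `b1` and `b0` chosen to match the notation above.

```r
time  <- c(1, 4, 6, 10, 8, 5)
score <- c(3, 6, 7, 9, 6, 5)

r  <- cor(time, score)

# Slope: correlation times the ratio of standard deviations
b1 <- r * sd(score) / sd(time)

# Intercept: forces the line through the point (x-bar, y-bar)
b0 <- mean(score) - b1 * mean(time)

c(intercept = b0, slope = b1)
```

For these data the slope rounds to 0.5676 and the intercept to 2.7838.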
Least Squares Regression
For the test score and time studying data, the least squares regression line is
lm(score ~ time)
Call:
lm(formula = score ~ time)
Coefficients:
(Intercept)         time
     2.7838       0.5676
Least Squares Regression
The least squares regression line is
\[ \hat{y} = 2.7838 + 0.5676 x \]
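As a short illustration of how the equation is used, the sketch below pulls the coefficients out of the fitted model with `coef` and evaluates \(\hat{y}\) at each observed study time. The names `fit`, `b`, and `y_hat` are illustrative.

```r
time  <- c(1, 4, 6, 10, 8, 5)
score <- c(3, 6, 7, 9, 6, 5)

fit <- lm(score ~ time)
b   <- coef(fit)  # named vector: "(Intercept)" and "time"

# Fitted values from the equation y-hat = b0 + b1 * x
y_hat <- b[1] + b[2] * time

# These agree with the fitted values stored in the model object
all.equal(unname(y_hat), unname(fitted(fit)))

# Residuals are the vertical distances from the points to the line
score - y_hat
```

The residuals of a least squares line with an intercept sum to zero, up to rounding error.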
Least Squares Regression
Add the least squares regression line to the scatterplot.
plot(time, score, xlab = "Time Studying", ylab = "Test Score", main = "Scatterplot of Test Scores vs. Time Studying")
abline(lm(score ~ time), col = "red")