Lab 3 - Coffee ratings
STA 210 - Summer 2022
Welcome
Model diagnostics
The data
The data + an outlier
The data + influential point
Influential point
An observation is influential if removing it substantially changes the coefficients of the regression model.
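A minimal sketch of this definition, using made-up numbers rather than the data plotted in the slides: fit the same simple regression with and without one hypothetical point that sits far from the rest in x and off the trend, then compare the coefficients. The names x, y, fit_without, fit_with, and coef_compare are illustrative assumptions.

```r
# Made-up data for illustration only (not the data from the slides):
# a clean linear trend with a small alternating disturbance
x <- 1:20
y <- 2 + 0.5 * x + rep(c(-0.5, 0.5), times = 10)

# Add one hypothetical observation far from the others in x and off the trend
x_all <- c(x, 40)
y_all <- c(y, 0)

# Fit the model without and with the extra observation
fit_without <- lm(y ~ x)
fit_with    <- lm(y_all ~ x_all)

# Put the two sets of coefficients side by side
coef_compare <- rbind(
  without_point = coef(fit_without),
  with_point    = coef(fit_with)
)
colnames(coef_compare) <- c("intercept", "slope")
coef_compare
```

In this toy example the slope drops from roughly 0.5 to well under 0.2 when the extra point is included, which is the kind of substantial change the definition describes.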
Influential points
Influential points have a large impact on the coefficients and standard errors used for inference
These points can sometimes be identified in a scatterplot if there is only one predictor variable; this is often not possible when there are multiple predictors
We will use measures to quantify an individual observation’s influence on the regression model: leverage, standardized residuals, and Cook’s distance
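As a rough sketch, the three measures can be computed directly from a fitted lm object with base R's hatvalues(), rstandard(), and cooks.distance(). The model mpg ~ disp on mtcars mirrors the example used in these slides; the data frame name diagnostics and its column names are assumptions made for illustration.

```r
# Simple linear regression of mpg on disp
fit <- lm(mpg ~ disp, data = mtcars)

# One row per observation with the three influence measures
diagnostics <- data.frame(
  car       = rownames(mtcars),
  leverage  = as.numeric(hatvalues(fit)),       # diagonal of the hat matrix
  std_resid = as.numeric(rstandard(fit)),       # standardized residuals
  cooks_d   = as.numeric(cooks.distance(fit))   # Cook's distance
)

# Show the three observations with the largest Cook's distance
head(diagnostics[order(-diagnostics$cooks_d), ], 3)
```

Sorting by Cook's distance is only one way to look at the table; sorting by leverage or by the absolute standardized residual works the same way.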
Remember augment()?
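library(tidymodels)
# Fit a simple linear regression of mpg on disp
mtcars_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ disp, data = mtcars)
# Observation-level diagnostics from the underlying lm object
augment(mtcars_fit$fit)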
# A tibble: 32 × 9
   .rownames           mpg  disp .fitted .resid   .hat .sigma .cooksd .std.resid
   <chr>             <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>      <dbl>
 1 Mazda RX4          21    160     23.0 -2.01  0.0418   3.29 8.65e-3     -0.630
 2 Mazda RX4 Wag      21    160     23.0 -2.01  0.0418   3.29 8.65e-3     -0.630
 3 Datsun 710         22.8  108     25.1 -2.35  0.0629   3.28 1.87e-2     -0.746
 4 Hornet 4 Drive     21.4  258     19.0  2.43  0.0328   3.27 9.83e-3      0.761
 5 Hornet Sportabout  18.7  360     14.8  3.94  0.0663   3.22 5.58e-2      1.25
 6 Valiant            18.1  225     20.3 -2.23  0.0313   3.28 7.82e-3     -0.696
 7 Duster 360         14.3  360     14.8 -0.462 0.0663   3.31 7.70e-4     -0.147
 8 Merc 240D          24.4  147.    23.6  0.846 0.0461   3.30 1.72e-3      0.267
 9 Merc 230           22.8  141.    23.8 -0.997 0.0482   3.30 2.50e-3     -0.314
10 Merc 280           19.2  168.    22.7 -3.49  0.0396   3.24 2.48e-2     -1.10
# … with 22 more rows
Model diagnostics
Use the augment() function to output statistics that can be used to diagnose the model, along with the predicted values and residuals:
- outcome and predictor variables in the model
- .fitted: predicted values
- .se.fit: standard errors of predicted values (returned only when requested, e.g. with se_fit = TRUE)
- .resid: residuals
- .hat: leverage
- .sigma: estimate of residual standard deviation when the corresponding observation is dropped from the model
- .cooksd: Cook’s distance
- .std.resid: standardized residuals
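One possible way to use these columns, sketched under the assumption that the broom and dplyr packages are installed: keep only the diagnostic columns and sort so the observations with the largest Cook's distance appear first. Fitting with lm() directly stands in for the underlying mtcars_fit$fit object from the earlier slide; the names aug and most_influential are illustrative.

```r
library(broom)
library(dplyr)

# Same model as in the slides, fit directly with lm()
fit <- lm(mpg ~ disp, data = mtcars)

# One row per observation, with the diagnostic columns listed above
aug <- augment(fit)

# Keep the influence-related columns and sort by Cook's distance
most_influential <- aug %>%
  select(.rownames, .hat, .std.resid, .cooksd) %>%
  arrange(desc(.cooksd)) %>%
  head(3)

most_influential
```

Sorting rather than filtering avoids committing to any particular cutoff for what counts as a large value.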