Lab 3 - Coffee ratings

STA 210 - Summer 2022

Welcome

Model diagnostics

The data

The data + an outlier

The data + influential point

Influential point

An observation is influential if removing it substantially changes the coefficients of the regression model.
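One way to see this definition in action is to fit the same model with and without a single observation and compare the coefficients. The sketch below assumes the built-in `mtcars` data and the model `mpg ~ disp`; the choice of "Toyota Corolla" as the observation to drop is arbitrary and only for illustration.

```r
# Fit the model on all observations
fit_all <- lm(mpg ~ disp, data = mtcars)

# Refit after dropping one observation (arbitrary choice, for illustration)
mtcars_drop <- mtcars[rownames(mtcars) != "Toyota Corolla", ]
fit_drop <- lm(mpg ~ disp, data = mtcars_drop)

# Compare the intercept and slope side by side
coef_compare <- rbind(all = coef(fit_all), dropped = coef(fit_drop))
coef_compare
```

If the two rows differ substantially, the dropped observation is influential; if they are nearly identical, it is not.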

Influential points

Influential points have a large impact on the coefficients and standard errors used for inference
These points can sometimes be identified in a scatterplot when there is only one predictor variable, but this is often not possible when there are multiple predictors
We will use measures to quantify an individual observation’s influence on the regression model: leverage, standardized residuals, and Cook’s distance
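As a rough sketch, the three measures can also be computed directly with base R's `hatvalues()`, `rstandard()`, and `cooks.distance()`. The example assumes the built-in `mtcars` data and a single-predictor model, `mpg ~ disp`; the object names are illustrative.

```r
fit <- lm(mpg ~ disp, data = mtcars)

lev      <- hatvalues(fit)       # leverage of each observation
std_res  <- rstandard(fit)       # standardized residuals
cooks_d  <- cooks.distance(fit)  # Cook's distance

# Collect the measures in one data frame, sorted by Cook's distance
influence_measures <- data.frame(
  leverage  = lev,
  std_resid = std_res,
  cooks_d   = cooks_d
)
head(influence_measures[order(-influence_measures$cooks_d), ])
```

Sorting by Cook's distance puts the observations with the most overall influence at the top, which is a convenient starting point for a closer look.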

Remember `augment()`?

library(tidymodels)  # provides linear_reg(), set_engine(), fit(), and augment()

mtcars_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ disp, data = mtcars)

augment(mtcars_fit$fit)

# A tibble: 32 × 9
   .rownames           mpg  disp .fitted .resid   .hat .sigma .cooksd .std.resid
   <chr>             <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>      <dbl>
 1 Mazda RX4          21    160     23.0 -2.01  0.0418   3.29 8.65e-3     -0.630
 2 Mazda RX4 Wag      21    160     23.0 -2.01  0.0418   3.29 8.65e-3     -0.630
 3 Datsun 710         22.8  108     25.1 -2.35  0.0629   3.28 1.87e-2     -0.746
 4 Hornet 4 Drive     21.4  258     19.0  2.43  0.0328   3.27 9.83e-3      0.761
 5 Hornet Sportabout  18.7  360     14.8  3.94  0.0663   3.22 5.58e-2      1.25 
 6 Valiant            18.1  225     20.3 -2.23  0.0313   3.28 7.82e-3     -0.696
 7 Duster 360         14.3  360     14.8 -0.462 0.0663   3.31 7.70e-4     -0.147
 8 Merc 240D          24.4  147.    23.6  0.846 0.0461   3.30 1.72e-3      0.267
 9 Merc 230           22.8  141.    23.8 -0.997 0.0482   3.30 2.50e-3     -0.314
10 Merc 280           19.2  168.    22.7 -3.49  0.0396   3.24 2.48e-2     -1.10 
# … with 22 more rows

Model diagnostics

Use the augment() function to output statistics that can be used to diagnose the model, along with the predicted values and residuals:

outcome and predictor variables in the model
.fitted: predicted values
.se.fit: standard errors of predicted values (returned only when se_fit = TRUE)
.resid: residuals
.hat: leverage
.sigma: estimate of the residual standard deviation when the corresponding observation is dropped from the model
.cooksd: Cook’s distance
.std.resid: standardized residuals
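The columns above can be used to flag observations worth a closer look. The sketch below reuses the `mpg ~ disp` model and filters on `.hat`, `.std.resid`, and `.cooksd`. The cutoffs (leverage above 2(p + 1)/n, absolute standardized residual above 2, Cook's distance above 0.5) are common rules of thumb assumed here for illustration, not values stated in these slides.

```r
library(tidymodels)

mtcars_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ disp, data = mtcars)

mtcars_aug <- augment(mtcars_fit$fit)

n <- nrow(mtcars_aug)  # number of observations
p <- 1                 # number of predictors in the model
leverage_cutoff <- 2 * (p + 1) / n  # assumed rule-of-thumb threshold

# Keep observations that exceed at least one of the assumed cutoffs
flagged <- mtcars_aug %>%
  filter(.hat > leverage_cutoff | abs(.std.resid) > 2 | .cooksd > 0.5) %>%
  select(.rownames, .hat, .std.resid, .cooksd)

flagged
```

A flagged observation is not automatically a problem; it is a candidate for further investigation, for example by refitting the model without it and comparing coefficients.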
