AE 3: Duke Forest houses

Checking model conditions

Important

Go to the course GitHub organization and locate the repo titled ae-3-YOUR_GITHUB_USERNAME to get started.

Packages

library(tidyverse)
library(tidymodels)
library(openintro)
library(knitr)

Predict sale price from area

df_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(price ~ area, data = duke_forest)

tidy(df_fit) %>%
  kable(digits = 2)
term estimate std.error statistic p.value
(Intercept) 116652.33 53302.46 2.19 0.03
area 159.48 18.17 8.78 0.00

Model conditions

Exercise 1

The following code produces the residuals vs. fitted values plot for this model. Comment out the layer that defines the y-axis limits and re-create the plot. How does the plot change? Why might we want to define the limits explicitly?

df_aug <- augment(df_fit$fit)

ggplot(df_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  ylim(-1000000, 1000000) +
  labs(
    x = "Fitted value", y = "Residual",
    title = "Residuals vs. fitted values"
  )

Exercise 2

Improve how the values on the axes of the plot are displayed by modifying the code below.

Hint: use scale_continuous()

ggplot(df_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  ylim(-1000000, 1000000) +
  labs(
    x = "Fitted value", y = "Residual",
    title = "Residuals vs. fitted values"
  )