library(tidyverse)
library(tidymodels)
library(openintro)
library(knitr)
AE 3: Duke Forest houses
Checking model conditions
Packages
Predict sale price from area
<- linear_reg() %>%
df_fit set_engine("lm") %>%
fit(price ~ area, data = duke_forest)
tidy(df_fit) %>%
kable(digits = 2)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 116652.33 | 53302.46 | 2.19 | 0.03 |
area | 159.48 | 18.17 | 8.78 | 0.00 |
Model conditions
Exercise 1
The following code produces the residuals vs. fitted values plot for this model. Comment out the layer that defines the y-axis limits and re-create the plot. How does the plot change? Why might we want to define the limits explicitly?
<- augment(df_fit$fit)
df_aug
ggplot(df_aug, aes(x = .fitted, y = .resid)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed") +
ylim(-1000000, 1000000) +
labs(
x = "Fitted value", y = "Residual",
title = "Residuals vs. fitted values"
)
Exercise 2
Improve how the values on the axes of the plot are displayed by modifying the code below.
Hint: use scale_continuous()
ggplot(df_aug, aes(x = .fitted, y = .resid)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed") +
ylim(-1000000, 1000000) +
labs(
x = "Fitted value", y = "Residual",
title = "Residuals vs. fitted values"
)