AE 12: Exam 3 Review
Introduction
Packages
library(tidyverse)
library(tidymodels)
library(knitr)
library(Stat2Data)
library(rms)
library(nnet)
Data
As part of a study of the effects of predatory intertidal crab species on snail populations, researchers measured the mean closing forces and the propodus heights of the claws on several crabs of three species.
claws <- read_csv(here::here("ae", "data/claws.csv"))
We will use the following variables:
- force: Closing force of claw (newtons)
- height: Propodus height (mm)
- species: Crab species - Cp (Cancer productus), Hn (Hemigrapsus nudus), Lb (Lophopanopeus bellus)
- lb: 1 if Lophopanopeus bellus species, 0 otherwise
- hn: 1 if Hemigrapsus nudus species, 0 otherwise
- cp: 1 if Cancer productus species, 0 otherwise
- force_cent: mean-centered force
- height_cent: mean-centered height
Before we get started, let’s make the categorical and indicator variables factors.
claws <- claws %>%
  mutate(
    species = as_factor(species),
    lb = as_factor(lb),
    hn = as_factor(hn),
    cp = as_factor(cp)
  )
Part 1
Probabilities vs. odds vs. log-odds
Exercise 1
Use probabilities, odds, or log-odds to fill in the blanks:
- Use ___ to fit the model (outcome)
- Use ___ to interpret model results
- Use ___ to make predictions for individual observations and ultimately to make classification decisions
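The three scales are different ways of expressing the same quantity, and it can help to see the conversions written out. The sketch below uses a made-up probability (`p = 0.25` is an assumption for illustration, not a value from the claws data) and base R only.

```r
# Convert a probability to odds and log-odds, then back again.
p <- 0.25                     # hypothetical probability, for illustration only

odds <- p / (1 - p)           # odds = p / (1 - p)
log_odds <- log(odds)         # log-odds (logit); equivalent to qlogis(p)

# Back-transform: log-odds -> odds -> probability
odds_back <- exp(log_odds)
p_back <- odds_back / (1 + odds_back)   # equivalent to plogis(log_odds)

c(probability = p, odds = odds, log_odds = log_odds, p_back = p_back)
```

The built-in `qlogis()` and `plogis()` functions perform the same probability-to-log-odds and log-odds-to-probability conversions in one step.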
Exercise 2
Suppose we want to use force to determine whether or not a crab is from the Lophopanopeus bellus (Lb) species. Why should we use a logistic regression model for this analysis?
claws %>%
  distinct(lb)
# A tibble: 2 × 1
lb
<fct>
1 0
2 1
Exercise 3
We will use the mean-centered force variable (force_cent) in the model. Use logistic regression to fit the model. Obtain confidence intervals for the coefficients and present the results in a table showing only 3 digits. Write the equation of the model produced by R.
Let \(\pi\) be the probability that a crab is from the Lb species. Write down the model equation based on your result.
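One possible workflow for this kind of task is sketched below. It runs on simulated stand-in data (the `sim` data frame and the coefficients used to generate it are assumptions, not the real claws measurements) and uses `glm()` with `broom::tidy()` and `knitr::kable()`; the course may expect a different fitting interface.

```r
library(broom)   # tidy(); also loaded as part of tidymodels
library(knitr)   # kable()

# Simulated stand-in data (hypothetical; NOT the real claws measurements).
set.seed(1)
sim <- data.frame(force_cent = rnorm(60))
sim$lb <- factor(rbinom(60, size = 1, prob = plogis(-0.5 - sim$force_cent)))

# Logistic regression: log-odds that lb = 1 as a linear function of force_cent.
fit <- glm(lb ~ force_cent, data = sim, family = binomial)

# Coefficient table with 95% confidence intervals, displayed with 3 digits.
results <- tidy(fit, conf.int = TRUE)
kable(results, digits = 3)
```

With the real data, `sim` would be replaced by `claws`; the estimates in the resulting table are the numbers that go into the model equation.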
Exercise 4
Interpret the intercept in the context of the data.
Exercise 5
Interpret the effect of force in the context of the data.
When x goes up by 1 unit, we expect y to change by (slope) units.
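For a logistic model, that template applies on the log-odds scale, and exponentiating moves it to the odds scale. As a sketch, with \(\beta_0\) and \(\beta_1\) denoting the intercept and the slope of force_cent (notation assumed here, not taken from the worksheet), and \(\text{odds}(x)\) denoting the odds when force_cent equals \(x\):

\[\log\Big(\frac{\pi}{1-\pi}\Big) = \beta_0 + \beta_1 \, \text{force\_cent}\]
\[\frac{\text{odds}(x + 1)}{\text{odds}(x)} = \frac{e^{\beta_0 + \beta_1 (x + 1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}\]

So a 1-unit increase in force_cent changes the log-odds by \(\beta_1\) and multiplies the odds by \(e^{\beta_1}\).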
Exercise 6
Now let’s consider adding height_cent to the model. Fit the model that includes height_cent. Then use AIC to choose the model that best fits the data.
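A minimal sketch of an AIC comparison between two nested logistic models follows. The data are simulated (the `sim` data frame is a hypothetical stand-in, not the claws data), and base R's `AIC()` is used here; `glance()` from broom reports the same statistic.

```r
# Simulated stand-in data (hypothetical; NOT the real claws measurements).
set.seed(2)
n <- 80
sim <- data.frame(force_cent = rnorm(n), height_cent = rnorm(n))
sim$lb <- factor(rbinom(n, size = 1, prob = plogis(-0.5 - sim$force_cent)))

# Model 1: force only. Model 2: force and height.
fit_force <- glm(lb ~ force_cent, data = sim, family = binomial)
fit_both  <- glm(lb ~ force_cent + height_cent, data = sim, family = binomial)

# One row per model, with the number of parameters (df) and the AIC.
aic_table <- AIC(fit_force, fit_both)
aic_table
```

The model with the lower AIC is preferred; AIC penalizes each additional parameter, so the larger model is chosen only if it improves the fit enough to offset the penalty.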
Exercise 7
What do the following mean in the context of these data? Explain and calculate them.
- Sensitivity: P(predict lb | actual lb) =
- Specificity: P(predict not lb | actual not lb) =
- Negative predictive power: P(actual not lb | predict not lb) =
- Positive predictive power: P(actual lb | predict lb) =
| Actual | Predict lb | Predict not lb | TOTAL predicted |
|---|---|---|---|
| Lb | | | |
| Not lb | | | |
| TOTAL actual | | | |
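Once the table is filled in, each of the four quantities is a ratio of one cell to a row or column total. The sketch below shows the arithmetic using made-up counts (the numbers in `conf_mat` are hypothetical and do not come from the claws data).

```r
# Hypothetical counts (rows = actual, columns = predicted); NOT from the claws data.
conf_mat <- matrix(
  c(8,  4,
    3, 23),
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("lb", "not_lb"), predicted = c("lb", "not_lb"))
)

# Conditioning on the actual class uses row totals.
sensitivity <- conf_mat["lb", "lb"] / sum(conf_mat["lb", ])             # P(predict lb | actual lb)
specificity <- conf_mat["not_lb", "not_lb"] / sum(conf_mat["not_lb", ]) # P(predict not lb | actual not lb)

# Conditioning on the predicted class uses column totals.
ppv <- conf_mat["lb", "lb"] / sum(conf_mat[, "lb"])                     # P(actual lb | predict lb)
npv <- conf_mat["not_lb", "not_lb"] / sum(conf_mat[, "not_lb"])         # P(actual not lb | predict not lb)

round(c(sensitivity = sensitivity, specificity = specificity, ppv = ppv, npv = npv), 3)
```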
Part 2
Let’s extend the model to use force and height to predict the species (Hn, Cp, and Lb).
Exercise 8
Write the equation of the model.
\[\log\Big(\frac{\hat{\pi}_{Hn}}{\hat{\pi}_{Cp}}\Big) = \]
\[\log\Big(\frac{\hat{\pi}_{Lb}}{\hat{\pi}_{Cp}}\Big) = \]
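The right-hand sides of these two equations come from the fitted coefficients, one row per non-baseline species. As a sketch, assuming `multinom()` from the nnet package is the fitting function and using simulated stand-in data (the `sim` data frame is hypothetical, not the claws data):

```r
library(nnet)

# Simulated stand-in data (hypothetical; NOT the real claws measurements).
set.seed(3)
n <- 150
sim <- data.frame(force_cent = rnorm(n), height_cent = rnorm(n))
# The first factor level (Cp) is the baseline category.
sim$species <- factor(sample(c("Cp", "Hn", "Lb"), n, replace = TRUE),
                      levels = c("Cp", "Hn", "Lb"))

# Multinomial logistic regression with Cp as the baseline.
fit_multi <- multinom(species ~ force_cent + height_cent, data = sim, trace = FALSE)

# Rows: Hn vs. Cp and Lb vs. Cp; columns: intercept and slopes.
coef_multi <- coef(fit_multi)
coef_multi
```

Each row of `coef_multi` supplies the intercept and slopes for one of the two log-odds equations.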
Exercise 9
Interpret the intercept for the odds a crab is Hn vs. Cp species.
Interpret the effect of force on the odds a crab is Lb vs. Cp species.
Exercise 10
Interpret the effect of force on the odds a crab is in the Hn vs. Lb species.
CAUTION: We can write an interpretation based on the estimated coefficients; however, we can’t make any inferential conclusions for this question based on the current model. We would need to refit the model with Lb as the baseline category to do so.
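Refitting with a different baseline only requires changing the first level of the response factor. The sketch below uses base R's `relevel()` on simulated stand-in data (the `sim` data frame and the `species_lb` column name are hypothetical); `fct_relevel()` from forcats would serve the same purpose.

```r
library(nnet)

# Simulated stand-in data (hypothetical; NOT the real claws measurements).
set.seed(4)
n <- 150
sim <- data.frame(
  force_cent = rnorm(n),
  height_cent = rnorm(n),
  species = factor(sample(c("Cp", "Hn", "Lb"), n, replace = TRUE))
)

# Make Lb the first level so it becomes the baseline category.
sim$species_lb <- relevel(sim$species, ref = "Lb")

fit_lb <- multinom(species_lb ~ force_cent + height_cent, data = sim, trace = FALSE)

# Rows are now Cp vs. Lb and Hn vs. Lb.
coef_lb <- coef(fit_lb)
coef_lb
```

The Hn row of this refit gives the Hn vs. Lb coefficients directly, along with standard errors that can be used for inference.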
Exercise 11
Conditions for multinomial logistic models (and logistic models as well):
# add code here for the other species
# add code here for the other species
Part 3
Checking for multicollinearity in logistic and multinomial logistic
Similar to multiple linear regression, we can also check for multicollinearity in logistic and multinomial logistic models.
- Use the vif function to check for multicollinearity in logistic regression.
- The vif function doesn’t work for multinomial logistic regression models, so we can look at a correlation matrix of the predictors to assess whether the predictors are highly correlated: