Important

Go to the course GitHub organization and locate the repo titled ae-12-exam-3-review-YOUR_GITHUB_USERNAME to get started.

# Introduction

## Packages

library(tidyverse)
library(tidymodels)
library(knitr)
library(Stat2Data)
library(rms)
library(nnet)

## Data

As part of a study of the effects of predatory intertidal crab species on snail populations, researchers measured the mean closing forces and the propodus heights of the claws on several crabs of three species.

claws <- read_csv(here::here("ae", "data/claws.csv"))

We will use the following variables:

• force: Closing force of claw (newtons)
• height: Propodus height (mm)
• species: Crab species - Cp (Cancer productus), Hn (Hemigrapsus nudus), Lb (Lophopanopeus bellus)
• lb: 1 if Lophopanopeus bellus species, 0 otherwise
• hn: 1 if Hemigrapsus nudus species, 0 otherwise
• cp: 1 if Cancer productus species, 0 otherwise
• force_cent: mean-centered force
• height_cent: mean-centered height

Before we get started, let’s make the categorical and indicator variables factors.

claws <- claws %>%
  mutate(
    species = as_factor(species),
    lb = as_factor(lb),
    hn = as_factor(hn),
    cp = as_factor(cp)
  )

# Part 1

## Exercise 1

Use probabilities, odds, or log-odds to fill in the blanks:

• Use ___ to fit the model (outcome)
• Use ___ to interpret model results
• Use ___ to make predictions for individual observations and ultimately to make classification decisions
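The three scales in this exercise are related by simple transformations. The following is a minimal sketch in base R of how to move between them; the value of `p` is a hypothetical number chosen only for illustration and is not taken from the claws data.

```r
# Convert a probability to odds and log-odds, then back again (base R only).
p <- 0.2                      # hypothetical probability, for illustration

odds <- p / (1 - p)           # odds = p / (1 - p)
log_odds <- log(odds)         # log-odds (logit); qlogis(p) gives the same value

# Going back: log-odds -> odds -> probability
odds_back <- exp(log_odds)
p_back <- odds_back / (1 + odds_back)   # plogis(log_odds) gives the same value

c(probability = p, odds = odds, log_odds = log_odds, p_back = p_back)
```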

## Exercise 2

Suppose we want to use force to determine whether or not a crab is from the Lophopanopeus bellus (Lb) species. Why should we use a logistic regression model for this analysis?

claws %>%
  distinct(lb)
# A tibble: 2 × 1
  lb
  <fct>
1 0
2 1

## Exercise 3

We will use the mean-centered variable for force (force_cent) in the model. Use logistic regression to fit the model. Obtain the confidence intervals for its coefficients and present the results in a table showing only 3 digits. Write the equation of the model produced by R.

Let $\pi$ be the probability that a crab is from the Lb species. Write down the model formula based on your result.
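As a rough illustration of the workflow this exercise asks for, here is a sketch that fits a logistic regression and reports Wald confidence intervals rounded to 3 digits. It uses base R `glm()` on simulated stand-in data, not the real claws data; the data frame `sim`, the sample size, and the coefficients used to simulate it are all assumptions made up for this example.

```r
# Simulated stand-in data; NOT the real claws data.
set.seed(1)
n <- 200
force_cent <- rnorm(n, mean = 0, sd = 5)   # hypothetical mean-centered force
# Hypothetical "true" model on the log-odds scale, used only to simulate lb
lb <- rbinom(n, size = 1, prob = plogis(-0.5 - 0.2 * force_cent))
sim <- data.frame(lb = factor(lb), force_cent = force_cent)

# Logistic regression: models the log-odds that lb == 1
fit <- glm(lb ~ force_cent, data = sim, family = binomial)

# Coefficient estimates with 95% Wald confidence intervals, 3 digits
est <- coef(fit)
ci <- confint.default(fit)    # estimate +/- 1.96 * standard error
round(cbind(estimate = est, ci), 3)
```

In the course's tidymodels workflow, the same steps would likely go through `logistic_reg()`, `tidy(conf.int = TRUE)`, and `kable(digits = 3)` applied to `claws`.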

## Exercise 4

Interpret the intercept in the context of the data.

## Exercise 5

Interpret the effect of force in the context of the data.

When x goes up by 1 unit, we expect y to change by (slope) units.
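In a logistic model, this template applies on the log-odds scale. As a sketch, write $\hat{\beta}_0$ and $\hat{\beta}_1$ for the fitted intercept and slope (this notation is assumed here and does not come from the exercise):

$\log\Big(\frac{\hat{\pi}}{1-\hat{\pi}}\Big) = \hat{\beta}_0 + \hat{\beta}_1 x$

$\frac{\text{odds}(x+1)}{\text{odds}(x)} = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 (x+1)}}{e^{\hat{\beta}_0 + \hat{\beta}_1 x}} = e^{\hat{\beta}_1}$

So a 1-unit increase in x changes the log-odds by $\hat{\beta}_1$ and multiplies the odds by a factor of $e^{\hat{\beta}_1}$.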

## Exercise 6

Now let’s consider adding height_cent to the model. Fit the model that includes height_cent. Then use AIC to choose the model that best fits the data.
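One way to carry out an AIC comparison like this is sketched below with base R. The data are simulated stand-ins (the data frame `sim` and every coefficient used to generate it are hypothetical), so the output says nothing about which model is preferred for the real claws data.

```r
# Simulated stand-in data; NOT the real claws data.
set.seed(2)
n <- 200
force_cent  <- rnorm(n, sd = 5)
height_cent <- 0.3 * force_cent + rnorm(n, sd = 1.5)  # hypothetical correlated predictor
lb <- rbinom(n, 1, plogis(-0.5 - 0.2 * force_cent - 0.4 * height_cent))
sim <- data.frame(lb = factor(lb), force_cent, height_cent)

# Candidate models: force only vs. force + height
fit_force <- glm(lb ~ force_cent, data = sim, family = binomial)
fit_both  <- glm(lb ~ force_cent + height_cent, data = sim, family = binomial)

# AIC() returns a data frame with one row per model; smaller AIC is preferred
aic_table <- AIC(fit_force, fit_both)
aic_table
rownames(aic_table)[which.min(aic_table$AIC)]   # name of the lower-AIC model
```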

## Exercise 7

What do the following mean in the context of this data? Explain and calculate them.

• Sensitivity: P(predict lb | actual lb) =
• Specificity: P(predict not lb | actual not lb) =
• Negative predictive power: P(actual not lb | predict not lb) =
• Positive predictive power: P(actual lb | predict lb) =

| Actual | Predict lb | Predict not lb | TOTAL predicted |
|---|---|---|---|
| Lb | | | |
| Not lb | | | |
| TOTAL actual | | | |
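Once the table is filled in, the four quantities follow directly from its cells. The sketch below shows the arithmetic in base R using hypothetical counts (`tp`, `fn`, `fp`, `tn` are made-up numbers, not results from the claws model).

```r
# Hypothetical cell counts, for illustration only
tp <- 20  # actual lb,     predicted lb
fn <- 5   # actual lb,     predicted not lb
fp <- 8   # actual not lb, predicted lb
tn <- 40  # actual not lb, predicted not lb

# Rows are actual classes, columns are predicted classes
conf_mat <- matrix(
  c(tp, fn, fp, tn), nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("lb", "not lb"), predicted = c("lb", "not lb"))
)

sensitivity <- tp / (tp + fn)  # P(predict lb | actual lb)
specificity <- tn / (tn + fp)  # P(predict not lb | actual not lb)
ppv <- tp / (tp + fp)          # P(actual lb | predict lb)
npv <- tn / (tn + fn)          # P(actual not lb | predict not lb)

conf_mat
c(sensitivity = sensitivity, specificity = specificity, ppv = ppv, npv = npv)
```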

# Part 2

Let’s extend the model to use force and height to predict the species (Hn, Cp, and Lb).
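A multinomial model of this form could be fit with `multinom()` from the nnet package loaded earlier. The sketch below uses simulated stand-in data (the data frame `sim` and all coefficients used to simulate it are hypothetical), with Cp set as the first factor level so that it serves as the baseline category.

```r
library(nnet)

# Simulated stand-in data; NOT the real claws data.
set.seed(3)
n <- 300
force  <- rnorm(n, mean = 12, sd = 5)
height <- rnorm(n, mean = 8, sd = 2)

# Hypothetical linear predictors for Hn and Lb relative to the baseline Cp
eta_hn <- 1.0 - 0.15 * force + 0.05 * height
eta_lb <- 0.5 - 0.10 * force + 0.10 * height
probs <- cbind(Cp = 1, Hn = exp(eta_hn), Lb = exp(eta_lb))
probs <- probs / rowSums(probs)   # each row now sums to 1

# Draw one species per row according to its probabilities
species <- apply(probs, 1, function(p) sample(colnames(probs), 1, prob = p))
sim <- data.frame(
  species = factor(species, levels = c("Cp", "Hn", "Lb")),  # Cp is the baseline
  force, height
)

fit <- multinom(species ~ force + height, data = sim, trace = FALSE)
coef(fit)   # one row per non-baseline species: log-odds vs. Cp
```

Each row of `coef(fit)` corresponds to one log-odds equation against the baseline, which matches the two-equation structure used for this model.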

## Exercise 8

Write the equation of the model.

$\log\Big(\frac{\hat{\pi}_{Hn}}{\hat{\pi}_{Cp}}\Big) =$

$\log\Big(\frac{\hat{\pi}_{Lb}}{\hat{\pi}_{Cp}}\Big) =$

## Exercise 9

• Interpret the intercept for the odds a crab is Hn vs. Cp species.

• Interpret the effect of force on the odds a crab is Lb vs. Cp species.

## Exercise 10

Interpret the effect of force on the odds a crab is Hn vs. Lb species.

CAUTION: We can write an interpretation based on the estimated coefficients; however, we can’t make any inferential conclusions for this question based on the current model. We would need to refit the model with Lb as the baseline category to do so.
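The point estimate for this comparison can be recovered from the two fitted equations because the baseline cancels. As a sketch, write $\hat{\beta}_{force}^{Hn}$ and $\hat{\beta}_{force}^{Lb}$ for the force coefficients in the Hn vs. Cp and Lb vs. Cp equations (this notation is assumed here):

$\log\Big(\frac{\hat{\pi}_{Hn}}{\hat{\pi}_{Lb}}\Big) = \log\Big(\frac{\hat{\pi}_{Hn}}{\hat{\pi}_{Cp}}\Big) - \log\Big(\frac{\hat{\pi}_{Lb}}{\hat{\pi}_{Cp}}\Big)$

The force coefficient for Hn vs. Lb is therefore $\hat{\beta}_{force}^{Hn} - \hat{\beta}_{force}^{Lb}$, and the odds of Hn vs. Lb are multiplied by $e^{\hat{\beta}_{force}^{Hn} - \hat{\beta}_{force}^{Lb}}$ for each 1-newton increase in force, holding height constant. This difference gives only the estimate. Its standard error is not available from the current output.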

## Exercise 11

Conditions for multinomial logistic models (and logistic models as well):

# add code for other species here
# add code for other species here

# Part 3

## Checking for multicollinearity in logistic and multinomial logistic models

Similar to multiple linear regression, we can also check for multicollinearity in logistic and multinomial logistic models.

• Use the vif function to check for multicollinearity in logistic regression.
• The vif function doesn’t work for the multinomial logistic regression models, so we can look at a correlation matrix of the predictors as a way to assess if the predictors are highly correlated:
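A correlation matrix of the predictors can be computed with base R's `cor()`. The sketch below uses simulated stand-in values for force and height (the data frame `sim` and the relationship between its columns are hypothetical), so the correlation shown is not the one in the claws data.

```r
# Simulated stand-in predictors; NOT the real claws data.
set.seed(4)
n <- 100
force  <- rnorm(n, mean = 12, sd = 5)
height <- 5 + 0.4 * force + rnorm(n, sd = 1)   # hypothetical relationship
sim <- data.frame(force, height)

# Pairwise Pearson correlations between the quantitative predictors
cor_mat <- cor(sim[, c("force", "height")])
round(cor_mat, 3)
```

An off-diagonal value close to 1 or -1 would suggest that the two predictors carry largely overlapping information. With the course data, the same idea would apply to the force and height columns of `claws`.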