STA 210 - Summer 2022

Yunran Chen

Exam 2 scores for part 1 are posted

Project proposals + AE 9 due Wednesday, June 8, at 11:59pm

- Conceptual part
- Applied part

Use the odds ratio to compare the odds of two groups

Interpret the coefficients of a logistic regression model with

- a single categorical predictor
- a single quantitative predictor
- multiple predictors

This dataset is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. We want to examine the relationship between various health characteristics and the risk of having heart disease.

`high_risk`:

- 1: High risk of having heart disease in next 10 years
- 0: Not high risk of having heart disease in next 10 years

`age`: Age at exam time (in years)

`education`: 1 = Some High School, 2 = High School or GED, 3 = Some College or Vocational School, 4 = College

Education | High risk | Not high risk |
---|---|---|
Some high school | 323 | 1397 |
High school or GED | 147 | 1106 |
Some college or vocational school | 88 | 601 |
College | 70 | 403 |

We want to compare the risk of heart disease for those with a High School diploma/GED and those with a college degree.

We’ll use the **odds** to compare the two groups.

\[ \text{odds} = \frac{P(\text{success})}{P(\text{failure})} = \frac{\text{# of successes}}{\text{# of failures}} \]

- Odds of having high risk for the **High school or GED** group: \(\frac{147}{1106} = 0.133\)
- Odds of having high risk for the **College** group: \(\frac{70}{403} = 0.174\)

Based on this, we see those with a college degree had higher odds of having high risk for heart disease than those with a high school diploma or GED.
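The same calculation can be done for all four groups at once. The block below is a minimal base R sketch that types in the counts from the table above; the object and column names (`risk_counts`, `odds`, `prob`) are chosen here for illustration and are not part of the course data.

```r
# Counts copied from the education-by-risk table above
risk_counts <- data.frame(
  education = c("Some high school", "High school or GED",
                "Some college or vocational school", "College"),
  high_risk = c(323, 147, 88, 70),
  not_high_risk = c(1397, 1106, 601, 403)
)

# odds = (# of successes) / (# of failures), where "success" means high risk
risk_counts$odds <- risk_counts$high_risk / risk_counts$not_high_risk

# For contrast, the proportion of each group that is high risk
risk_counts$prob <- risk_counts$high_risk /
  (risk_counts$high_risk + risk_counts$not_high_risk)

print(risk_counts, digits = 3)
```

Keeping `prob` next to `odds` is a reminder that the two are different quantities: the odds equal `prob / (1 - prob)`, so they are always somewhat larger than the probability.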

Let’s summarize the relationship between the two groups. To do so, we’ll use the **odds ratio (OR)**.

\[ OR = \frac{\text{odds}_1}{\text{odds}_2} = \frac{\omega_1}{\omega_2} \]

\[OR = \frac{\text{odds}_{\text{College}}}{\text{odds}_{\text{HS}}} = \frac{0.174}{0.133} = \mathbf{1.308}\]

The odds of having high risk for heart disease for those with a college degree are 1.31 times the odds for those with a high school diploma or GED.

\[OR = \frac{\text{odds}_{\text{College}}}{\text{odds}_{\text{Some HS}}} = \frac{70/403}{323/1397} = 0.751\]

The odds of having high risk for heart disease for those with a college degree are 0.751 times the odds for those with some high school.

It’s more natural to interpret an odds ratio that is greater than 1, so we can flip the comparison by taking the reciprocal: \(1/0.751 = 1.33\).

The odds of having high risk for heart disease for those with some high school are 1.33 times the odds for those with a college degree.
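The odds ratios above, including the flipped comparison, can be checked with a few lines of base R. This is a sketch only: `odds_ratio()` is a helper defined here for illustration, not a function from any package used in the course.

```r
# Odds ratio of group 1 relative to group 2, from success/failure counts
odds_ratio <- function(success_1, failure_1, success_2, failure_2) {
  (success_1 / failure_1) / (success_2 / failure_2)
}

# College vs. High school or GED
or_college_vs_hs <- odds_ratio(70, 403, 147, 1106)

# College vs. Some high school
or_college_vs_some_hs <- odds_ratio(70, 403, 323, 1397)

# Flipping the comparison is the same as taking the reciprocal
or_some_hs_vs_college <- 1 / or_college_vs_some_hs

round(c(college_vs_hs = or_college_vs_hs,
        college_vs_some_hs = or_college_vs_some_hs,
        some_hs_vs_college = or_some_hs_vs_college), 3)
```

Because this works from the raw counts, the College vs. High school or GED ratio can differ in the third decimal place from the value obtained by dividing the rounded odds 0.174 and 0.133.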

First, rename the levels of the categorical variables:

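```
library(tidyverse)  # provides %>%, mutate(), if_else(), case_when(), fct_relevel()

heart_disease <- heart_disease %>%
  mutate(
    # Readable labels for the 0/1 outcome
    high_risk_names = if_else(high_risk == "1", "High risk", "Not high risk"),
    # Readable labels for the education codes 1-4
    education_names = case_when(
      education == "1" ~ "Some high school",
      education == "2" ~ "High school or GED",
      education == "3" ~ "Some college or vocational school",
      education == "4" ~ "College"
    ),
    # Order the levels by amount of education rather than alphabetically
    education_names = fct_relevel(
      education_names, "Some high school", "High school or GED",
      "Some college or vocational school", "College"
    )
  )
```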

Then, make the table. Counting each combination with `count(education_names, high_risk_names)` gives one row per education level and risk category:

```
# A tibble: 8 × 3
  education_names                   high_risk_names     n
  <fct>                             <chr>           <int>
1 Some high school                  High risk         323
2 Some high school                  Not high risk    1397
3 High school or GED                High risk         147
4 High school or GED                Not high risk    1106
5 Some college or vocational school High risk          88
6 Some college or vocational school Not high risk     601
7 College                           High risk          70
8 College                           Not high risk     403
```

```
heart_disease %>%
  count(education_names, high_risk_names) %>%
  pivot_wider(names_from = high_risk_names, values_from = n)
```

```
# A tibble: 4 × 3
  education_names                   `High risk` `Not high risk`
  <fct>                                   <int>           <int>
1 Some high school                          323            1397
2 High school or GED                        147            1106
3 Some college or vocational school          88             601
4 College                                    70             403
```

```
library(knitr)  # provides kable()

heart_disease %>%
  count(education_names, high_risk_names) %>%
  pivot_wider(names_from = high_risk_names, values_from = n) %>%
  kable(col.names = c("Education", "High risk", "Not high risk"))
```

Education | High risk | Not high risk |
---|---|---|
Some high school | 323 | 1397 |
High school or GED | 147 | 1106 |
Some college or vocational school | 88 | 601 |
College | 70 | 403 |

Recall: Education - 1 = Some High School, 2 = High School or GED, 3 = Some College or Vocational School, 4 = College

`education4`: log-odds

term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -1.464 | 0.062 | -23.719 | 0.000 |
education2 | -0.554 | 0.107 | -5.159 | 0.000 |
education3 | -0.457 | 0.130 | -3.520 | 0.000 |
education4 | -0.286 | 0.143 | -1.994 | 0.046 |

The **log-odds** of having high risk for heart disease are expected to be 0.286 less for those with a college degree compared to those with some high school (the baseline group).

`education4`: odds

The **odds** of having high risk for heart disease for those with a college degree are expected to be 0.751 (`exp(-0.286)`) **times** the odds for those with some high school.

The model coefficient, -0.286, is the expected change in the log-odds when going from the *Some high school* group to the *College* group.

Therefore, \(e^{-0.286} = 0.751\) is the factor by which the **odds** are expected to be multiplied when going from the *Some high school* group to the *College* group.
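To see why exponentiating works, compare the fitted log-odds for the two groups. In this sketch, \(\hat{\beta}_0\) denotes the intercept and \(\hat{\beta}_4\) the `education4` coefficient; these symbol names are labels chosen here, not part of the model output.

\[
\begin{aligned}
\log(\text{odds}_{\text{Some HS}}) &= \hat{\beta}_0 \\
\log(\text{odds}_{\text{College}}) &= \hat{\beta}_0 + \hat{\beta}_4 \\
\log\left(\frac{\text{odds}_{\text{College}}}{\text{odds}_{\text{Some HS}}}\right) &= \hat{\beta}_4 = -0.286 \\
\frac{\text{odds}_{\text{College}}}{\text{odds}_{\text{Some HS}}} &= e^{-0.286} \approx 0.751
\end{aligned}
\]

The intercept cancels in the difference, so the exponentiated coefficient is exactly the odds ratio relative to the baseline group.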

\[ OR = e^{\hat{\beta}_j} = \exp\{\hat{\beta}_j\} \]
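This relationship can be checked numerically. The fitting code is not shown in these slides, so the block below is only a sketch under stated assumptions: it uses base R `glm()` on the counts from the education table (a grouped-data fit) instead of the row-level `heart_disease` data, and the object names are chosen here for illustration.

```r
# Counts copied from the education-by-risk table; education coded 1-4 as in the data
risk_counts <- data.frame(
  education = factor(c("1", "2", "3", "4")),
  high_risk = c(323, 147, 88, 70),
  not_high_risk = c(1397, 1106, 601, 403)
)

# Grouped-data logistic regression: the response is a two-column matrix of
# (successes, failures). Level "1" (Some high school) is the baseline.
fit <- glm(cbind(high_risk, not_high_risk) ~ education,
           family = binomial, data = risk_counts)

log_odds_coefs <- coef(fit)        # estimates on the log-odds scale
odds_scale <- exp(log_odds_coefs)  # exponentiated estimates

# exp(intercept) is the baseline odds; the other entries are odds ratios
# relative to the baseline group
round(cbind(log_odds = log_odds_coefs, odds = odds_scale), 3)

# Compare the exponentiated education4 coefficient with the odds ratio
# computed directly from the table
or_from_table <- (70 / 403) / (323 / 1397)
all.equal(unname(odds_scale["education4"]), or_from_table, tolerance = 1e-5)
```

With a single categorical predictor the model has one parameter per group, which is why the fitted odds ratios agree with the ones computed by hand from the table.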

`age`: log-odds

term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -5.619 | 0.288 | -19.498 | 0 |
age | 0.076 | 0.005 | 14.174 | 0 |

For each additional year in age, the log-odds of having high risk for heart disease are expected to increase by 0.076.

`age`: odds

- For each additional year in age, the odds of having high risk for heart disease are expected to multiply by a factor of 1.08 (`exp(0.076)`).
- **Alternate interpretation:** For each additional year in age, the odds of having high risk for heart disease are expected to increase by 8%.
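The two interpretations are the same number expressed two ways, which a short calculation makes concrete. This sketch uses the coefficients from the age-only output above; the 10-year comparison and the age of 50 are hypothetical values picked for illustration.

```r
# Coefficients copied from the age-only model output above
intercept <- -5.619
beta_age <- 0.076

# Multiplicative change in the odds for one additional year of age
odds_multiplier <- exp(beta_age)

# The same change expressed as a percent increase
percent_change <- (odds_multiplier - 1) * 100

# A k-year difference multiplies the odds by exp(k * beta), not k * exp(beta)
ten_year_multiplier <- exp(10 * beta_age)

# Predicted probability at a hypothetical age of 50;
# plogis() is the inverse of the logit (log-odds) transformation
log_odds_50 <- intercept + beta_age * 50
prob_50 <- plogis(log_odds_50)

round(c(odds_multiplier = odds_multiplier,
        percent_change = percent_change,
        ten_year_multiplier = ten_year_multiplier,
        prob_50 = prob_50), 3)
```

The percent interpretation only applies to the odds. The change in predicted probability per year is not constant, because the probability is a nonlinear function of the log-odds.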

term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -5.385 | 0.308 | -17.507 | 0.000 |
education2 | -0.242 | 0.112 | -2.162 | 0.031 |
education3 | -0.235 | 0.134 | -1.761 | 0.078 |
education4 | -0.020 | 0.148 | -0.136 | 0.892 |
age | 0.073 | 0.005 | 13.385 | 0.000 |

`education4`: The **log-odds** of having high risk for heart disease are expected to be 0.020 less for those with a college degree compared to those with some high school, **holding age constant.**

`age`: For each additional year in age, the log-odds of having high risk for heart disease are expected to increase by 0.073, **holding education level constant.**

`education4`: The **odds** of having high risk for heart disease for those with a college degree are expected to be 0.98 (`exp(-0.020)`) **times** the odds for those with some high school, **holding age constant**.

`age`: For each additional year in age, the odds of having high risk for heart disease are expected to multiply by a factor of 1.08 (`exp(0.073)`), **holding education level constant**.
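To make "holding age constant" concrete, the sketch below plugs the coefficients from the two-predictor output above into the model equation. The helper `predicted_odds()` and the ages 40, 50, and 60 are hypothetical choices made here for illustration.

```r
# Coefficients copied from the two-predictor model output above
coefs <- c("(Intercept)" = -5.385, education2 = -0.242, education3 = -0.235,
           education4 = -0.020, age = 0.073)

# Exponentiate the slopes to move from the log-odds scale to the odds scale
round(exp(coefs[-1]), 3)

# Predicted odds for a given age and education code (1 is the baseline level)
predicted_odds <- function(age, education_level) {
  log_odds <- coefs[["(Intercept)"]] + coefs[["age"]] * age
  if (education_level != 1) {
    log_odds <- log_odds + coefs[[paste0("education", education_level)]]
  }
  exp(log_odds)
}

# College vs. Some high school odds ratio at several hypothetical ages:
# the age term cancels in the ratio, so the result is exp(-0.020) every time
ages <- c(40, 50, 60)
or_by_age <- sapply(ages, function(a) predicted_odds(a, 4) / predicted_odds(a, 1))
or_by_age
```

The ratio is the same at every age only because this model has no interaction between age and education; with an interaction term, the education odds ratio would depend on age.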

Use the odds ratio to compare the odds of two groups

Interpret the coefficients of a logistic regression model with

- a single categorical predictor
- a single quantitative predictor
- multiple predictors