Lab #08: Logistic Regression

due Friday, June 17 at 8:59am

Goals

Getting started

🛑 This assignment is due Friday, June 17 at 8:59am so that you are not stressed about submitting the lab during the final exam hours.

For each exercise:

Packages

We will use the tidyverse, tidymodels, knitr packages in this lab.

library(tidyverse)
theme_set(theme_bw())
library(tidymodels)
library(knitr)

Titanic

The wreck of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, the widely considered “unsinkable” ocean liner Titanic sank after striking an iceberg on its maiden voyage. Unfortunately, the ship did not carry enough lifeboats for everyone onboard, resulting in thousands of deaths.

The dataset titanic.csv contains the survival status and other attributes of a group of individuals on the Titanic.

titanic <- read_csv("data/titanic.csv")

We are interested in investigating if some groups of people were more likely to survive than others in this tragic accident.

  1. Make a new variable pclass_name in titanic which takes the value “1st” if pclass is 1, “2nd” if pclass is 2, and “3rd” if pclass is 3.

  2. Do the following exploratory data analysis and comment on what you observe.

  1. We will predict survival status based on sex, age, and class of passengers.
  1. Predict the probability of survival for the following new passengers using the augment() function. Hint: Make a new dataset for new passengers. Uncomment the code chunk below and complete it based on the table below to make the new dataset.
pclass sex age
1 female 17
2 female 16
3 female 18
1 male 9
1 male 23
1 male 52
3 male 18
# new_data <- tibble(
#   age = c(17, 16, 18, 9, 23, 52, 18),
#   sex = c("female", ...),
#   # either pclass or pclass_name
#   )
  1. Predict the probability of survival for passengers in the original data using the augment() function and create a new column survived_pred that takes a value 1 if the predicted probability is larger than some cutoff probability. Try three cutoff probabilities of 0.3, 0.5, and 0.8. Report sensitivity and specificity at each cutoff value. Suppose you were a decision maker during the incident of the Titanic and needed to send a limited number of lifeboats to rescue the new passengers in Ex 4 (you have information who is where in the ocean). Which cutoff would you prefer in this case and why? Hint: Sensitivity = P(labeled survived | survived), specificity = P(labeled not survived | not survived).

Submission

Grading (50 pts)


Component Points
Ex 1 3
Ex 2 15.5
Ex 3 15.5
Ex 4 4
Ex 5 7.5
Workflow & formatting 4.5

Grading notes: