Lab #06: Simulation-based Inference

Due Tuesday, June 7 at 11:59pm

Goals

Getting started

For each exercise:

Three types for generate().
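
As a rough illustration of the heading above, the sketch below runs generate() once with each of the three type values that recent versions of the infer package accept: "bootstrap", "permute", and "draw". Older versions of infer call the last one "simulate". The toy data frame and its column names are made up for this sketch and are not part of the lab data.

```r
library(infer)

set.seed(1)

# Made-up data: a numeric value, a two-level group, and a yes/no flag
toy <- data.frame(
  group = rep(c("a", "b"), each = 10),
  value = c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.9, 4.9, 5.2, 5.4,
            6.2, 5.8, 6.5, 6.0, 5.7, 6.3, 6.9, 5.9, 6.1, 6.4),
  flag  = rep(c("yes", "no"), times = c(12, 8))
)

# "bootstrap": resample rows with replacement (one numeric variable)
boot_means <- toy %>%
  specify(response = value) %>%
  generate(reps = 100, type = "bootstrap") %>%
  calculate(stat = "mean")

# "permute": shuffle one variable relative to the other (two variables)
perm_diffs <- toy %>%
  specify(response = value, explanatory = group) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  calculate(stat = "diff in means", order = c("a", "b"))

# "draw": draw new samples from a hypothesized proportion (one categorical variable)
draw_props <- toy %>%
  specify(response = flag, success = "yes") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 100, type = "draw") %>%
  calculate(stat = "prop")
```

Each result is a data frame with one simulated statistic per replicate. Which type fits a given exercise depends on the null hypothesis being tested.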

Packages

We will use the tidyverse and tidymodels packages in this lab.

library(tidyverse)
library(tidymodels)

Duke Lemurs

Today’s data come from the Duke Lemur Center. We will examine a subset of the data and specifically focus on the following variables:

Click here for more info on the dataset including a codebook of variable names and taxonomic codes.

lemurs <- read_csv("data/lemur_subset.csv")

Hypothesis Testing for Difference Between Two Groups (i.e. Independence).

The idea is to test whether a categorical variable (the grouping variable) affects a numerical variable. Example: Does lemur taxonomy affect lifespan?

We want to test whether mongoose lemurs have a greater median lifespan than red-bellied lemurs.

Construct a hypothesis test to investigate the difference in median age of death between the two groups using age_at_death_y.

  1. State the null and alternative hypotheses mathematically and in words.

  2. Compute the sample statistic (what is the observed difference between the two groups?). Save this quantity as diff_med. Check the codebook to decode taxon names. You can remove NA observations.

  3. Filter your data frame to contain only the two taxa of lemurs you care about. Save this new data frame as lemurs2. Simulate under the null using the template code below.

set.seed(63)

null_diff_life <- lemurs2 %>%
   specify(response = ___, explanatory = ___) %>%
   hypothesize(null = ___) %>%
   generate(reps = 500, type = ___) %>% # change reps to 15000 for final version
   calculate(stat = ___, order = c("EMON", "ERUB")) # specifies order

  4. Compute the p-value using the get_pvalue() function. In narrative, use inline code to compare the p-value with \(\alpha = 0.05\) and make a conclusion. Be sure to state your conclusion in context.
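
To make the logic of the steps above concrete, here is a minimal base R sketch of a permutation test for a difference in medians. It does not use the lemur data or the infer template: the groups "A" and "B", the ages, and the helper diff_in_medians() are all made up for illustration.

```r
set.seed(63)

# Made-up ages at death (years) for two hypothetical groups
toy <- data.frame(
  group = rep(c("A", "B"), times = c(8, 10)),
  age   = c(21, 25, 18, 30, 27, 22, 26, 24,
            17, 20, 15, 23, 19, 16, 21, 18, 14, 22)
)

# Hypothetical helper: median of group A minus median of group B
diff_in_medians <- function(values, groups) {
  median(values[groups == "A"]) - median(values[groups == "B"])
}

# Observed sample statistic
obs_diff <- diff_in_medians(toy$age, toy$group)

# Null distribution: shuffle the group labels, which breaks any
# association between group and age, then recompute the statistic
null_diffs <- replicate(
  2000,
  diff_in_medians(toy$age, sample(toy$group))
)

# One-sided p-value: share of shuffled differences at least as large
# as the observed one (alternative: group A has the greater median)
p_value <- mean(null_diffs >= obs_diff)
```

The p-value is then compared with the significance level, and the conclusion is stated in terms of the two groups being compared.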

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

Hypothesis Testing About a Proportion

According to the Duke Lemur Center, 75% of breeding occurs during October and November. Since gestation lasts about 4.5 months, one might expect 75% of births to occur in March and April.

Do you believe the proportion of births in these two months is significantly different from the expectation?

  5. As above, set up a hypothesis test to investigate, following each step below.
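
One way to picture a simulation-based test about a single proportion is the base R sketch below. The counts n_births and n_spring are invented for illustration and are not taken from the lemur data. Only the null value of 0.75 comes from the text above.

```r
set.seed(63)

# Hypothetical counts, NOT from the lemur data
n_births <- 200   # total births observed
n_spring <- 138   # births in March or April
p_null   <- 0.75  # proportion claimed under the null hypothesis

# Observed sample proportion
p_hat <- n_spring / n_births

# Null distribution: simulate birth counts assuming p_null is true
null_counts <- rbinom(5000, size = n_births, prob = p_null)

# Two-sided p-value: share of simulated counts at least as far from
# the expected count as the observed count is
expected_count <- n_births * p_null
p_value <- mean(abs(null_counts - expected_count) >= abs(n_spring - expected_count))
```

Comparing counts rather than proportions avoids floating-point ties when checking "at least as extreme".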

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

Hypothesis Testing and Confidence Interval About a Point Estimate

According to the Smithsonian’s National Zoo, the average weight of adult male ring-tailed lemurs is about 3 kilograms, and females are usually smaller.

Is the weight of adult female ring-tailed lemurs significantly less than 3 kg (3000 g)?

  6. Set up a hypothesis test to investigate, following each step below.
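
A common simulation approach for a test about a mean is to shift the sample so that the null hypothesis is true and then bootstrap from the shifted sample. The base R sketch below illustrates that idea on made-up weights. The values in weights_g are hypothetical and are not from the lemur data.

```r
set.seed(63)

# Made-up weights in grams, NOT from the lemur data
weights_g <- c(2450, 2610, 2380, 2720, 2550, 2490,
               2660, 2300, 2580, 2430, 2750, 2520)
mu_null <- 3000  # null value: mean weight of 3000 g

# Observed sample mean
obs_mean <- mean(weights_g)

# Shift the sample so its mean equals the null value
shifted <- weights_g - obs_mean + mu_null

# Null distribution: bootstrap means of the shifted sample
null_means <- replicate(5000, mean(sample(shifted, replace = TRUE)))

# One-sided p-value (alternative: mean weight is less than 3000 g)
p_value <- mean(null_means <= obs_mean)
```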

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

  7. Construct a confidence interval with the confidence level that corresponds to \(\alpha = 0.05\). Interpret the confidence interval. Use the same seed number as in Ex 6. Is the conclusion drawn from the confidence interval consistent with the conclusion from the hypothesis test in Ex 6?
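
For reference, a bootstrap percentile interval can be sketched in base R as follows. The weights are made up, and the 95% level is an assumption chosen for this sketch. The level to use in the lab is the one you judge to correspond to \(\alpha = 0.05\) for your test.

```r
set.seed(63)

# Made-up weights in grams, NOT from the lemur data
weights_g <- c(2450, 2610, 2380, 2720, 2550, 2490,
               2660, 2300, 2580, 2430, 2750, 2520)

# Bootstrap distribution of the sample mean: resample with replacement
boot_means <- replicate(5000, mean(sample(weights_g, replace = TRUE)))

# Percentile interval: middle 95% of the bootstrap means (assumed level)
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

If the null value lies outside the interval, that points in the same direction as rejecting the null hypothesis at the matching significance level.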

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

Submission

Grading (50 pts)


Component               Points
Ex 1                    4
Ex 2                    4
Ex 3                    4
Ex 4                    4
Ex 5                    10
Ex 6                    13
Ex 7                    5
Workflow & formatting   6

Grading notes: