Open lab06.Rmd, the template R Markdown file. For each exercise:
- Show all relevant code and output used to obtain your response.
- Write all narrative in complete sentences, and use clear axis labels and titles on visualizations.
Use a small number of reps (about 500) as you write and test your code. Once you have finalized all of your code, increase the number of reps to 15,000 to produce your final results.
For each simulation exercise, use the seed specified in the exercise instructions.
The type argument of generate() controls how each replicate is created:
- type = "bootstrap": A bootstrap sample is drawn for each replicate, where a sample of size equal to the input sample size is drawn (with replacement) from the input sample data.
- type = "permute": For each replicate, each input value is randomly reassigned (without replacement) to a new output value in the sample.
- type = "draw": For each replicate, a value is sampled from a theoretical distribution with the parameter p specified in hypothesize(). This option is currently only applicable for testing a single proportion.
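To make the three generate() types concrete, here is a small sketch that applies each one to the gss data set that ships with the infer package (attached by tidymodels), not to the lemur data. The variables, the null values (40 hours, p = 0.3), the seed, and the tiny reps count are arbitrary choices for illustration only.

```r
library(tidyverse)
library(tidymodels)  # attaches infer, which provides the gss example data

set.seed(1)

# bootstrap: resample rows with replacement; for a point null on a median,
# infer shifts the data so its median equals the null value before resampling
boot_draws <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", med = 40) %>%
  generate(reps = 5, type = "bootstrap")

# permute: shuffle the response across groups, breaking any association
perm_draws <- gss %>%
  specify(response = hours, explanatory = sex) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5, type = "permute")

# draw: simulate a categorical response from a hypothesized proportion p
draw_draws <- gss %>%
  specify(response = college, success = "degree") %>%
  hypothesize(null = "point", p = 0.3) %>%
  generate(reps = 5, type = "draw")

# Each result stacks all replicates, labeled by the replicate column
head(perm_draws)
```

Only the last step of each pipeline differs, which is why the choice of type must match the null hypothesis you stated.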
We will use the tidyverse and tidymodels packages in this lab.
library(tidyverse)
library(tidymodels)
Today’s data come from the Duke Lemur Center. We will examine a subset of the data, focusing on the following variables:
- taxon: the specific lemur taxon
- age_at_death_y: age of the lemur at death (in years)
- birth_month: month the lemur was born
- sex: whether the lemur is male or female
- weight_g: weight of the lemur in grams
- age_category: age category (IJ: infant or juvenile, young_adult, adult)

Click here for more info on the dataset, including a codebook of variable names and taxonomic codes.
lemurs <- read_csv("data/lemur_subset.csv")
The goal is to test whether a categorical (grouping) variable affects a numerical variable. For example: does lemur taxon affect lifespan?
We want to test whether mongoose lemurs have a greater median lifespan than red-bellied lemurs.
Construct a hypothesis test to investigate the difference in median age at death between the two groups using age_at_death_y.
State the null and alternative hypothesis mathematically and in words.
Compute the sample statistic (what is the observed difference between the two groups?). Save this quantity as diff_med. Check the codebook to decode taxon names. You can remove NA observations.
Filter your data frame to contain only the two taxa of lemurs you care about. Save this new data frame as lemurs2. Simulate under the null using the template code below.
Note: response is the dependent variable, while explanatory is the independent variable. Think about the prompt above: “Does lemur taxonomy affect lifespan?” Once you have filled in the blanks, remove eval = FALSE from the chunk options so the code runs when you knit.

set.seed(63)
null_diff_life <- lemurs2 %>%
  specify(response = ___, explanatory = ___) %>%
  hypothesize(null = ___) %>%
  generate(reps = 500, type = ___) %>%  # change reps to 15000 for final version
  calculate(stat = ___, order = c("EMON", "ERUB"))  # order: EMON minus ERUB
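Compute the p-value with the get_pvalue() function, passing the observed statistic (obs_stat = diff_med) and the direction that matches your alternative hypothesis. In the narrative, use inline code to compare the p-value with \(\alpha = 0.05\) and draw a conclusion. Be sure to state your conclusion in context.

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.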
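For reference, here is a minimal sketch of the complete difference-in-medians workflow, run on infer's built-in gss data (weekly hours worked, grouped by sex) rather than on the lemur data so it does not stand in for your answer. The group order, seed, and one-sided direction are arbitrary choices for illustration. get_p_value() is the infer function that get_pvalue() aliases.

```r
library(tidyverse)
library(tidymodels)  # attaches infer and its gss example data

# Observed statistic: median hours (male) minus median hours (female)
obs_diff <- gss %>%
  specify(response = hours, explanatory = sex) %>%
  calculate(stat = "diff in medians", order = c("male", "female"))

# Null distribution: permuting hours across sex breaks any association
set.seed(42)
null_dist <- gss %>%
  specify(response = hours, explanatory = sex) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 500, type = "permute") %>%
  calculate(stat = "diff in medians", order = c("male", "female"))

# One-sided alternative: the male median is greater than the female median
p_val <- null_dist %>%
  get_p_value(obs_stat = obs_diff, direction = "greater")
p_val
```

In the narrative, an inline expression such as `r round(p_val$p_value, 3)` prints the p-value directly, so the text stays in sync with the code when you re-knit.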
According to the Duke Lemur Center, 75% of breeding occurs during October and November. Since gestation lasts about 4.5 months, one might expect 75% of births to occur in March and April.
Do you believe the proportion of births in these two months is significantly different from this expected 75%?
Step 1: State the null and alternative hypothesis.
Step 2: Compute the observed statistic. Hint: first use mutate() to create a new variable, birth_3or4, that is TRUE if birth_month is 3 or 4 and FALSE otherwise. Save the resulting data frame as lemurs3.
Step 3: Simulate the null distribution. In specify(), set the response to the variable you created above and set success = "TRUE". Use the seed number 65.
Step 4: Compute p-value and compare to \(\alpha = 0.05\). Write your conclusion in context.
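The sketch below shows the same single-proportion pattern (a logical indicator built with mutate(), then type = "draw") on infer's gss data. The indicator full_time, the null value p = 0.5, and the seed are hypothetical choices for illustration; they are not the lemur exercise values.

```r
library(tidyverse)
library(tidymodels)  # attaches infer and its gss example data

# Hypothetical indicator, analogous to birth_3or4: TRUE if 40+ hours worked
gss2 <- gss %>%
  mutate(full_time = hours >= 40)

# Observed proportion of TRUE values
obs_prop <- gss2 %>%
  specify(response = full_time, success = "TRUE") %>%
  calculate(stat = "prop")

# Null distribution: draw from a population where p = 0.5 (arbitrary here)
set.seed(7)
null_prop <- gss2 %>%
  specify(response = full_time, success = "TRUE") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 500, type = "draw") %>%
  calculate(stat = "prop")

# Two-sided alternative: the true proportion differs from 0.5
p_val_prop <- null_prop %>%
  get_p_value(obs_stat = obs_prop, direction = "two-sided")
p_val_prop
```

For the lemur data, the analogous indicator would compare birth_month with 3 and 4 (for example, birth_month %in% c(3, 4)).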
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
According to the Smithsonian’s National Zoo, the average weight of adult male ring-tailed lemurs is about 3 kilograms, and females are usually smaller.
Is the weight of adult female ring-tailed lemurs significantly less than 3 kg (3000 g)?
Step 1: State the null and alternative hypothesis.
Step 2: Filter your data frame to contain only the information you care about. Save this new data frame as lemurs4. Compute the observed statistic.
Step 3: Simulate the null distribution using the seed number 66.
Step 4: Visualize the null distribution with the p-value region shaded. Choose an informative label for the \(x\)-axis (e.g., sample proportion, sample median). Hint: use visualize() and shade_pvalue(). Compute the p-value, compare it to \(\alpha = 0.05\), and write your conclusion in context.
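As a sketch of a one-sample test on a median with a shaded p-value region, the code below asks whether median weekly hours in infer's gss data are less than 40. The null value, seed, and labels are hypothetical choices for illustration. shade_p_value() is the infer function that shade_pvalue() aliases.

```r
library(tidyverse)
library(tidymodels)  # attaches infer and its gss example data

# Observed statistic: sample median of weekly hours worked
obs_med <- gss %>%
  specify(response = hours) %>%
  calculate(stat = "median")

# Null distribution: bootstrap from data shifted so its median is 40
set.seed(11)
null_med <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", med = 40) %>%
  generate(reps = 500, type = "bootstrap") %>%
  calculate(stat = "median")

# Plot the null distribution and shade values at least as extreme (lower tail)
null_plot <- visualize(null_med) +
  shade_p_value(obs_stat = obs_med, direction = "less") +
  labs(x = "Sample median hours worked", y = "Count",
       title = "Simulated null distribution of the sample median")
null_plot

p_val_med <- get_p_value(null_med, obs_stat = obs_med, direction = "less")
p_val_med
```

Because the alternative is "less than," direction = "less" shades and counts only the lower tail.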
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
Component | Points |
---|---|
Ex 1 | 4 |
Ex 2 | 4 |
Ex 3 | 4 |
Ex 4 | 4 |
Ex 5 | 10 |
Ex 6 | 13 |
Ex 7 | 5 |
Workflow & formatting | 6 |
Grading notes: