generate()
Clone the repository named “ae16-GitHubUsername” from the course GitHub organization page into RStudio.
Open the .Rmd file and replace “Your Name” with your name.
What do we want to do?
Always ask
generate()
The function generate() allows three options for its type argument. Discussion of type = bootstrap, type = draw, and type = permute is available here.
type = permute: shuffle the data without replacement
type = draw: sample from a theoretical distribution
type = bootstrap: re-sample the original data with replacement
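The three options above can be mimicked on a toy vector in base R. This is only a sketch of the underlying sampling ideas, not of how generate() is implemented; the vector x and the probability 0.5 are made-up values chosen for illustration.

```r
# Toy vector: hypothetical values, not from the push_pull data
x <- c(2, 4, 6, 8, 10)

set.seed(1)

# permute: reorder the same values (sampling without replacement)
x_permuted <- sample(x)

# bootstrap: re-sample the original values with replacement,
# keeping the sample size the same
x_boot <- sample(x, size = length(x), replace = TRUE)

# draw: simulate from a theoretical distribution, here an
# assumed Bernoulli distribution with success probability 0.5
x_draw <- sample(c(TRUE, FALSE), size = length(x),
                 replace = TRUE, prob = c(0.5, 0.5))

x_permuted
x_boot
x_draw
```

A permuted sample always contains exactly the original values, a bootstrap sample may repeat some values and omit others, and a drawn sample does not reuse the observed values at all.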
First load the relevant packages:
library(tidyverse)
library(tidymodels)
Today’s dataset, push_pull, comes from a “mini study” by the Mountain Tactical Institute.
push_pull <- read_csv("data/push_pull.csv")
push_pull %>%
  slice(1:3, 24:26)
## # A tibble: 6 × 7
##   participant_id   age push1 push2 pull1 pull2 training
##            <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1              1    41    41    45    16    17 density
## 2              2    32    35    44     9    11 density
## 3              3    44    33    38    10    11 density
## 4             24    36    31    60     9    15 gtg
## 5             25    50    35    42     9    12 gtg
## 6             26    34    23    39     9    13 gtg
Twenty-six individuals completed one of two exercise regimens for 3.5 weeks to increase their push-ups and pull-ups. See the codebook below:
participant_id: unique identifier for each participant
age: age of participant
push1 / push2: push-ups at beginning and end of program, respectively
pull1 / pull2: pull-ups at beginning and end of program, respectively
training: which training protocol the individual participated in, either “density” or “gtg” (grease-the-groove)
We create new variables for relative change in push-ups and pull-ups before and after the training. Recall, relative change = (new - old)/old.
push_pull <- push_pull %>%
  mutate(rel_change_push = (push2 - push1) / push1,
         rel_change_pull = (pull2 - pull1) / pull1)
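As a quick check of the relative change formula, the calculation can be done by hand for a single participant. The sketch below uses the values printed above for participant_id 1; the standalone variable names simply mirror the column names.

```r
# Values copied from the first printed row (participant_id 1)
push1 <- 41; push2 <- 45
pull1 <- 16; pull2 <- 17

# relative change = (new - old) / old
rel_change_push <- (push2 - push1) / push1  # 4 / 41
rel_change_pull <- (pull2 - pull1) / pull1  # 1 / 16

round(rel_change_push, 4)
round(rel_change_pull, 4)
```

A value of 0.0625 for pull-ups means this participant finished the program doing 6.25% more pull-ups than at the start.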
In other words, we want to know whether the grouping variable training affects the average relative increase in pull-ups. Let μ_den and μ_gtg be the true average relative change in pull-ups among density trainees and gtg trainees, respectively.
Let’s perform a hypothesis test.
Q - Step 1: State the null hypothesis and the alternative hypothesis both in words and math.
Q - Step 2: Find the relevant statistic from the data.
mu_hat_diff <- push_pull %>%
  group_by(____) %>%
  summarize(mu_hat = ____) %>%
  pull(mu_hat) %>%
  diff()
mu_hat_diff
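The sign of the statistic depends on the order in which diff() subtracts. The sketch below uses made-up group means (0.20 and 0.50 are not results from the data) and assumes the summarized groups appear in alphabetical order, density before gtg, which is how summarize() orders these two labels.

```r
# Hypothetical group means, in the order the groups would appear
mu_hat <- c(density = 0.20, gtg = 0.50)

# diff() returns each element minus the one before it,
# so here the result is gtg minus density
mu_hat_example_diff <- diff(mu_hat)
mu_hat_example_diff
```

Keeping this order in mind matters later, because the direction used for the p-value has to match the order of subtraction.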
Step 3: Simulate from the null distribution and compute the p-value.
Q - Which of the type options in the generate() function is most appropriate in this case? Why?
Q - Complete the code chunk below to simulate 10,000 sample statistics under the null hypothesis. Hint: check the help page of calculate() for its stat argument.
set.seed(603)
null_dist <- push_pull %>%
  specify() %>%
  hypothesize() %>%
  generate() %>%
  calculate()
Q - Visualize the p-value region under the null hypothesis with informative labels and compute the p-value.
visualize(null_dist) +
  labs(x = "_____",
       y = "Count") +
  shade_pvalue(obs_stat = _____, direction = _____)
pvalue1 <- null_dist %>%
  get_pvalue(obs_stat = ______, direction = _____)
pvalue1
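Conceptually, a simulation-based p-value is the share of simulated null statistics that are at least as extreme as the observed one. The base R sketch below illustrates this with made-up numbers: null_stats stands in for the simulated statistics and obs_stat for the observed statistic, and neither comes from the push_pull data.

```r
set.seed(1)

# Hypothetical null statistics; in the exercise these would be
# the simulated statistics stored in null_dist
null_stats <- rnorm(10000, mean = 0, sd = 0.1)

# Hypothetical observed statistic
obs_stat <- 0.15

# One-sided p-value, direction "greater": proportion of simulated
# statistics at least as large as the observed one
p_greater <- mean(null_stats >= obs_stat)

# One-sided p-value, direction "less": proportion of simulated
# statistics at least as small as the observed one
p_less <- mean(null_stats <= obs_stat)

p_greater
p_less
```

Which direction is appropriate depends on how the alternative hypothesis is stated and on the order of subtraction used for the statistic.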
Q - Step 4: State your conclusion with α=0.01.
Q - State the null hypothesis and the alternative hypothesis.
Q - Create a binary outcome over15 which takes TRUE if rel_change_push is larger than 0.15 and FALSE otherwise.
Q - Find the sample proportion p̂.
p_hat
Q - Which of the type options in the generate() function is most appropriate in this case? Why?
Q - Perform a hypothesis test, compute the p-value, and state your conclusion with α=0.05.
set.seed(603)
null2_dist <- push_pull %>%
  specify() %>%
  hypothesize() %>%
  generate() %>%
  calculate()
visualize(null2_dist) +
  labs(x = "Sample proportion of people with a 15% increase in push-ups",
       y = "Count") +
  shade_pvalue(obs_stat = _______, direction = _______)
pvalue2 <- null2_dist %>%
  get_pvalue(obs_stat = _______, direction = _______)
pvalue2
Now we will construct a confidence interval to evaluate the hypotheses above.
Q - Which of the type options in the generate() function is most appropriate in this case? Why?
Q - Simulate from a bootstrap distribution with reps = 10000 and visualize the distribution. What is it centered at?
set.seed(603)
Q - We want to construct a confidence interval at a confidence level equivalent to the significance level of α=0.05. What do you think the confidence level should be? Hint: The alternative hypothesis is one-sided.
Q - Construct a confidence interval with the confidence level equivalent to α=0.05. Interpret the confidence interval. Is the conclusion drawn from the confidence interval consistent with the conclusion from the hypothesis test?