Clone the repository named “ae16-GitHubUsername” from the course GitHub organization page into RStudio.
Open the .Rmd file and replace “Your Name” with your name.
What do we want to do?
Always ask
generate()
The function generate() allows three options for its type argument. A discussion of type = "bootstrap", type = "draw", and type = "permute" is available here.
type = "permute": shuffle the data without replacement
type = "draw": sample from a theoretical distribution
type = "bootstrap": re-sample the original data with replacement
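To make the three options concrete, here is a minimal base R sketch of the sampling idea behind each one. It is an illustration only, not how generate() is implemented: the vector x and the 50/50 probabilities are made-up values, and sample() stands in for the resampling step.

```r
set.seed(1)

# Hypothetical data, for illustration only
x <- c(2, 4, 6, 8, 10)

# Idea behind "permute": shuffle the values without replacement.
# The result holds exactly the same values in a new order.
permuted <- sample(x)

# Idea behind "bootstrap": re-sample the original data with replacement.
# The result has the same size, but values may repeat or be missing.
boot <- sample(x, size = length(x), replace = TRUE)

# Idea behind "draw": sample from a theoretical distribution,
# here TRUE/FALSE with an assumed success probability of 0.5.
drawn <- sample(c(TRUE, FALSE), size = length(x),
                replace = TRUE, prob = c(0.5, 0.5))

permuted
boot
drawn
```

Note that the permuted vector always sorts back to the original data, while the bootstrap sample generally does not.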
First load the relevant packages:
library(tidyverse)
library(tidymodels)
Today’s dataset push_pull comes from a “mini study” by Mountain Tactical Institute.
push_pull <- read_csv("data/push_pull.csv")
push_pull %>%
  slice(1:3, 24:26)
## # A tibble: 6 × 7
##   participant_id   age push1 push2 pull1 pull2 training
##            <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1              1    41    41    45    16    17 density
## 2              2    32    35    44     9    11 density
## 3              3    44    33    38    10    11 density
## 4             24    36    31    60     9    15 gtg
## 5             25    50    35    42     9    12 gtg
## 6             26    34    23    39     9    13 gtg
Twenty-six individuals each completed one of two exercise regimens for 3.5 weeks to increase their push-ups and pull-ups. See the codebook below:
participant_id: unique identifier for each participant
age: age of participant
push1 / push2: push-ups at beginning and end of program, respectively
pull1 / pull2: pull-ups at beginning and end of program, respectively
training: which training protocol the individual participated in, either “density” or “gtg” (grease-the-groove)
We create new variables for the relative change in push-ups and pull-ups before and after the training. Recall, relative change = (new - old)/old.
push_pull <- push_pull %>%
  mutate(rel_change_push = (push2 - push1)/push1,
         rel_change_pull = (pull2 - pull1)/pull1)
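As a quick check of the formula, the sketch below applies it by hand to two participants whose values are copied from the printed rows above (participants 1 and 24). The data frame name toy is hypothetical, and base R is used so the arithmetic is easy to follow.

```r
# Values copied from two rows of the printed tibble (participants 1 and 24)
toy <- data.frame(
  participant_id = c(1, 24),
  push1 = c(41, 31), push2 = c(45, 60),
  pull1 = c(16, 9),  pull2 = c(17, 15)
)

# Relative change = (new - old) / old
toy$rel_change_push <- (toy$push2 - toy$push1) / toy$push1
toy$rel_change_pull <- (toy$pull2 - toy$pull1) / toy$pull1

toy
```

For participant 1, the relative change in pull-ups is (17 - 16)/16 = 0.0625, that is, a 6.25% increase.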
In other words, we wonder whether the group variable training affects the average relative increase in pull-ups. Let \(\mu_{den}\) and \(\mu_{gtg}\) be the true average relative change in pull-ups among density trainees and gtg trainees, respectively.
Let’s perform a hypothesis test.
Q - Step 1: State the null hypothesis and the alternative hypothesis both in words and math.
Q - Step 2: Find the relevant statistic from the data.
mu_hat_diff <- push_pull %>%
  group_by(____) %>%
  summarize(mu_hat = ____) %>%
  pull(mu_hat) %>%
  diff()
mu_hat_diff
Step 3: Simulate from the null distribution and compute the p-value.
Q - Which type option of the generate() function is most appropriate in this case? Why?
Q - Complete the code chunk below to simulate 10,000 sample statistics under the null hypothesis. Hint: check the help page of calculate() for its stat argument.
set.seed(603)
null_dist <- push_pull %>%
  specify() %>%
  hypothesize() %>%
  generate() %>%
  calculate()
Q - Visualize the p-value region under the null hypothesis with informative labels and compute the p-value.
visualize(null_dist) +
  labs(x = "_____",
       y = "Count") +
  shade_pvalue(obs_stat = _____, direction = _____)
pvalue1 <- null_dist %>%
  get_pvalue(obs_stat = ______, direction = _____)
pvalue1
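For intuition about what a simulation-based p-value measures, here is a minimal base R sketch with made-up numbers. It is not the implementation of get_pvalue(): the vector null_stats and the value obs_stat are hypothetical, and only the right-tailed case is shown.

```r
# Hypothetical statistics simulated under the null (illustrative only)
null_stats <- c(-0.3, -0.1, 0, 0.1, 0.2, 0.4, 0.5, -0.2, 0.05, 0.3)

# Hypothetical observed statistic
obs_stat <- 0.3

# Right-tailed p-value: the share of simulated statistics
# that are at least as large as the observed one
p_right <- mean(null_stats >= obs_stat)
p_right
```

Here 3 of the 10 simulated statistics are at least 0.3, so the right-tailed p-value is 0.3. With a left-tailed or two-sided alternative, the tail that is counted changes accordingly.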
Q - Step 4: State your conclusion with \(\alpha = 0.01\).
Q - State the null hypothesis and the alternative hypothesis.
Q - Create a binary outcome over15 that is TRUE if rel_change_push is larger than 0.15 and FALSE otherwise.
Q - Find the sample proportion \(\hat{p}\).
p_hat
Q - Which type option of the generate() function is most appropriate in this case? Why?
Q - Perform a hypothesis test, compute the p-value, and state your conclusion with \(\alpha = 0.05\).
set.seed(603)
null2_dist <- push_pull %>%
  specify() %>%
  hypothesize() %>%
  generate() %>%
  calculate()
visualize(null2_dist) +
  labs(x = "Sample proportion of people with more than a 15% increase in push-ups",
       y = "Count") +
  shade_pvalue(obs_stat = _______, direction = _______)
pvalue2 <- null2_dist %>%
  get_pvalue(obs_stat = _______, direction = _______)
pvalue2
Now we will construct a confidence interval to evaluate the hypotheses above.
Q - Which type option of the generate() function is most appropriate in this case? Why?
Q - Simulate a bootstrap distribution with reps = 10000 and visualize the distribution. Where is it centered?
set.seed(603)
Q - We want to construct a confidence interval at a confidence level equivalent to the significance level of \(\alpha = 0.05\). What do you think the confidence level should be? Hint: The alternative hypothesis is one-sided.
Q - Construct a confidence interval with the confidence level equivalent to \(\alpha = 0.05\). Interpret the confidence interval. Is the conclusion drawn from the confidence interval consistent with the conclusion from the hypothesis test?