AE 16: Hypothesis Testing 2

Getting Started

Inference Overview

Different Options Inside generate()

Push Ups and Pull Ups

Part 1: Is the average relative change in pull-ups of a gtg trainee significantly greater than a density trainee?
Part 2. Most people who train consistently will see at least a 15% increase in push-ups

Submitting Application Exercises

Getting Started

Clone the repository named “ae16-GitHubUsername” from the course GitHub organization page into RStudio.
Open the .Rmd file and replace “Your Name” with your name.

Inference Overview

What do we want to do?

Estimation: point estimate and confidence interval
Decision: hypothesis test

Always ask

How many variables?
What types of variables?
What is the research question?

Different Options Inside `generate()`

The generate() function accepts three options for its type argument. Discussion of type = "bootstrap", type = "draw", and type = "permute" is available here.

type = "permute": shuffle the data without replacement
- for hypothesis testing (HT) on a difference in the outcome between groups
- example: HT for a difference in the proportions of yawners between the treatment and control groups
type = "draw": sample from a theoretical distribution
- only for HT on a single proportion
- example: HT for the proportion of heads in coin flips
type = "bootstrap": re-sample the original data with replacement
- for confidence intervals (CI) or for HT on a single mean / median
- example: CI and HT for the true mean rent of one-bedroom apartments in Manhattan
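
The three resampling schemes listed above can be sketched in base R, without infer. This is a minimal illustration of the ideas, not the package's actual implementation; the vectors outcome and group and the fair-coin probability of 0.5 are made up for the example.

```r
set.seed(1)

# Hypothetical toy data: an outcome and a two-level group label
outcome <- c(0.10, 0.25, 0.40, 0.05, 0.30, 0.20)
group   <- c("a", "a", "a", "b", "b", "b")

# permute: shuffle the group labels without replacement,
# which breaks any association between group and outcome
permuted_group <- sample(group)

# draw: sample from a theoretical distribution,
# here 10 flips of a coin with an assumed P(heads) = 0.5
flips <- sample(c("heads", "tails"), size = 10, replace = TRUE,
                prob = c(0.5, 0.5))

# bootstrap: re-sample the observed outcomes with replacement,
# keeping the original sample size
boot_outcome <- sample(outcome, size = length(outcome), replace = TRUE)
```

Permuting keeps every original label exactly once, while a bootstrap sample may repeat some observed values and omit others.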

Push Ups and Pull Ups

First load the relevant packages:

library(tidyverse)
library(tidymodels)

Today’s dataset push_pull comes from a “mini study” by Mountain Tactical Institute.

push_pull <- read_csv("data/push_pull.csv")
push_pull %>%
  slice(1:3, 24:26)

## # A tibble: 6 × 7
##   participant_id   age push1 push2 pull1 pull2 training
##            <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>   
## 1              1    41    41    45    16    17 density 
## 2              2    32    35    44     9    11 density 
## 3              3    44    33    38    10    11 density 
## 4             24    36    31    60     9    15 gtg     
## 5             25    50    35    42     9    12 gtg     
## 6             26    34    23    39     9    13 gtg

Twenty-six individuals completed one of two exercise regimens for 3.5 weeks to increase their push-ups and pull-ups. See the codebook below:

participant_id: unique identifier for each participant
age: age of participant
push1 / push2: push-ups at beginning and end of program, respectively
pull1 / pull2: pull-ups at beginning and end of program, respectively
training: which training protocol the individual participated in - either “density” or “gtg” (grease-the-groove)

We create new variables for relative change in push-ups and pull-ups before and after the training. Recall, relative change = (new - old)/old.

push_pull <- push_pull %>%
  mutate(rel_change_push = (push2 - push1)/push1, 
         rel_change_pull = (pull2 - pull1)/pull1)
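
As a quick check of the formula, the relative change can be computed by hand for the first three participants printed earlier. This sketch uses plain vectors in place of the full data frame; the names push1_toy and push2_toy are introduced here for illustration only.

```r
# Push-up counts for the first three rows printed above (density group)
push1_toy <- c(41, 35, 33)
push2_toy <- c(45, 44, 38)

# relative change = (new - old) / old
rel_change_toy <- (push2_toy - push1_toy) / push1_toy
round(rel_change_toy, 3)
```

The first participant went from 41 to 45 push-ups, a relative change of 4/41, or roughly a 9.8% increase.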

Part 1: Is the average relative change in pull-ups of a gtg trainee significantly greater than a density trainee?

In other words, we wonder whether the group variable training affects the average relative increase in pull-ups. Let $\mu_{\text{density}}$ and $\mu_{\text{gtg}}$ be the true average relative change in pull-ups among density trainees and gtg trainees, respectively.

Let’s perform a hypothesis test.

Q - Step 1: State the null hypothesis and the alternative hypothesis both in words and math.

Q - Step 2: Find the relevant statistic from the data.

mu_hat_diff <- push_pull %>% 
  group_by(____) %>% 
  summarize(mu_hat = ____) %>% 
  pull(mu_hat) %>% 
  diff()
mu_hat_diff
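
It helps to know which direction diff() subtracts in. The base R sketch below uses only the six rows printed earlier, so its result is illustrative and is not the answer for the full dataset; the names ending in _toy are introduced here. Groups come out in sorted order (density, then gtg), and diff() returns the second value minus the first.

```r
# Pull-up counts and training labels for the six rows printed earlier
pull1_toy    <- c(16, 9, 10, 9, 9, 9)
pull2_toy    <- c(17, 11, 11, 15, 12, 13)
training_toy <- c("density", "density", "density", "gtg", "gtg", "gtg")

rel_change_pull_toy <- (pull2_toy - pull1_toy) / pull1_toy

# Group means, ordered alphabetically by group name: density, then gtg
group_means <- sapply(split(rel_change_pull_toy, training_toy), mean)

# diff() returns the second element minus the first: gtg minus density
toy_diff <- unname(diff(group_means))
toy_diff
```

A positive value therefore means the gtg group has the larger sample mean, which matters when choosing the direction of the test.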

Step 3: Simulate from the null distribution and compute the p-value.

Q - Which type option in the generate() function is most appropriate in this case? Why?

Q - Complete the code chunk below to simulate 10,000 sample statistics under the null hypothesis. Hint: check the help page of calculate() for its stat argument.

set.seed(603)

null_dist <- push_pull %>% 
  specify() %>%
  hypothesize() %>% 
  generate() %>%
  calculate()

Q - Visualize the p-value region under the null hypothesis with informative labels and compute the p-value.

visualize(null_dist) +
  labs(x = "_____", 
       y = "Count") +
  shade_pvalue(obs_stat = _____, direction = _____) 
  
pvalue1 <- null_dist %>%
  get_pvalue(obs_stat = ______, direction = _____) 
pvalue1
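
To see what a simulation-based p-value measures, here is a hand-rolled sketch in base R of a one-sided p-value from label shuffling. It is not the infer pipeline from the exercise: the data are made up, and the helper diff_means() is hypothetical.

```r
set.seed(603)

# Hypothetical toy data (not the push_pull dataset)
y_toy     <- c(0.06, 0.22, 0.10, 0.15, 0.67, 0.33, 0.44, 0.50)
group_toy <- rep(c("density", "gtg"), each = 4)

# Statistic: mean of the gtg group minus mean of the density group
diff_means <- function(y, g) mean(y[g == "gtg"]) - mean(y[g == "density"])
obs_toy <- diff_means(y_toy, group_toy)

# Null distribution: recompute the statistic after shuffling the labels
null_stats <- replicate(1000, diff_means(y_toy, sample(group_toy)))

# One-sided p-value: share of shuffled statistics at least as large as observed
p_value_toy <- mean(null_stats >= obs_toy)
p_value_toy
```

The shaded region in the visualization corresponds to the simulated statistics counted in the last step.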

Q - Step 4: State your conclusion at significance level $\alpha$.

Part 2. Most people who train consistently will see at least a 15% increase in push-ups

Q - State the null hypothesis and the alternative hypothesis.

Q - Create a binary outcome over15 which takes TRUE if rel_change_push is larger than 0.15 and FALSE otherwise.

Q - Find the sample proportion $\hat{p}$.

p_hat
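
A sample proportion of a TRUE/FALSE variable is simply its mean. The sketch below shows this on five made-up relative changes; rel_change_toy, over15_toy, and p_hat_toy are hypothetical names, not objects from the exercise.

```r
# Hypothetical relative changes in push-ups for five trainees
rel_change_toy <- c(0.10, 0.26, 0.15, 0.94, 0.20)

# TRUE when the relative change is larger than 0.15
over15_toy <- rel_change_toy > 0.15

# The mean of a logical vector is the proportion of TRUE values
p_hat_toy <- mean(over15_toy)
p_hat_toy
```

Three of the five values are strictly larger than 0.15 (the value equal to 0.15 does not count), so the toy proportion is 0.6.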

Q - Which type option in the generate() function is most appropriate in this case? Why?

Q - Perform a hypothesis test, compute the p-value, and state your conclusion at significance level $\alpha$.

set.seed(603)

null2_dist <- push_pull %>% 
  specify() %>%
  hypothesize() %>% 
  generate() %>%
  calculate()

visualize(null2_dist) +
  labs(x = "Sample proportion of people with a 15% increase in push-ups", 
       y = "Count") +
  shade_pvalue(obs_stat = _______, direction = _______) 
  
pvalue2 <- null2_dist %>%
  get_pvalue(obs_stat = _______, direction = _______) 
pvalue2
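
For intuition about simulating a null distribution for a single proportion, the base R sketch below draws counts from a binomial distribution. It is not the infer pipeline, and every number in it (sample size, null proportion, observed count) is a hypothetical value chosen for illustration.

```r
set.seed(603)

n_toy     <- 20     # hypothetical sample size
p0_toy    <- 0.5    # hypothetical null proportion
obs_count <- 14     # hypothetical observed number of successes
reps      <- 10000

# Each replicate: number of successes in n_toy draws with P(success) = p0_toy
null_counts <- rbinom(reps, size = n_toy, prob = p0_toy)

# The same replicates expressed as sample proportions, for plotting
null_props <- null_counts / n_toy

# One-sided p-value for an alternative of the form p > p0
p_value_toy <- mean(null_counts >= obs_count)
p_value_toy
```

Comparing counts instead of proportions avoids floating-point ties when checking "at least as large as observed".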

Now we will construct a confidence interval to evaluate the hypotheses above.

Q - Which type option in the generate() function is most appropriate in this case? Why?

Q - Simulate from a bootstrap distribution with reps = 10000 and visualize the distribution. What is it centered at?

set.seed(603)
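
As a rough picture of what a bootstrap distribution of a proportion looks like, the base R sketch below re-samples a made-up binary outcome. The 14-TRUE, 6-FALSE split and the 90% level are arbitrary choices for illustration, not values from the exercise.

```r
set.seed(603)

# Hypothetical binary outcome: 14 TRUE and 6 FALSE
over15_toy <- rep(c(TRUE, FALSE), times = c(14, 6))

# Re-sample with replacement and recompute the proportion each time
boot_props <- replicate(10000, mean(sample(over15_toy, replace = TRUE)))

# Center of the bootstrap distribution
boot_center <- mean(boot_props)
boot_center

# Percentile interval; the 90% level is an arbitrary illustrative choice
boot_ci <- quantile(boot_props, probs = c(0.05, 0.95))
boot_ci
```

Comparing boot_center with mean(over15_toy) shows where the bootstrap distribution sits relative to the original sample.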

Q - We want to construct a confidence interval at a confidence level equivalent to the significance level $\alpha$ of the hypothesis test. What do you think the confidence level should be? Hint: The alternative hypothesis is one-sided.

Q - Construct a confidence interval with the confidence level equivalent to $\alpha$. Interpret the confidence interval. Is the conclusion drawn from the confidence interval consistent with the conclusion from the hypothesis test?

Submitting Application Exercises

Once you have completed the activity, push your final changes to your GitHub repo.
Make sure you have committed at least three times.
Check that your repo is updated on GitHub. That is all you need to do to submit application exercises for participation.

AE 16: Hypothesis Testing 2

due Monday, June 6 at 9:29am

Bora Jin
