AE 19: More Practice


Package

library(tidyverse)

Part 1: Pivoting

pivot_longer() “lengthens” data, increasing the number of rows and decreasing the number of columns.

data_left %>%
  pivot_longer(cols = Y1:Y3,
               names_to = "Y",
               values_to = "Z")
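The snippet above relies on a `data_left` table that this document does not define. The following minimal, self-contained sketch shows what `pivot_longer()` does with it. It assumes a hypothetical `data_left` with an id column `X` and value columns `Y1`, `Y2`, `Y3`:

```r
library(tidyverse)

# Hypothetical wide table: one row per X, one column per level of Y
data_left <- tibble(
  X  = c("a", "b"),
  Y1 = c(1, 4),
  Y2 = c(2, 5),
  Y3 = c(3, 6)
)

data_long <- data_left %>%
  pivot_longer(cols = Y1:Y3,      # columns to stack
               names_to = "Y",    # old column names go here
               values_to = "Z")   # old cell values go here

data_long
```

Two rows with three value columns become 2 × 3 = 6 rows. The four columns (`X`, `Y1`, `Y2`, `Y3`) shrink to three (`X`, `Y`, `Z`).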

pivot_wider() “widens” data, increasing the number of columns and decreasing the number of rows.

data_right %>% 
  pivot_wider(names_from = Y, 
              values_from = Z)
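The `data_right` table is not defined here either. This sketch assumes a hypothetical long table with one row per `X`/`Y` combination. It shows `pivot_wider()` spreading that table back out:

```r
library(tidyverse)

# Hypothetical long table: one row per (X, Y) combination
data_right <- tibble(
  X = c("a", "a", "a", "b", "b", "b"),
  Y = c("Y1", "Y2", "Y3", "Y1", "Y2", "Y3"),
  Z = c(1, 2, 3, 4, 5, 6)
)

data_wide <- data_right %>%
  pivot_wider(names_from = Y,    # each distinct value of Y becomes a column
              values_from = Z)   # cells are filled from Z

data_wide
```

The six rows collapse to two, one per value of `X`. The distinct values of `Y` become the columns `Y1`, `Y2`, `Y3`.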

Practice¹

Transform the dataset relig_income to one with columns religion, income, and count using either pivot_wider() or pivot_longer().

relig_income %>% 
  head()

## # A tibble: 6 × 11
##   religion  `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k`
##   <chr>       <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>      <dbl>
## 1 Agnostic       27        34        60        81        76       137        122
## 2 Atheist        12        27        37        52        35        70         73
## 3 Buddhist       27        21        30        34        33        58         62
## 4 Catholic      418       617       732       670       638      1116        949
## 5 Don’t kn…      15        14        15        11        10        35         21
## 6 Evangeli…     575       869      1064       982       881      1486        949
## # … with 3 more variables: `$100-150k` <dbl>, `>150k` <dbl>,
## #   `Don't know/refused` <dbl>
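One possible approach follows the pattern of the examples in the tidyr `pivot_longer()` reference cited in the footnote. Every column other than `religion` is an income bracket, so those columns are stacked into `income`/`count` pairs:

```r
library(tidyverse)

relig_long <- relig_income %>%
  pivot_longer(cols = -religion,       # every column except religion
               names_to = "income",    # former column headers
               values_to = "count")    # former cell values

relig_long
```

Each original row now appears once per income bracket. The result therefore has (number of religions) × (number of income columns) rows.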

Transform the dataset fish_encounters into a matrix-like structure in which rows are fish, columns are stations, and each cell indicates whether that fish was seen at that station (1 if yes, 0 otherwise). Use either pivot_wider() or pivot_longer().

fish_encounters %>% 
  head()

## # A tibble: 6 × 3
##   fish  station  seen
##   <fct> <fct>   <int>
## 1 4842  Release     1
## 2 4842  I80_1       1
## 3 4842  Lisbon      1
## 4 4842  Rstr        1
## 5 4842  Base_TD     1
## 6 4842  BCE         1
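One possible approach also follows the tidyr reference examples. Here `station` supplies the new column names and `seen` fills the cells. In this dataset `seen` is 1 in every row, so all the zeros come from `values_fill`:

```r
library(tidyverse)

fish_wide <- fish_encounters %>%
  pivot_wider(names_from = station,  # one column per station
              values_from = seen,    # 1 where the fish was recorded
              values_fill = 0)       # fish never recorded at a station get 0

fish_wide
```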

Transform the dataset billboard to one with columns artist, track, date.entered, week, and rank. Drop NA values for rank. Use either pivot_wider() or pivot_longer().

billboard %>% 
  head()

## # A tibble: 6 × 79
##   artist      track date.entered   wk1   wk2   wk3   wk4   wk5   wk6   wk7   wk8
##   <chr>       <chr> <date>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 Pac       Baby… 2000-02-26      87    82    72    77    87    94    99    NA
## 2 2Ge+her     The … 2000-09-02      91    87    92    NA    NA    NA    NA    NA
## 3 3 Doors Do… Kryp… 2000-04-08      81    70    68    67    66    57    54    53
## 4 3 Doors Do… Loser 2000-10-21      76    76    72    69    67    65    55    59
## 5 504 Boyz    Wobb… 2000-04-15      57    34    25    17    17    31    36    49
## 6 98^0        Give… 2000-08-19      51    39    34    26    26    19     2     2
## # … with 68 more variables: wk9 <dbl>, wk10 <dbl>, wk11 <dbl>, wk12 <dbl>,
## #   wk13 <dbl>, wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, wk17 <dbl>, wk18 <dbl>,
## #   wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>, wk23 <dbl>, wk24 <dbl>,
## #   wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, wk29 <dbl>, wk30 <dbl>,
## #   wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>, wk35 <dbl>, wk36 <dbl>,
## #   wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, wk41 <dbl>, wk42 <dbl>,
## #   wk43 <dbl>, wk44 <dbl>, wk45 <dbl>, wk46 <dbl>, wk47 <dbl>, wk48 <dbl>, …
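In one possible approach, `starts_with()` selects the week columns, since they all start with `wk`. Setting `values_drop_na = TRUE` drops the NA ranks during the pivot. An alternative is calling `drop_na(rank)` afterwards.

```r
library(tidyverse)

billboard_long <- billboard %>%
  pivot_longer(cols = starts_with("wk"),   # wk1, wk2, ..., wk76
               names_to = "week",
               values_to = "rank",
               values_drop_na = TRUE)      # drop weeks with no chart rank

billboard_long
```

Here `week` keeps values such as `"wk1"`. Adding `names_prefix = "wk"` to the call strips that prefix if you prefer a bare week number.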

Part 2: Probability

Try the exercises originally developed by Prof. Alexander Fisher, and ask me any questions.

Part 3: Inference based on Simulations or Central Limit Theorem

This part was originally developed by Prof. Becky Tang as a homework assignment and has been slightly modified.

Data

Postoperative sore throat is an annoying complication of intubation after surgery, particularly with wider gauge double-lumen tubes. Reutzler et al. (2013) performed an experimental study in Germany among patients having elective surgery who required intubation with a double-lumen tube. Prior to anesthesia, patients were randomly assigned to gargle either a licorice-based solution or sugar water (as placebo).

Sore throat was evaluated 30 minutes, 90 minutes, and 4 hours after the conclusion of surgery using a numeric scale from 0 to 10, where 0 = no pain and 10 = worst pain. For the purposes of this assignment, we will treat these pain scores as numeric.

The data are available in your assignment repository as a .csv file. Some relevant variables of interest are:

preOp_gender: Gender (0 = Male; 1 = Female)
preOp_calcBMI: Body mass index in kg/m²
preOp_asa: American Society of Anesthesiologists physical status classification (1 = normal healthy patient, 2 = mild systemic disease, 3 = severe systemic disease)
treat: Treatment given (0 = Sugar placebo; 1 = Licorice solution)
pacu30min_throatPain: Sore throat pain score 30 minutes after arrival in the post-anesthesia care unit (PACU)

Exercises

Overall hint: When performing a hypothesis test, you must provide the significance level of your test, the null and alternative hypotheses, the p-value, your decision, and an interpretation of the p-value in the context of the original research question. If you are using a non-simulation-based approach, you must also provide the value of your test statistic and the distribution of that test statistic assuming the null hypothesis is true.

Overall hint: To ensure reproducibility, for all exercises requiring a simulation-based approach, set a seed of your choice. Additionally, ensure that the number of repetitions is sufficiently large.

Be careful with missing values of the variables you’re analyzing in each question!
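Here is a small illustration of why missing values need care. It uses a hypothetical stand-in tibble with made-up values, not the study data, whose column names follow the variable list above. It also shows one way to recode `treat` into readable labels; the label text is an assumption.

```r
library(tidyverse)

# Hypothetical stand-in with made-up values (NOT the study data);
# column names follow the variable list above
throat <- tibble(
  treat                = c(0, 1, 0, 1, 1),
  preOp_calcBMI        = c(24.1, NA, 27.3, 22.8, 30.2),
  pacu30min_throatPain = c(0, 2, NA, 0, 1)
)

# mean() returns NA if any value is missing, unless told otherwise
mean(throat$pacu30min_throatPain)                 # NA
mean(throat$pacu30min_throatPain, na.rm = TRUE)   # 0.75

# drop_na() with no arguments removes a row if ANY column is NA ...
nrow(drop_na(throat))                             # 3
# ... so name only the variable(s) the question actually uses
pain_only <- throat %>% drop_na(pacu30min_throatPain)
nrow(pain_only)                                   # 4

# Recode the 0/1 treatment indicator into readable (hypothetical) labels
pain_only <- pain_only %>%
  mutate(treat = factor(treat, levels = c(0, 1),
                        labels = c("Sugar placebo", "Licorice")))
```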

1. Construct and interpret a 95% confidence interval for the mean sore throat pain score 30 minutes after arrival in the PACU among all patients using both a simulation-based approach and a CLT-based approach. Compare these two intervals.
2. Suppose that these patients are representative of German patients undergoing surgeries that require intubation. Is there evidence that the mean BMI among such patients differs from the mean BMI among all German adults of 26 kg/m²? Assess this hypothesis using a simulation-based approach. Provide a visualization of your simulated null distribution and observed data (sample statistic).
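The sketch below shows both kinds of computation in base R, run on hypothetical simulated values rather than the study data. Reading in the assignment's .csv file is left out because its file name is not given here. The course materials may use the infer package (`specify()`, `generate()`, `calculate()`) instead, but the logic is the same. First, the block builds a bootstrap percentile interval and a t-based CLT interval for a single mean. Second, it runs a one-sample simulation test of H0: μ = 26, built by shifting the sample so that its mean equals 26.

```r
set.seed(1234)   # any seed works; fixing it makes the simulation reproducible
n_reps <- 10000  # number of simulated samples

# ---- Hypothetical stand-in data (made up, NOT the study data) ----
pain <- sample(0:10, size = 60, replace = TRUE,
               prob = c(10, 5, 3, 2, rep(1, 7)))
bmi  <- round(rnorm(60, mean = 25.5, sd = 4), 1)

# ---- 95% CI for a single mean ----
# Simulation-based: bootstrap percentile interval
boot_means <- replicate(n_reps, mean(sample(pain, replace = TRUE)))
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

# CLT-based: xbar +/- t* * s / sqrt(n), with t* from a t(n - 1) distribution
n <- length(pain)
t_star <- qt(0.975, df = n - 1)
clt_ci <- mean(pain) + c(-1, 1) * t_star * sd(pain) / sqrt(n)

boot_ci
clt_ci

# ---- One-sample test: H0: mu = 26 vs HA: mu != 26 ----
mu_0 <- 26
obs_mean <- mean(bmi)
# Shift the sample so that H0 holds, then bootstrap from the shifted sample
bmi_null <- bmi - obs_mean + mu_0
null_means <- replicate(n_reps, mean(sample(bmi_null, replace = TRUE)))
# Two-sided p-value: share of simulated means at least as far from 26 as observed
p_value <- mean(abs(null_means - mu_0) >= abs(obs_mean - mu_0))
p_value

# Visualize the simulated null distribution and the observed statistic
hist(null_means, breaks = 40, main = "Simulated null distribution",
     xlab = "Sample mean BMI")
abline(v = obs_mean, col = "red", lwd = 2)
```

With real data, replace `pain` and `bmi` with the relevant columns after dropping their missing values. For a reasonably large sample that is not very skewed, the bootstrap and CLT intervals are typically close to each other.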

Now, let’s examine any potential effects of licorice solution on reducing throat pain after surgery.

3. Assess whether there was a lower mean throat pain score 30 minutes after surgery among patients who received licorice compared to patients who received sugar solution placebo. Use a simulation-based approach.
4. Comprehensively assess whether a lower proportion of patients who received licorice solution reported having any pain 30 minutes after surgery compared to sugar solution. Use a simulation-based approach.
5. Based on your analyses, do you think that licorice gargle prior to surgery is effective in reducing post-intubation sore throat? Explain your answer, referencing any data, formal statistical tests, or study design as necessary.
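The next sketch shows permutation (randomization) tests for the two group comparisons, again on hypothetical simulated scores rather than the study data. Shuffling the treatment labels mimics a world where treatment has no effect. Reading "comprehensively" as a hypothesis test plus a confidence interval is an assumption, so the block also adds a bootstrap interval for the difference in proportions. Note that `diff_means()` is a helper defined here, not a library function.

```r
set.seed(1234)
n_reps <- 10000

# ---- Hypothetical stand-in data (made up, NOT the study data) ----
# treat: 0 = Sugar placebo, 1 = Licorice solution
treat <- rep(c(0, 1), each = 40)
pain  <- c(sample(0:6, 40, replace = TRUE),   # placebo group
           sample(0:4, 40, replace = TRUE))   # licorice group

# Helper: mean in group 1 minus mean in group 0
diff_means <- function(y, g) mean(y[g == 1]) - mean(y[g == 0])

# ---- Difference in means, HA: mu_licorice - mu_placebo < 0 ----
obs_diff <- diff_means(pain, treat)
# Under H0 the labels are exchangeable, so shuffle them
null_diffs <- replicate(n_reps, diff_means(pain, sample(treat)))
p_means <- mean(null_diffs <= obs_diff)        # one-sided (lower tail)

# ---- Difference in proportions reporting any pain (score > 0) ----
any_pain <- as.numeric(pain > 0)
obs_diff_prop <- diff_means(any_pain, treat)   # mean of 0/1 = proportion
null_diffs_prop <- replicate(n_reps, diff_means(any_pain, sample(treat)))
p_prop <- mean(null_diffs_prop <= obs_diff_prop)

# Bootstrap 95% CI for p_licorice - p_placebo (resample within each group)
boot_diffs <- replicate(n_reps, {
  lic <- sample(any_pain[treat == 1], replace = TRUE)
  pla <- sample(any_pain[treat == 0], replace = TRUE)
  mean(lic) - mean(pla)
})
ci_prop <- quantile(boot_diffs, c(0.025, 0.975))

c(obs_diff = obs_diff, p_means = p_means)
c(obs_diff_prop = obs_diff_prop, p_prop = p_prop)
ci_prop
```

Both p-values use the lower tail because each alternative hypothesis says the licorice group is lower.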

In Exercises 6 - 10, determine whether the statements are TRUE or FALSE. If the statement is FALSE, explain why it is FALSE.

The mean BMI among patients receiving licorice solution was 25.6 kg/m² and the mean BMI among patients receiving sugar solution placebo was 25.6 kg/m². In assessing whether there is a difference in mean BMI between the two treatment groups using a CLT-based approach, the researchers obtained a p-value of 0.925.

6. If there is truly no difference in mean BMI between these two groups, then the probability of seeing a difference in BMI as large as our observed difference or even larger is approximately 0.925.
7. Assuming , then our p-value of 0.925 would be strong evidence that there is no difference in the mean BMI between the two treatment groups.
8. The probability that we have made a Type 2 error is less than 10%.
9. If we were to repeatedly construct 95% confidence intervals for the difference in mean BMI in the same way from the original population, then we know that 95% of those intervals would truly contain the true population difference in means.
10. If we instead found a p-value of 0.021, then at the level, we would have enough evidence to conclude that there is a difference in mean BMI between the two treatment groups.

Submitting Application Exercises

There is no need to submit anything.

¹ Source: https://tidyr.tidyverse.org/reference/pivot_wider.html, https://tidyr.tidyverse.org/reference/pivot_longer.html