library(tidyverse)
pivot_longer() “lengthens” data, increasing the number of rows and decreasing the number of columns.
data_left %>%
  pivot_longer(cols = Y1:Y3,
               names_to = "Y",
               values_to = "Z")
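As a minimal sketch of what this call does, assume a hypothetical data_left with an identifier column X and three measurement columns Y1 to Y3. The values below are made up for illustration, since the original data_left is not shown here.

```r
library(tidyverse)

# Hypothetical stand-in for data_left: one row per X, one column per Y.
data_left <- tibble(
  X  = c("a", "b"),
  Y1 = c(1, 4),
  Y2 = c(2, 5),
  Y3 = c(3, 6)
)

# Gather Y1:Y3 into a name column (Y) and a value column (Z).
data_long <- data_left %>%
  pivot_longer(cols = Y1:Y3,
               names_to = "Y",
               values_to = "Z")

dim(data_left)  # 2 rows, 4 columns
dim(data_long)  # 6 rows, 3 columns
```

Each of the three Y columns becomes one row per X, so the row count triples while the three measurement columns collapse into two.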
pivot_wider() “widens” data, increasing the number of columns and decreasing the number of rows.
data_right %>%
  pivot_wider(names_from = Y,
              values_from = Z)
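A matching sketch for the reverse direction, assuming a hypothetical long-format data_right with columns X, Y, and Z (again, made-up values rather than the original data):

```r
library(tidyverse)

# Hypothetical stand-in for data_right: one row per (X, Y) pair.
data_right <- tibble(
  X = c("a", "a", "a", "b", "b", "b"),
  Y = c("Y1", "Y2", "Y3", "Y1", "Y2", "Y3"),
  Z = c(1, 2, 3, 4, 5, 6)
)

# Spread the values of Z into one column per distinct value of Y.
data_wide <- data_right %>%
  pivot_wider(names_from = Y,
              values_from = Z)

dim(data_right)  # 6 rows, 3 columns
dim(data_wide)   # 2 rows, 4 columns
```

Here the distinct values of Y become column names, and the rows that share the same X are merged into one.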
Reshape relig_income to one with columns religion, income, and count using either pivot_wider() or pivot_longer().
relig_income %>%
  head()
## # A tibble: 6 × 11
## religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Agnostic 27 34 60 81 76 137 122
## 2 Atheist 12 27 37 52 35 70 73
## 3 Buddhist 27 21 30 34 33 58 62
## 4 Catholic 418 617 732 670 638 1116 949
## 5 Don’t kn… 15 14 15 11 10 35 21
## 6 Evangeli… 575 869 1064 982 881 1486 949
## # … with 3 more variables: `$100-150k` <dbl>, `>150k` <dbl>,
## # `Don't know/refused` <dbl>
Reshape fish_encounters to one with a matrix structure where rows indicate fish, columns indicate stations, and each cell indicates whether the fish was seen at that station (1 if yes and 0 otherwise). Use either pivot_wider() or pivot_longer().
fish_encounters %>%
  head()
## # A tibble: 6 × 3
## fish station seen
## <fct> <fct> <int>
## 1 4842 Release 1
## 2 4842 I80_1 1
## 3 4842 Lisbon 1
## 4 4842 Rstr 1
## 5 4842 Base_TD 1
## 6 4842 BCE 1
Reshape billboard to one with columns artist, track, date.entered, week, and rank. Drop NA values for rank. Use pivot_wider() or pivot_longer().
billboard %>%
  head()
## # A tibble: 6 × 79
## artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 NA
## 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA NA
## 3 3 Doors Do… Kryp… 2000-04-08 81 70 68 67 66 57 54 53
## 4 3 Doors Do… Loser 2000-10-21 76 76 72 69 67 65 55 59
## 5 504 Boyz Wobb… 2000-04-15 57 34 25 17 17 31 36 49
## 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 2
## # … with 68 more variables: wk9 <dbl>, wk10 <dbl>, wk11 <dbl>, wk12 <dbl>,
## # wk13 <dbl>, wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, wk17 <dbl>, wk18 <dbl>,
## # wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>, wk23 <dbl>, wk24 <dbl>,
## # wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, wk29 <dbl>, wk30 <dbl>,
## # wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>, wk35 <dbl>, wk36 <dbl>,
## # wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, wk41 <dbl>, wk42 <dbl>,
## # wk43 <dbl>, wk44 <dbl>, wk45 <dbl>, wk46 <dbl>, wk47 <dbl>, wk48 <dbl>, …
Try exercises originally developed by Prof. Alexander Fisher and ask me questions.
This assignment was originally developed by Prof. Becky Tang and has been slightly modified. Click here to see the original homework.
Postoperative sore throat is an annoying complication of intubation after surgery, particularly with wider gauge double-lumen tubes. Ruetzler et al. (2013) performed an experimental study in Germany among patients having elective surgery who required intubation with a double-lumen tube. Prior to anesthesia, patients were randomly assigned to gargle either a licorice-based solution or sugar water (as placebo).
Sore throat was evaluated 30 minutes, 90 minutes, and 4 hours after the conclusion of surgery using a numeric scale from 0 to 10, where 0 = no pain and 10 = worst pain. For the purposes of this assignment, we will treat these pain scores as numeric.
The data are available in your assignment repository as a .csv file. Some relevant variables of interest are:
preOp_gender: Gender (0 = Male; 1 = Female)
preOp_calcBMI: Body mass index in kg/m\(^2\)
preOp_asa: American Society of Anesthesiologists physical status classification (1 = normal healthy patient, 2 = mild systemic disease, 3 = severe systemic disease)
treat: Treatment given (0 = Sugar placebo; 1 = Licorice solution)
pacu30min_throatPain: Sore throat pain score 30 minutes after arrival in the post-anesthesia care unit (PACU)
Overall hint: When performing a hypothesis test, you must provide the significance level of your test, the null and alternative hypotheses, the p-value, your decision, and an interpretation of the p-value in context of the original research question. If you are using a non-simulation-based approach, you must also provide the value of your test statistic and the distribution of that test statistic assuming the null hypothesis is true.
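As a generic illustration of the last requirement (not tied to any particular exercise), a CLT-based test of a single mean against a null value \(\mu_0\) would report the following test statistic and null distribution, where \(\bar{x}\) is the sample mean, \(s\) the sample standard deviation, and \(n\) the number of non-missing observations:

\[
T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t_{n-1} \quad \text{under } H_0
\]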
Overall hint: To ensure reproducibility, for all exercises requiring a simulation-based approach, set a seed of your choice. Additionally, ensure that the number of repetitions is sufficiently large.
Be careful with missing values of the variables you’re analyzing in each question!
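The two hints above can be sketched together in base R. The vector pain below is hypothetical (made-up scores on the 0 to 10 scale, not the study data), and the seed and number of repetitions are arbitrary choices for illustration.

```r
# Hypothetical pain scores with missing values; not the study data.
pain <- c(0, 2, NA, 1, 0, 3, 5, NA, 0, 1, 2, 0, 4, 1, 0)

# Drop missing values explicitly so every resample uses observed scores only.
pain_obs <- pain[!is.na(pain)]

set.seed(2024)   # any fixed seed makes the resampling reproducible
n_reps <- 10000  # a large number of repetitions stabilizes the quantiles

# Bootstrap: resample with replacement and record each resample's mean.
boot_means <- replicate(n_reps, mean(sample(pain_obs, replace = TRUE)))

# Percentile-based 95% interval for the mean.
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci
```

Rerunning the block with the same seed reproduces the same interval. Without the explicit NA filter, mean() would return NA for any resample that drew a missing value.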
Construct and interpret a 95% confidence interval for the mean sore throat pain score 30 minutes after arrival in the PACU among all patients using both a simulation-based approach and a CLT-based approach. Compare these two intervals.
Suppose that these patients are representative of German patients undergoing surgeries that require intubation. Is there evidence that the mean BMI among such patients differs from the mean BMI among all German adults of 26 kg/m\(^2\)? Assess this hypothesis using a simulation-based approach. Provide a visualization of your simulated null distribution and observed data (sample statistic).
Now, let’s examine any potential effects of licorice solution on reducing throat pain after surgery.
Assess whether there was a lower mean throat pain score 30 minutes after surgery among patients who received licorice compared to patients who received sugar solution placebo. Use a simulation-based approach.
Comprehensively assess whether a lower proportion of patients who received licorice solution reported having any pain 30 minutes after surgery compared to sugar solution. Use a simulation-based approach.
Based on your analyses, do you think that licorice gargle prior to surgery is effective in reducing post-intubation sore throat? Explain your answer, referencing any data, formal statistical tests, or study design as necessary.
In Exercises 6 - 10, determine whether the statements are TRUE or FALSE. If the statement is FALSE, explain why it is FALSE.
The mean BMI among patients receiving licorice solution was 25.6 kg/m\(^2\) and the mean BMI among patients receiving sugar solution placebo was 25.6 kg/m\(^2\). In assessing whether there is a difference in mean BMI between the two treatment groups using a CLT-based approach, the researchers obtained a p-value of 0.925.
If there is truly no difference in mean BMI between these two groups, then the probability of seeing a difference in BMI as large as our observed difference or even larger is approximately 0.925.
Assuming \(\alpha = 0.05\), then our p-value of 0.925 would be strong evidence that there is no difference in the mean BMI between the two treatment groups.
The probability that we have made a Type 2 error is less than 10%.
If we were to repeatedly construct 95% confidence intervals for the difference in mean BMI in the same way from the original population, then we know that 95% of those intervals would truly contain the true population difference in means.
If we instead found a p-value of 0.021, then at the \(\alpha = 0.05\) level, we would have enough evidence to conclude that there is a difference in mean BMI between the two treatment groups.