Open lab07.Rmd, the template R Markdown file.

For each exercise:
- Show all relevant code and output used to obtain your response. If you use inline code, make sure we can still see the code used to derive that answer.
- Write all narrative in complete sentences, and use clear axis labels and titles on visualizations.

Use a small number of reps (about 500) as you write and test your code. Once you have finalized all of your code, increase the number of reps to 15,000 to produce your final results. For each simulation exercise, use the seed specified in the exercise instructions.
We will use the tidyverse, tidymodels, and knitr packages in this lab.
library(tidyverse)
library(tidymodels)
library(knitr)
What makes a good burrito?
Today’s dataset has been adapted from Scott Cole’s Burritos of San Diego project. The goal of the project was to identify the best and worst burritos in San Diego, characterize variance in burrito quality, and generate predictive models for what makes a burrito great.
As part of this project, 71 participants reviewed burritos from 79 different taco shops. Reviewers captured objective measures of the burrito (such as whether it contains certain ingredients) and reviewed it on a number of metrics (such as quality of the tortilla, the temperature, quality of meat, etc.). For the purposes of this lab, you may consider each of these observations to be an independent and representative sample of all burritos.
The subjective ratings in the dataset are as follows. Each variable is ranked on a 0 to 5 point scale, with 0 being the worst and 5 being the best.
- tortilla: quality of the tortilla
- temp: temperature of the burrito
- meat: quality of the meat
- fillings: quality of non-meat fillings
- salsa: quality of the salsa
- mfr: meat-to-filling ratio
- uniformity: whether each bite contains a uniform slew of ingredients (e.g., a bite entirely composed of tortilla and sour cream would probably be terrible)
- synergy: how well it all comes together

In addition, the reviewers noted the presence of the following burrito components. Each of the following variables is a binary variable taking on values present or none:

- guac: guacamole
- cheese: cheese
- fries: fries (it's a thing, look it up.)
- sourcream: sour cream
- rice: rice
- beans: beans

The data are available in burritos.csv.
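Assuming burritos.csv sits in the project directory (adjust the path if your course template keeps data elsewhere), the data can be read in with read_csv(); the data frame name burrito matches the later code templates.

```r
library(tidyverse)

# read the burrito reviews; the file location here is an assumption
burrito <- read_csv("burritos.csv")
```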
The goal of this analysis is to make inference about the mean synergy rating of burritos based on the Central Limit Theorem (CLT).
The variable of interest is synergy, a rating indicating how well all the ingredients in the burrito come together. Visualize the distribution of synergy using a histogram with a binwidth of 0.5.
Does the distribution look “normal”? Comment on the shape of the distribution.
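A sketch of such a histogram is below. Since the real ratings live in burritos.csv, simulated stand-in values are used here; the data frame name burrito and the simulated distribution are assumptions.

```r
library(tidyverse)

set.seed(42)
# toy stand-in for the real data: 330 synergy ratings clipped to the 0-5 scale
burrito <- tibble(synergy = pmin(5, pmax(0, rnorm(330, mean = 3.5, sd = 0.9))))

p <- ggplot(burrito, aes(x = synergy)) +
  geom_histogram(binwidth = 0.5) +
  labs(
    x = "Synergy rating (0 = worst, 5 = best)",
    y = "Number of burritos",
    title = "Distribution of burrito synergy ratings"
  )
p
```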
Calculate the following summary statistics: the mean synergy, the standard deviation of synergy, and the sample size. Save the summary statistics as summary_stats. Then display summary_stats with kable().
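One way to sketch this calculation, again using simulated stand-in data in place of burritos.csv:

```r
library(tidyverse)
library(knitr)

set.seed(42)
# toy stand-in for the real data (the real values come from burritos.csv)
burrito <- tibble(synergy = pmin(5, pmax(0, rnorm(330, mean = 3.5, sd = 0.9))))

summary_stats <- burrito %>%
  summarize(
    mean_synergy = mean(synergy),
    sd_synergy   = sd(synergy),
    n            = n()
  )

# display the one-row summary table
summary_stats %>% kable(digits = 3)
```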
Is the synergy in burritos generally good? To answer this question, we will conduct a hypothesis test to evaluate if the mean synergy score of all burritos is greater than 3.
State the null and alternative hypotheses to evaluate the question posed in the previous exercise. Write the hypotheses in words and in statistical notation. Clearly define any parameter you introduce.
Let \(\bar{X}\) be a random variable for the mean synergy score in a sample of 330 randomly selected burritos. Given the CLT and the hypotheses from the previous exercise, fill in the distribution of \(\bar{X}\) under the null hypothesis:

\[\bar{X} \sim N(\_\_\_\_, \_\_\_\_)\]
In practice, we never know the true value of \(\sigma\), so we estimate it with the observed standard deviation \(s\) from our data. Consequently, the null distribution changes slightly: the standardized statistic follows a \(t\)-distribution with \(n - 1\) degrees of freedom,

\[t = \frac{\bar{x}- \mu_{0}}{\hat{se}}\]

where \(\mu_0\) is the null value of \(\mu\) (the value specified for \(\mu\) in the null hypothesis), and \(\hat{se}\) is the estimated standard error of \(\bar{X}\), calculated by replacing \(\sigma\) in the standard error you found in Ex 5 with \(s\).
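Numerically, the statistic can be computed as follows. The values of x_bar and s below are placeholders, to be replaced with the entries of summary_stats; only n = 330 and the null value of 3 come from the lab instructions.

```r
# placeholder values -- substitute the statistics computed from the data
x_bar <- 3.5    # observed sample mean synergy (assumed for illustration)
s     <- 0.9    # observed sample standard deviation (assumed for illustration)
n     <- 330    # sample size
mu_0  <- 3      # null value from H0

se_hat <- s / sqrt(n)              # estimated standard error of the sample mean
t_stat <- (x_bar - mu_0) / se_hat  # t statistic
t_stat
```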
Use the pt() function to calculate the p-value.

To construct a confidence interval for the mean, use

\[\bar{x} \pm t^*_{n-1} \times \hat{se}\]

We already know \(\bar{x}\) and \(\hat{se}\), so let's focus on calculating \(t^*_{n-1}\). We will use the qt() function to calculate the critical value \(t^*_{n-1}\).

Here is an example: if we want to calculate a 95% confidence interval for the mean, we will use qt(0.975, n - 1), where 0.975 is the cumulative probability at the upper bound of the 95% confidence interval (recall we used this value to find the upper bound when calculating bootstrap confidence intervals), and n - 1 is the degrees of freedom.
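Putting these pieces together, a sketch of the p-value and confidence interval calculations (the observed statistics x_bar and s are again placeholders to be replaced with values from the data):

```r
x_bar <- 3.5; s <- 0.9; n <- 330; mu_0 <- 3  # placeholder observed values
se_hat <- s / sqrt(n)
t_stat <- (x_bar - mu_0) / se_hat

# one-sided p-value for Ha: mu > 3, using the upper tail of the t-distribution
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# 95% confidence interval for mu
t_star <- qt(0.975, df = n - 1)       # critical value with n - 1 df
ci <- x_bar + c(-1, 1) * t_star * se_hat
ci
```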
We can also use the infer package to carry out CLT-based inference with the t_test() function. The results should be the same as the calculations you did in the previous exercises.
burrito %>%
  t_test(response = ______,
         alternative = "______",
         mu = ______,
         conf_int = FALSE)
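For reference, a completed call might look like the following, using the response variable and null value stated earlier in the lab; the data here are a simulated stand-in for burritos.csv.

```r
library(tidymodels)  # attaches infer, which provides t_test()

set.seed(42)
# toy stand-in for the real data
burrito <- tibble(synergy = pmin(5, pmax(0, rnorm(330, mean = 3.5, sd = 0.9))))

result <- burrito %>%
  t_test(response = synergy,
         alternative = "greater",  # Ha: mean synergy > 3
         mu = 3,
         conf_int = FALSE)
result
```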
The results should be the same as the calculations in Ex 8.
burrito %>%
  t_test(response = _____,
         conf_int = _____,
         conf_level = _____) %>%
  select(lower_ci, upper_ci)
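A filled-in sketch of this pipeline, assuming a 95% confidence level (the level used in the earlier qt() example) and simulated stand-in data:

```r
library(tidymodels)  # attaches infer and dplyr

set.seed(42)
# toy stand-in for the real data
burrito <- tibble(synergy = pmin(5, pmax(0, rnorm(330, mean = 3.5, sd = 0.9))))

ci <- burrito %>%
  t_test(response = synergy,
         conf_int = TRUE,
         conf_level = 0.95) %>%
  select(lower_ci, upper_ci)
ci
```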
| Component | Points |
|---|---|
| Ex 1 | 6 |
| Ex 2 | 1 |
| Ex 3 | 3 |
| Ex 4 | 5 |
| Ex 5 | 3 |
| Ex 6 | 6 |
| Ex 7 | 6 |
| Ex 8 | 8 |
| Ex 9 | 5 |
| Workflow & formatting | 7 |
Grading notes: