Lab #07: CLT-based Inference

due Friday, June 10 at 11:59pm

Goals

Getting started

For each exercise:

Packages

We will use the tidyverse, tidymodels, and knitr packages in this lab.

library(tidyverse)
library(tidymodels)
library(knitr)

Burrito

What makes a good burrito?

Today’s dataset has been adapted from Scott Cole’s Burritos of San Diego project. The goal of the project was to identify the best and worst burritos in San Diego, characterize variance in burrito quality, and generate predictive models for what makes a burrito great.

As part of this project, 71 participants reviewed burritos from 79 different taco shops. Reviewers captured objective measures of the burrito (such as whether it contains certain ingredients) and reviewed it on a number of metrics (such as quality of the tortilla, the temperature, quality of meat, etc.). For the purposes of this lab, you may consider each of these observations to be an independent and representative sample of all burritos.

The subjective ratings in the dataset are as follows. Each variable is rated on a scale of 0 to 5, with 0 being the worst and 5 being the best.

In addition, the reviewers noted the presence of the following burrito components. Each of the following variables is a binary variable taking on values present or none:

The data are available in burritos.csv
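
A minimal sketch for loading the data is below; it assumes burritos.csv sits in your project directory (adjust the path if your setup differs) and stores the data in an object named burrito, matching the code later in the lab.

# Load the burrito reviews
burrito <- read_csv("burritos.csv")

# Quick look at the variables and their types
glimpse(burrito)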

The goal of this analysis is to make inference about the mean synergy rating of burritos based on the Central Limit Theorem (CLT).

  1. We’ll start by examining the distribution of synergy, a rating indicating how well all the ingredients in the burrito come together.
  2. The goal of this analysis is to use CLT-based inference to understand the true mean synergy rating of all burritos. What is your “best guess” for the mean synergy rating of burritos? (A starter code sketch follows this list.)
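
If you would like a starting point for these two exercises, here is a minimal sketch. It assumes the data frame is named burrito (as in the t_test() code later in the lab) and that the rating column is named synergy; adjust the names to match your data.

# Visualize the distribution of synergy ratings
ggplot(burrito, aes(x = synergy)) +
  geom_histogram(binwidth = 0.5) +
  labs(
    x = "Synergy rating (0 to 5)",
    y = "Count",
    title = "Distribution of synergy ratings"
  )

# Summary statistics; the sample mean is a natural "best guess" for the population mean
burrito %>%
  summarise(
    mean_synergy = mean(synergy),
    sd_synergy = sd(synergy),
    n_burritos = n()
  )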

Is the synergy in burritos generally good? To answer this question, we will conduct a hypothesis test to evaluate if the mean synergy score of all burritos is greater than 3.

  3. Before conducting inference, we need to check the conditions to make sure the CLT can be applied in this analysis. For each condition, indicate whether it is satisfied and provide a brief explanation supporting your response.
  4. State the null and alternative hypotheses to evaluate the question posed in the previous exercise. Write the hypotheses in words and in statistical notation. Clearly define any parameter you introduce.

  5. Let \(\bar{X}\) be a random variable for the mean synergy score in a sample of 330 randomly selected burritos. Given the CLT and the hypotheses from the previous exercise, fill in the distribution of \(\bar{X}\) under the null hypothesis:

\[\bar{X} \sim N(\_\_\_\_, \_\_\_\_)\]

In practice, we never know the true value of \(\sigma\), so we estimate it with the observed standard deviation \(s\) from our data. As a consequence, the null distribution changes slightly: the standardized statistic follows a \(t\)-distribution with \(n - 1\) degrees of freedom instead of a standard normal distribution.

  6. Recall the formula for the test statistic below. Use it to calculate the test statistic for this hypothesis test.

\[t = \frac{\bar{x}- \mu_{0}}{\hat{se}}\] where \(\mu_0\) is the null value of \(\mu\) (the value specified for \(\mu\) in the null hypothesis), and \(\hat{se}\) is the estimated standard error of \(\bar{X}\), calculated by replacing \(\sigma\) with \(s\) in the standard error you found in Ex 5.
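
As a rough sketch of this calculation (again assuming a burrito data frame with a synergy column), the pieces of the test statistic can be computed directly:

# Sample statistics
xbar <- mean(burrito$synergy)   # observed sample mean
s    <- sd(burrito$synergy)     # observed sample standard deviation
n    <- nrow(burrito)           # sample size

# Estimated standard error: replace sigma with s in sigma / sqrt(n)
se_hat <- s / sqrt(n)

# Test statistic, using the null value mu_0 = 3
t_stat <- (xbar - 3) / se_hat
t_stat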

  7. Now let’s calculate the p-value and draw a conclusion. (A code sketch for this step follows the qt() example below.)
  8. We also want to calculate a 90% confidence interval for the mean synergy rating of all burritos. The confidence interval for a population mean is

\[\bar{x} \pm t^*_{n-1} \times \hat{se}\]

We already know \(\bar{x}\) and \(\hat{se}\), so let’s focus on calculating \(t^*_{n-1}\). We will use the qt() function to calculate the critical value \(t^*_{n-1}\).

Here is an example: if we want to calculate a 95% confidence interval for the mean, we use qt(0.975, n - 1), where 0.975 is the cumulative probability at the upper bound of the 95% confidence interval (recall we used this value to find the upper bound when calculating bootstrap confidence intervals) and n - 1 is the degrees of freedom.
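
Continuing the sketch from Ex 6 (and assuming the objects xbar, se_hat, n, and t_stat defined there), the p-value for the one-sided test and the 90% confidence interval could be computed along these lines:

# p-value for the alternative hypothesis mu > 3, using a t-distribution
# with n - 1 degrees of freedom
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)
p_value

# Critical value for a 90% confidence interval: 0.95 is the cumulative
# probability at the upper bound, and n - 1 is the degrees of freedom
t_star <- qt(0.95, df = n - 1)

# 90% confidence interval for the mean synergy rating
c(xbar - t_star * se_hat, xbar + t_star * se_hat)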

  9. In the previous exercises, we conducted a hypothesis test and calculated a confidence interval step by step. We can also perform these CLT-based calculations with the t_test() function from the infer package.

The results should match the calculations you did in Ex 6 and Ex 7.

burrito %>%
  t_test(response = _____, 
         alternative = "______", 
         mu = ______, 
         conf_int = FALSE)

The results should be the same as the calculations in Ex 8.

burrito %>%
  t_test(response = _____, 
         conf_int = _____, 
         conf_level = _____) %>%
  select(lower_ci, upper_ci)

Submission

Grading (50 pts)


Component               Points
Ex 1                    6
Ex 2                    1
Ex 3                    3
Ex 4                    5
Ex 5                    3
Ex 6                    6
Ex 7                    6
Ex 8                    8
Ex 9                    5
Workflow & formatting   7

Grading notes: