class: center, middle, inverse, title-slide # Confidence Intervals through Bootstrapping ### Bora Jin --- layout: true <div class="my-footer"> <span> <a href="https://introds.org" target="_blank">introds.org</a> </span> </div> --- ## Material 🎥 Watch [Bootstrapping](https://youtu.be/bdqpI3iVOso) - [Slides](https://rstudio-education.github.io/datascience-box/course-materials/slides/u4-d11-bootstrap/u4-d11-bootstrap.html#1) Optional: 📖 Read [IMS: Chapter 12 - Confidence Intervals With Bootstrapping](https://openintro-ims.netlify.app/foundations-bootstrapping.html) --- ## Today's Goal - Understand how to draw a bootstrap sample and calculate a bootstrap statistic - Use `infer` (part of `tidymodels`) to obtain a bootstrap distribution - Calculate a confidence interval from the bootstrap distribution - Interpret a confidence interval in context of the data --- ## Ideas Behind Bootstrapping -- - It is extremely hard to hit the exact population parameter with a single value (a.k.a. point estimate or sample statistic) from a sample. -- - We would have a better chance with **a range of plausible values** for the population parameter. -- - One statistic from one sample... then we need multiple samples to obtain multiple statistics. -- - Uh oh, getting multiple samples is generally expensive and time-consuming. If it was so easy, we would simply collect data from every unit in the population. -- - We just have to make the best of it! With the sample at hand, let's pretend that we have multiple samples!! `\(\Rightarrow\)` Bootstrapping is a resampling technique. --- ## Quiz **Q - Correct the following sentences.** 🆘 A bootstrap sample is taken without replacement from the original sample. -- ✅ A bootstrap sample is taken with replacement from the original sample. -- 🆘 A researcher chooses a bootstrap sample size. -- ✅ A bootstrap sample should be of the same size as the original sample. --- ## Quiz **Q - Correct the following sentences.** 🚨 A bootstrap distribution is a distribution of resampled data. -- ✅ A bootstrap distribution is a distribution of bootstrap statistics. -- 🚨 Calculate the bounds of the XX% confidence interval as the shortest interval with XX% of the bootstrap distribution. -- ✅ Calculate the bounds of the XX% confidence interval as the middle XX% of the bootstrap distribution. --- ## Quiz **Q - Correct the following sentences.** 🚨 Bootstrapping can be used for mean and median, but not for standard deviation. -- ✅ Bootstrapping can be used for any sample statistic. -- 🆘 The length of a confidence interval has nothing to do with its confidence level. -- ✅ The lower the confidence level, the narrower the interval. --- ## Quiz **Q - Correct the following sentences.** 🆘 Bootstrapping can improve statistical inference based on a bad sample. -- ✅ Bootstrapping is meaningless if the original sample is not representative because your bootstrap distribution will be centered around the original sample statistic. - Bootstrapping is best suited for modeling studies where the data have been generated through **random sampling** from a population. --- ## Quiz The 95% confidence interval for the mean price per guest per night among Airbnb rentals (with at least ten reviews) in Asheville was ($64, $90). **Q - What is the correct interpretation for this interval?** a. There is a 95% probability the mean price per night for an Airbnb in Asheville is between $64 and $90. b. There is a 95% probability the price per night for an Airbnb in Asheville is between $64 and $90. c. We are 95% confident the mean price per night for Airbnbs in Asheville in our sample is between $64 and $90. d. We are 95% confident the mean price per night for all Airbnbs in Asheville is between $64 and $90. --- ## Quiz The 95% confidence interval for the mean price per guest per night among Airbnb rentals (with at least ten reviews) in Asheville was ($64, $90). **Q - What is the correct interpretation for this interval?** a. There is a 95% probability the mean price per night for an Airbnb in Asheville is between $64 and $90. b. There is a 95% probability the price per night for an Airbnb in Asheville is between $64 and $90. c. We are 95% confident the mean price per night for Airbnbs in Asheville in our sample is between $64 and $90. **d. We are 95% confident the mean price per night for all Airbnbs in Asheville is between $64 and $90.** --- ## Quiz a. "There is a 95% probability" - The true unknown parameter is either in ($64, $90) or not. It can't have a "95% probability" of being in any specific interval. b. "the price per night for an Airbnb in Asheville" - We did not infer on individual price; we did infer on the population mean price per night. c. "in our sample" - I don't need to build a confidence interval for the sample statistic. It's known! --- ## Quiz **Q - What does it mean that "we are 95% confident"?** -- - Suppose we took 100 independent samples from a population. - For each sample, we computed the sample statistic and constructed a 95% confidence interval using bootstrapping. - Then we would expect 95 out of these 100 intervals to contain the unknown parameter. --- class: middle, center # Questions? --- ## Let's Practice Together! Go to [AE 14: Confidence Intervals through Bootstrapping](https://sta199-summer22.netlify.app/appex/ae14_BJ.html) --- ## Bulletin - Watch videos for [Prepare: June 2](https://sta199-summer22.netlify.app/prepare/week04_jun02_BJ.html) - Project proposal due Friday, June 3 at 11:59pm - HW03 released - Submit `ae14`