Confidence Intervals through Bootstrapping

Bora Jin

Today's Goal

  • Understand how to draw a bootstrap sample and calculate a bootstrap statistic
  • Use infer (part of tidymodels) to obtain a bootstrap distribution
  • Calculate a confidence interval from the bootstrap distribution
  • Interpret a confidence interval in the context of the data

Ideas Behind Bootstrapping

  • It is extremely hard to hit the exact population parameter with a single value (a.k.a. point estimate or sample statistic) from a sample.

  • We would have a better chance with a range of plausible values for the population parameter.

  • One sample gives only one statistic, so we would need multiple samples to obtain multiple statistics.

  • Uh oh, getting multiple samples is generally expensive and time-consuming. If it were that easy, we would simply collect data from every unit in the population.

  • We just have to make the best of it! With the sample at hand, let's pretend that we have multiple samples by resampling from it. Bootstrapping is a resampling technique.

Quiz

Q - Correct the following sentences.

🆘 A bootstrap sample is taken without replacement from the original sample.

✅ A bootstrap sample is taken with replacement from the original sample.

🆘 A researcher chooses a bootstrap sample size.

✅ A bootstrap sample should be of the same size as the original sample.
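
The two corrections above can be sketched in a few lines. This is a plain Python illustration using only the standard library, not the infer workflow from the course, and the prices are made-up numbers rather than the Asheville data.

```python
import random
import statistics

# Hypothetical nightly prices (made-up numbers for illustration only).
sample = [55, 62, 70, 74, 81, 88, 95, 110, 125, 140]

rng = random.Random(14)  # fixed seed so the draw is reproducible

# Draw WITH replacement, and draw exactly as many values as the original sample has.
boot_sample = rng.choices(sample, k=len(sample))

# The bootstrap statistic is the same statistic, computed on the resample.
boot_mean = statistics.mean(boot_sample)

print("original sample: ", sample)
print("bootstrap sample:", boot_sample)
print("original mean:", statistics.mean(sample), "| bootstrap mean:", boot_mean)
```

Because the draw is with replacement, some prices typically show up more than once in the bootstrap sample and others not at all, which is why the bootstrap mean differs from the original mean.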

Quiz

Q - Correct the following sentences.

🚨 A bootstrap distribution is a distribution of resampled data.

✅ A bootstrap distribution is a distribution of bootstrap statistics.

🚨 Calculate the bounds of the XX% confidence interval as the shortest interval with XX% of the bootstrap distribution.

✅ Calculate the bounds of the XX% confidence interval as the middle XX% of the bootstrap distribution.
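
As a rough sketch of both corrections: the bootstrap distribution below is a list of 1,000 bootstrap means, and the 95% interval is read off as its middle 95%. The code is plain Python with made-up prices, and the tail-trimming rule is one simple choice among several; infer and other tools may compute percentiles slightly differently.

```python
import random
import statistics

# Hypothetical nightly prices (made-up numbers for illustration only).
sample = [55, 62, 70, 74, 81, 88, 95, 110, 125, 140]

rng = random.Random(14)
reps = 1000

# One bootstrap statistic per resample: the distribution is made of these
# means, not of the resampled prices themselves.
boot_means = [
    statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(reps)
]

# Middle 95%: trim 2.5% of the bootstrap statistics from each tail.
level = 0.95
ordered = sorted(boot_means)
cut = round((1 - level) / 2 * reps)  # number of values trimmed per tail
lower, upper = ordered[cut], ordered[reps - 1 - cut]

print(f"{level:.0%} bootstrap percentile interval: ({lower:.1f}, {upper:.1f})")
```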

Quiz

Q - Correct the following sentences.

🚨 Bootstrapping can be used for mean and median, but not for standard deviation.

✅ Bootstrapping can be used for any sample statistic.

🆘 The length of a confidence interval has nothing to do with its confidence level.

✅ The lower the confidence level, the narrower the interval.
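
Both points can be checked with one small experiment: bootstrap the standard deviation instead of the mean, then read off intervals at three confidence levels from the same bootstrap distribution. This is a plain Python sketch with made-up prices; `middle_interval` is a hypothetical helper written for this example, not a function from infer.

```python
import random
import statistics

# Hypothetical nightly prices (made-up numbers for illustration only).
sample = [55, 62, 70, 74, 81, 88, 95, 110, 125, 140]

rng = random.Random(3)
reps = 2000

# Bootstrap distribution of the sample standard deviation, sorted for slicing.
boot_sds = sorted(
    statistics.stdev(rng.choices(sample, k=len(sample))) for _ in range(reps)
)

def middle_interval(ordered_stats, level):
    """Return the middle `level` share of an already sorted list of statistics."""
    cut = round((1 - level) / 2 * len(ordered_stats))
    return ordered_stats[cut], ordered_stats[len(ordered_stats) - 1 - cut]

widths = {}
for level in (0.90, 0.95, 0.99):
    lower, upper = middle_interval(boot_sds, level)
    widths[level] = upper - lower
    print(f"{level:.0%} interval for the SD: ({lower:.1f}, {upper:.1f})")
```

Each lower-level interval trims more from both tails of the same distribution, so it sits inside the higher-level one and cannot be wider.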

Quiz

Q - Correct the following sentences.

🆘 Bootstrapping can improve statistical inference based on a bad sample.

✅ Bootstrapping cannot fix an unrepresentative original sample: the bootstrap distribution will be centered around the original sample statistic, so any bias in the sample carries over to the interval.

  • Bootstrapping is best suited for modeling studies where the data have been generated through random sampling from a population.
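
A small simulation makes the point concrete. Everything here is an assumption made for illustration: the population is a made-up list of prices with a known mean, and the "bad" sample can only reach the 50 cheapest listings. The code is plain Python, not the infer workflow.

```python
import random

rng = random.Random(8)

# Hypothetical population of nightly prices, so the true mean is known.
population = list(range(40, 240))
true_mean = sum(population) / len(population)

# An unrepresentative sample: only the 50 cheapest listings can be selected.
bad_sample = rng.sample(population[:50], 30)
sample_mean = sum(bad_sample) / len(bad_sample)

# Bootstrap distribution of the mean, built from the bad sample.
n = len(bad_sample)
boot_means = sorted(sum(rng.choices(bad_sample, k=n)) / n for _ in range(1000))
center = sum(boot_means) / len(boot_means)
lower, upper = boot_means[25], boot_means[974]  # middle 95% of 1000 values

print(f"true mean: {true_mean:.1f}, sample mean: {sample_mean:.1f}")
print(f"center of bootstrap distribution: {center:.1f}")
print(f"95% interval: ({lower:.1f}, {upper:.1f})")
```

The bootstrap distribution centers on the biased sample mean, and every resample can only contain cheap listings, so the interval misses the true mean no matter how many resamples are drawn.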

Quiz

The 95% confidence interval for the mean price per guest per night among Airbnb rentals (with at least ten reviews) in Asheville was ($64, $90).

Q - What is the correct interpretation for this interval?

a. There is a 95% probability the mean price per night for an Airbnb in Asheville is between $64 and $90.

b. There is a 95% probability the price per night for an Airbnb in Asheville is between $64 and $90.

c. We are 95% confident the mean price per night for Airbnbs in Asheville in our sample is between $64 and $90.

d. We are 95% confident the mean price per night for all Airbnbs in Asheville is between $64 and $90.

Quiz

a. "There is a 95% probability"

  • The true unknown parameter is either in ($64, $90) or not. It can't have a "95% probability" of being in any specific interval.

b. "the price per night for an Airbnb in Asheville"

  • We did not make an inference about individual prices; we made an inference about the population mean price per night.

c. "in our sample"

  • I don't need to build a confidence interval for the sample statistic. It's known!

Quiz

Q - What does it mean that "we are 95% confident"?

  • Suppose we took 100 independent samples from a population.
  • For each sample, we computed the sample statistic and constructed a 95% confidence interval using bootstrapping.
  • Then we would expect about 95 out of these 100 intervals to contain the unknown parameter.
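
The thought experiment above can be run as a simulation when the population is one we invent ourselves. The sketch below assumes a normal population with mean 80 and standard deviation 20 (an arbitrary choice for illustration), takes 100 independent samples of size 50, and builds a 95% bootstrap percentile interval from each. It is plain Python, and `bootstrap_ci` is a hypothetical helper, not an infer function.

```python
import random

rng = random.Random(2022)
TRUE_MEAN = 80.0  # known only because we simulate the population ourselves

def bootstrap_ci(sample, reps=1000, level=0.95):
    """Percentile interval for the mean from `reps` bootstrap resamples."""
    n = len(sample)
    means = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(reps))
    cut = round((1 - level) / 2 * reps)  # values trimmed from each tail
    return means[cut], means[reps - 1 - cut]

covered = 0
for _ in range(100):
    # A fresh, independent sample from the assumed population.
    sample = [rng.gauss(TRUE_MEAN, 20) for _ in range(50)]
    lower, upper = bootstrap_ci(sample)
    if lower <= TRUE_MEAN <= upper:
        covered += 1

print(f"{covered} of 100 intervals contain the true mean")
```

The count is usually close to 95 but rarely exactly 95. It varies from run to run, and bootstrap percentile intervals built from modest sample sizes tend to cover slightly less often than the nominal level.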

Questions?

Let's Practice Together!

Go to AE 14: Confidence Intervals through Bootstrapping

Bulletin

  • Watch videos for Prepare: June 2

  • Project proposal due Friday, June 3 at 11:59pm

  • HW03 released

  • Submit ae14
