class: center, middle, inverse, title-slide

# Central Limit Theorem 1

### Bora Jin

---

layout: true

<div class="my-footer">
<span>
<a href="https://introds.org" target="_blank">introds.org</a>
</span>
</div>

---

## Material

🎥 Watch [Central Limit Theorem](https://warpwire.duke.edu/w/Vw4GAA/) - [Slides](https://sta199-fa21-003.netlify.app/slides/clt-intro.html#1)

Optional: 📖 Read [IMS: Chapter 13 - Inference With Mathematical Models](https://openintro-ims.netlify.app/foundations-mathematical.html?q=central%20limit%20th#CLTsection)

---

## Today's Goal

- Use the Central Limit Theorem to describe the distribution of sample means
- Calculate probabilities from the normal distribution

---

## Quantifying Variability

We can quantify the variability of sample statistics using different approaches:

- **Simulation**: via bootstrapping or "resampling" techniques

or

- **Theory**: via the Central Limit Theorem

--

Today we will focus on **Theory**.

---

## Quiz

**Q - What is a sampling distribution of the sample mean?**

--

From one random sample of size `\(n\)`, calculate the sample mean `\(\bar{X}_1\)`

--

From a second random sample of size `\(n\)`, calculate the sample mean `\(\bar{X}_2\)`

--

`\(\vdots\)`

Repeat this many times.

--

We call the distribution of these sample means `\(\bar{X}\)` the **sampling distribution** of the sample mean.

---

## Quiz

**Q - Apply the central limit theorem (CLT) to sample means.**

Let a random variable `\(X\)` have mean `\(\mu\)` and standard deviation `\(\sigma\)`. Then the sampling distribution of the sample mean `\(\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}\)`

--

- Has mean `\(\mu\)`

--

- Has standard error `\(\sigma/\sqrt{n}\)`

.small[
standard error of a sample mean = standard deviation of its sampling distribution, or an estimate of that standard deviation
]

--

- **If the sample size** `\(n\)` **is large enough**, the sampling distribution of `\(\bar{X}\)` is **approximately** normal.
.question[
For large `\(n\)`, `\(\bar{X}\)` is approximately distributed as `\(N(\mu, \sigma/\sqrt{n})\)`. Equivalently, as `\(n \rightarrow \infty\)`, `\(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\)` converges in distribution to `\(N(0, 1)\)`.
]

---

## Quiz

**Q - Describe the density of a normal distribution.**

--

.pull-left[
- unimodal (peak at `\(\mu\)`)
- symmetric around `\(\mu\)`
- bell-shaped
]

.pull-right[
<img src="17-clt_BJ_files/figure-html/unnamed-chunk-2-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Quiz

**Q - The CLT holds only if** `\(X \sim N(\mu, \sigma)\)`. **(T/F)**

--

F

--

`\(X\)` can be from **any** distribution with a finite mean `\(\mu\)` and a finite standard deviation `\(\sigma\)`.

Let's play with weird-looking original distributions - [Click!](https://onlinestatbook.com/stat_sim/sampling_dist/index.html)

---

## Quiz

**Q - What are the two conditions for CLT to hold?**

--

Independence

- `\(\{X_1,\cdots,X_n\}\)` must be independent of one another
  - One observation's value should not "influence" another observation's value.
- Rules of thumb to check independence:
  - Completely random sampling
  - If sampling without replacement, `\(n\)` should be less than 10% of the population size

---

## Quiz

**Q - What are the two conditions for CLT to hold?**

--

Sample size `\(n\)` / distribution

- If `\(X\)` is numerical, `\(n > 30\)`
- If `\(X\)` is categorical, at least 10 successes and 10 failures
- If `\(X \sim N(\mu, \sigma)\)`, then the distribution of sample means will also be **exactly** normal, regardless of the sample size.

---

## Recap

- If the independence and sample size conditions are satisfied, then regardless of the shape of the population distribution, the sampling distribution of the mean is approximately normal.

--

- The center of the sampling distribution is at the population mean `\(\mu\)`.

--

- The sampling distribution is less variable than the population distribution: its standard deviation (the standard error) is smaller by a factor of `\(1/\sqrt{n}\)`.

--

- As `\(n\)` increases, the standard error (the spread of the sampling distribution) decreases.
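---

## Recap: A Quick Simulation

The recap points can be checked with a small simulation. This is a minimal sketch, not part of the original slides: it assumes an Exponential(1) population (right-skewed, with `\(\mu = 1\)` and `\(\sigma = 1\)`), a sample size of `\(n = 50\)`, and 10,000 repeated samples. All of these values are illustrative choices.

```r
set.seed(199)        # arbitrary seed, for reproducibility

n    <- 50           # sample size (illustrative assumption)
reps <- 10000        # number of repeated samples (illustrative assumption)

# Assumed population: Exponential(rate = 1), which has mu = 1 and sigma = 1
mu    <- 1
sigma <- 1

# Draw `reps` independent samples of size n and record each sample mean
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbar)           # should be close to mu
sd(xbar)             # should be close to the theoretical standard error
sigma / sqrt(n)      # theoretical standard error, sigma / sqrt(n)

# Histogram of the simulated sampling distribution of the sample mean
hist(xbar, breaks = 40, main = "Sampling distribution of the sample mean")
```

The histogram should look roughly bell-shaped even though the population itself is strongly right-skewed. Increasing `n` should shrink `sd(xbar)` in line with `sigma / sqrt(n)`.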
---

## Quiz

**Q - What code calculates** `\(P(Z < 1.2)\)` **where** `\(Z \sim N(0,1)\)`**?**

<img src="17-clt_BJ_files/figure-html/unnamed-chunk-3-1.png" width="50%" style="display: block; margin: auto;" />

--

```r
# pnorm() returns the lower-tail probability; defaults are mean = 0, sd = 1
pnorm(1.2)
```

```
## [1] 0.8849303
```

---

## Quiz

**Q - What code calculates** `\(P(-2 < X < 7)\)` **where** `\(X \sim N(1,3)\)`**?**

<img src="17-clt_BJ_files/figure-html/unnamed-chunk-5-1.png" width="50%" style="display: block; margin: auto;" />

--

```r
# P(-2 < X < 7) = P(X < 7) - P(X < -2)
pnorm(7, mean = 1, sd = 3) - pnorm(-2, mean = 1, sd = 3)
```

```
## [1] 0.8185946
```

---

## Quiz

**Q - What code finds** q **such that** `\(P(X > q) = 0.05\)` **where** `\(X \sim N(-1,2)\)`**?**

<img src="17-clt_BJ_files/figure-html/unnamed-chunk-7-1.png" width="50%" style="display: block; margin: auto;" />

--

```r
# lower.tail = FALSE makes qnorm() work with the upper-tail probability P(X > q)
qnorm(0.05, mean = -1, sd = 2, lower.tail = FALSE)
```

```
## [1] 2.289707
```

---

class: middle, center

# Questions?

---

## Let's Practice Together!

Go to [AE 17: Central Limit Theorem](https://sta199-summer22.netlify.app/appex/ae17_BJ.html)

---

## Bulletin

- Watch videos for [Prepare: June 7](https://sta199-summer22.netlify.app/prepare/week05_jun07_BJ.html)
- Project proposal feedback released
- Lab06 due Tuesday, June 7 at 11:59pm
- HW03 due Wednesday, June 8 at 11:59pm
- Submit `ae17`