AE 17: Central Limit Theorem

Getting Started

Clone the repository entitled “ae17-GitHubUsername” at course GitHub organization page on your RStudio.
Open the .Rmd file and replace “Your Name” with your name.

Bone Density for Women Over 65

library(tidyverse)
theme_set(theme_bw())

Suppose the bone density for 65-year-old women is normally distributed with mean \(809 ~mg/cm^3\) and standard deviation of \(140 ~mg/cm^3\).

Let \(X\) be the bone density of 65-year-old women. We can write this distribution of \(X\) in mathematical notation as

\[X \sim N(809, 140)\]

Part 1: Population distribution

The population distribution is given below:

ggplot(data = data.frame(x = c(809 - 140*3, 809 + 140*3)), 
       aes(x)) + 
  stat_function(fun = dnorm, args = list(mean = 809, sd = 140)) + 
  labs(title = "Population distribution", x = "", y = "Density")

Q - Before typing any code, based on what you know about the normal distribution, what do you expect the median bone density to be?

Q - What bone densities correspond to Q1 (25th percentile), Q2 (50th percentile), and Q3 (the 75th percentile) of this distribution? Use the qnorm() function to calculate these values.

The densities of three woods are below:

Plywood: \(540 ~mg/cm^3\)
Pine: \(600 ~mg/cm^3\)
Mahogany: \(710 ~mg/cm^3\)

Q - What is the probability that a randomly selected 65-year-old woman has bones less dense than Pine?

Q - Would you be surprised if a randomly selected 65-year-old woman had bone density less than Mahogany? What if she had bone density less than Plywood? Use the respective probabilities to support your response.

Part 2: Sample mean and its distribution

Suppose you want to analyze the mean bone density for a group of 10 randomly selected 65-year-old women.

Q - What is the center and spread of the distribution of \(\bar{X}\), the mean bone density for a group of 10 randomly selected 65-year-old women?

We want to know the shape of the distribution of \(\bar{X}\), not just mean and standard deviation.

Q - Before moving forward, let’s check the conditions required to apply the Central Limit Theorem (CLT) to define the distribution of \(\bar{X}\).

Independence?
Sample size / distribution?

Q - Apply CLT and write the distribution of \(\bar{X}\) using mathematical notation. Describe its distribution in terms of the shape, center, and spread. Is it an approximate distribution or an exact distribution?

The distribution of sample means is given below in comparison with the population distribution:

ggplot(data = data.frame(x = c(809 - 140*3, 809 + 140*3)), 
       aes(x)) + 
  stat_function(fun = dnorm, args = list(mean = 809, sd = 140), 
            aes(color = "population", linetype = "population")) +
  stat_function(fun = dnorm, args = list(mean = 809, sd = 140/sqrt(10)), 
            aes(color = "sample mean", linetype = "sample mean")) +
  labs(x = "", y = "Density", linetype = "", color = "") +
  scale_color_manual(values = c("population" = "black", 
                                "sample mean" = "red")) +
  scale_linetype_manual(values = c("population" = "solid", 
                                   "sample mean" = "dashed")) +
  geom_vline(xintercept = 809, color = "gray", size = 1) +
  theme(legend.position = c(0.8, 0.8))

Q - What is the probability that the mean bone density for a group of 10 randomly-selected 65-year-old women is less dense than Pine?

Q - Would you be surprised if a group of 10 randomly-selected 65-year old women had a mean bone density less than Mahogany? What the group had a mean bone density less than Plywood? Use the respective probabilities to support your response.

Q - Explain how and why your answers differ in Part 1 and Part 2.

Practice

Let \(X\) follow a Gamma distribution with the shape parameter \(\alpha = 1\) and the rate parameter \(\beta = 0.1\). Then its distribution looks like this:

ggplot(data = data.frame(x = c(0,30)), aes(x)) +
  stat_function(fun = dgamma, args = list(shape = 1, rate = .1), 
                color = "black") +
  labs(x = "", y = "Density")

This is far from any normal distributions. Theoretically, its mean is \(\alpha/\beta = 10\) and standard deviation is \(\sqrt{\alpha/\beta^2} = 10\).

Suppose you take 40 random observations of \(X\), take the mean, and save it as \(\bar{X}\).

Q - Can we apply the CLT? Does this scenario satisfy the following conditions?

Independence?
Sample size / distribution?

Q - If you said yes from the previous question, write the distribution of \(\bar{X}\) by CLT using mathematical notation.

Q - Visualize the distribution of the sample mean by completing the code chunk below:

ggplot(data = data.frame(x = c(0,30)), aes(x)) +
  stat_function(fun = dgamma, args = list(shape = 1, rate = 0.1), 
                color = "black", linetype = "solid") +
  stat_function(fun = dnorm, args = list(mean = __, sd = __), 
                color = "red", linetype = "dashed") +
  labs(x = "", y = "Density")

Optional

Let \(X\) follow a Chi-squared distribution with degrees of freedom 9, \(X \sim \chi^2(9)\). Suppose you take 200 random observations of \(X\), take the mean, and save it as \(\bar{X}\). Find the distribution of \(\bar{X}\) and visualize distributions of \(X\) and \(\bar{X}\). Hint: Theoretically, a random variable \(Y\) that follows \(\chi^2(d)\) has mean of \(d\) and standard deviation of \(\sqrt{2d}\).

ggplot(data = data.frame(x = c(0, 15)), aes(x)) +
  stat_function(fun = ____, args = list(df = __), # original Chisq distribution
                color = "black", linetype = "solid") +
  stat_function(fun = ____, args = list(mean = ___, sd = ___), # dist of xbar
                color = "red", linetype = "dashed") +
  labs(x = "", y = "Density")

Submitting Application Exercises

Once you have completed the activity, push your final changes to your GitHub repo.
Make sure you committed at least three times.
Check that your repo is updated on GitHub, and that’s all you need to do to submit application exercises for participation.