Clone the repository entitled “ae12-GitHubUsername” at course GitHub organization page on your RStudio.
Open the .Rmd
file and replace “Your Name” with your name.
Go to the Monmouth University Polling Institute website and select a poll of interest. Briefly read the poll results and methodology section at the end. Try and identify the following for a different example:
Population of interest: Americans with age 18 and older
Parameter of interest: Proportion supporting the economic sanctions imposed on Russia in response to its invasion of Ukraine
Sample: A nationwide random sample of adults age 18 and older
Sample size: 807
Sample statistic: Proportion supporting the economic sanctions imposed on Russia in response to its invasion of Ukraine
Sample statistic’s value: 77%
Researchers aim to evaluate the effects of prenatal exposure to air pollutants on neurodevelopmental and behavioral development in children. They recruited a total of 700 African American and Dominican women from two prenatal clinics in Northern Manhattan who satisfied the following inclusion criteria:
For neurodevelopmental and behavioral development in children, an IQ test was administered to children at age seven.
Q - What type of study is it? What kind of conclusions can we make?
Q - Identify a response variable, explanatory variables, and confounding variables.
Q - Can we extend findings in the study to a broader population, for instance, mother-child dyads across the US?
Q - What could have been done differently for a more representative sample?
Your friend wants to play a game. The game goes: you flip a coin. If it's tails then the turn goes to your friend. If it's heads, you keep playing.
We will say a Tails is 0 and a Heads is 1. To be more precise, we will say:
Let \(X\) be a random variable that maps the outcome (“Heads”, “Tails”) to the numbers (1, 0) respectively.
A random variable takes each outcome (possibly a categorical variable) in the sample space and maps it to a number.
Let’s flip a real coin with your friend for 10 times and see how this game goes.
Input the data.
# coin_flips <- c()
Q - Do you think this coin is fair? Why?
Let’s come up with a data generative process, a model for how the data were generated.
We might imagine that \(X\) (the outcome of an individual coin flip, represented as 0 or 1) is obeying some universal law. Each time I flip the coin, there is a probability \(p\) that the coin lands heads. What is the probability the coin lands tails?
This probability \(p\) is fixed to this coin. For example, some scenarios:
We say “\(X\) is Bernoulli distributed with parameter \(p\).”
\[ X \sim \text{Bernoulli}(p) \]
this means \(P(X = 1) = p\) and \(P(X = 0) = 1-p\).
Now we can frame questions about the coin being fair in terms of \(p\).
Let’s flip a coin for another 10 times.
# coin_flips <- c(coin_flips, )
What's the probability of this outcome if the coin is fair?
In order to answer this question, we need a distribution of the number of heads from 20 trials with a fair coin.
We can define \(Y\) to be the number of heads. In other words, \(Y = \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_{n}\) where \(n\) is the total number of coin flips. In our case, \(n=20\).
We say “\(Y\) follows a Bionomial distribution with parameters \(n\) and \(p\)”.
\[ Y \sim \text{Binomial}(n, p) \]
this means:
\[ P(Y = y) = {n \choose y} p^{y} (1-p)^{n-y} \] Notice:
Once we figure out the distribution, we can either calculate the probability analytically or through simulations. For analytical results, we can simply replace \(y\) with the number of heads that we want to compute the probability for.
Let’s practice how to simulate coin flips. The code below simulates 100,000 coin flips where each flip is Bernoulli distributed with \(p = 0.5\). Note, Bernoulli(p) \(\stackrel{d}{=}\) Binomial(1, p).
set.seed(527)
flips <- rbinom(100000, 1, 0.5)
We can simulate 20 coin flips for multiple times as well. The code below simulates the number of heads out of 20 coin flips for 50,000 times. Each element \(Y\) follows a Binomial distribution with \(n = 20\) and \(p = 0.5\).
set.seed(527)
total_head <- rbinom(50000, 20, 0.5)
Q - What does rbinom()
do?
Using total_head
, find the probability of seeing 15 tails out of 20 coin flips.
Compare the above results based on simulation with analytical results using dbinom()
.
Plot a histogram of the simulation. What does this show?
library(tidyverse)
What is the probability of seeing at least 15 tails?
Verify the results in (4) with analytical results using pbinom()
.
Using total_head
, find the probability of seeing your result out of 20 coin flips.
Modified from Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1241351/↩︎