AE 12: Foundations of Inference

Getting Started

Statistical Process Terminology

Part 1: Population vs. Sample
Part 2: Scientific Studies¹

Data Generative Process

Part 3
Part 4
Part 5
Practice

Submitting Application Exercises

Getting Started

Clone the repository named “ae12-GitHubUsername” from the course GitHub organization page into RStudio.
Open the .Rmd file and replace “Your Name” with your name.

Statistical Process Terminology

Part 1: Population vs. Sample

Go to the Monmouth University Polling Institute website and select a poll of interest. Briefly read the poll results and the methodology section at the end. Using the example below as a guide, try to identify the same items for a different poll:

Title: Steady Support for Russia Sanctions
Population of interest: Americans aged 18 and older
Parameter of interest: Proportion of the population supporting the economic sanctions imposed on Russia in response to its invasion of Ukraine
Sample: A nationwide random sample of adults aged 18 and older
Sample size: 807
Sample statistic: Proportion of the sample supporting the economic sanctions imposed on Russia in response to its invasion of Ukraine
Sample statistic’s value: 77%
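
The distinction between a parameter and a sample statistic can be illustrated with a small simulation. This is only a sketch: the population below is hypothetical, and its true proportion of 0.75 is an assumed value chosen for illustration, not a figure from the poll. Only the sample size of 807 comes from the example above.

```r
set.seed(1)

# Hypothetical population of 1,000,000 adults: 1 = supports, 0 = does not.
# The true proportion (0.75) is an assumption made for this illustration.
population <- rep(c(1, 0), times = c(750000, 250000))
parameter <- mean(population)   # population proportion (usually unknown)

# Draw a simple random sample of the same size as the poll
poll_sample <- sample(population, size = 807)
statistic <- mean(poll_sample)  # sample proportion (what the poll reports)

c(parameter = parameter, statistic = statistic)
```

The statistic changes from sample to sample, while the parameter stays fixed for the population.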

Part 2: Scientific Studies¹

Researchers aim to evaluate the effects of prenatal exposure to air pollutants on neurodevelopmental and behavioral development in children. They recruited a total of 700 African American and Dominican women from two prenatal clinics in Northern Manhattan who satisfied the following inclusion criteria:

enrolled in a longitudinal birth cohort as a mother-child dyad
aged 18-35
had first prenatal visit before the 20th week of gestation
were free of diabetes, hypertension, or known HIV
did not report tobacco or illicit drug use
resided in the study area for at least one year prior to pregnancy

To measure neurodevelopmental and behavioral development, an IQ test was administered to the children at age seven.

Q - What type of study is it? What kind of conclusions can we make?

Q - Identify a response variable, explanatory variables, and confounding variables.

Q - Can we extend findings in the study to a broader population, for instance, mother-child dyads across the US?

Q - What could have been done differently for a more representative sample?

Data Generative Process

Your friend wants to play a game. The game goes: you flip a coin. If it's tails then the turn goes to your friend. If it's heads, you keep playing.

We will say a Tails is 0 and a Heads is 1. To be more precise, we will say:

Let $X$ be a random variable that maps the outcomes (“Heads”, “Tails”) to the numbers (1, 0), respectively.

A random variable takes each outcome (possibly a categorical variable) in the sample space and maps it to a number.
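
As a minimal sketch of this mapping in R, the snippet below converts a vector of recorded outcomes into 0s and 1s. The five outcomes are made-up values for illustration, and the names `outcomes` and `x` are not part of the exercise.

```r
# Hypothetical outcomes of five coin flips (illustrative values, not real data)
outcomes <- c("Heads", "Tails", "Tails", "Heads", "Heads")

# The random variable maps "Heads" to 1 and "Tails" to 0
x <- ifelse(outcomes == "Heads", 1, 0)

x
sum(x)  # number of heads in the sequence
```

Coding heads as 1 is convenient because summing the vector counts the heads directly.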

Let’s flip a real coin with your friend 10 times and see how this game goes.

Part 3

Input the data.

# coin_flips <- c()

Q - Do you think this coin is fair? Why?

Let’s come up with a data generative process, a model for how the data were generated.

We might imagine that $X$ (the outcome of an individual coin flip, represented as 0 or 1) obeys some universal law. Each time I flip the coin, there is a probability $p$ that the coin lands heads. What is the probability the coin lands tails?

This probability is fixed for a given coin. Some example scenarios:

the coin is fair: $p = 0.5$,
the coin is double-sided: $p = 1$ (two heads) or $p = 0$ (two tails), or
the coin is weighted slightly in favor of tails: $p$ is slightly less than 0.5.

We say “$X$ is Bernoulli distributed with parameter $p$.”

This means $\Pr(X = 1) = p$ and $\Pr(X = 0) = 1 - p$.

Now we can frame questions about the coin being fair in terms of $p$.
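
The Bernoulli probabilities can be written as a small R function. This is a sketch: `bernoulli_pmf` and `p_fair` are hypothetical names introduced here, and the comparison with `dbinom()` relies on a Bernoulli trial being a Binomial distribution with a single trial.

```r
# Probability mass function of a Bernoulli random variable with parameter p:
# returns p when x is 1 (heads) and 1 - p when x is 0 (tails)
bernoulli_pmf <- function(x, p) {
  stopifnot(all(x %in% c(0, 1)), p >= 0, p <= 1)
  ifelse(x == 1, p, 1 - p)
}

p_fair <- 0.5  # fair-coin scenario
bernoulli_pmf(c(1, 0), p_fair)

# Base R returns the same values when the number of trials is 1
dbinom(c(1, 0), size = 1, prob = p_fair)
```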

Part 4

Let’s flip the coin another 10 times.

# coin_flips <- c(coin_flips, )

What's the probability of this outcome if the coin is fair?

In order to answer this question, we need a distribution of the number of heads from 20 trials with a fair coin.

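We can define $Y$ to be the number of heads. In other words, $Y = \sum_{i=1}^{n} X_i$, where $X_i$ is the outcome of the $i$-th flip and $n$ is the total number of coin flips. In our case, $n = 20$.

We say “$Y$ follows a Binomial distribution with parameters $n$ and $p$”.

This means:

$$\Pr(Y = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$

Notice:

the number of heads observed is lower case $k$
$n - k$ is the number of tails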
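
To make the Binomial formula concrete, the sketch below evaluates it by hand and compares the result with R's built-in `dbinom()`. The function name `binomial_pmf` and the value `k <- 12` are assumptions chosen only for illustration; they are not the class's observed result.

```r
# Binomial probability written out from the formula:
# choose(n, k) * p^k * (1 - p)^(n - k)
binomial_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 20   # total number of flips, as in the text
p <- 0.5  # fair coin
k <- 12   # hypothetical number of heads, for illustration only

binomial_pmf(k, n, p)
dbinom(k, size = n, prob = p)  # built-in equivalent

# The probabilities over all possible head counts add up to 1
sum(binomial_pmf(0:n, n, p))
```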

Part 5

Once we figure out the distribution, we can calculate the probability either analytically or through simulation. For analytical results, we can simply replace $k$ with the number of heads for which we want to compute the probability.

Let’s practice simulating coin flips. The code below simulates 100,000 coin flips where each flip is Bernoulli distributed with $p = 0.5$. Note that Bernoulli($p$) is the same distribution as Binomial(1, $p$).

set.seed(527)
flips <- rbinom(100000, 1, 0.5)
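
One way to check the simulated flips, sketched here with the same seed and call as above, is to tabulate the outcomes and compute the share of heads. With 100,000 fair flips, that share should land close to 0.5.

```r
set.seed(527)
flips <- rbinom(100000, 1, 0.5)

# Each element is a single flip coded as 0 (tails) or 1 (heads)
table(flips)

# Because heads are coded as 1, the mean is the proportion of heads
mean(flips)
```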

We can also simulate 20 coin flips many times. The code below simulates the number of heads out of 20 coin flips, repeated 50,000 times. Each element follows a Binomial distribution with $n = 20$ and $p = 0.5$.

set.seed(527)
total_head <- rbinom(50000, 20, 0.5)
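
The simulated counts can be turned into a probability estimate by taking the share of simulations that match a given number of heads. The sketch below does this for a hypothetical value `k <- 10`, chosen only for illustration, and compares the estimate with the analytical value from `dbinom()`.

```r
set.seed(527)
total_head <- rbinom(50000, 20, 0.5)

k <- 10  # hypothetical number of heads, for illustration only

# Simulation-based estimate: share of the 50,000 experiments with exactly k heads
sim_prob <- mean(total_head == k)

# Analytical probability under a fair coin and 20 flips
exact_prob <- dbinom(k, size = 20, prob = 0.5)

c(simulated = sim_prob, analytical = exact_prob)
```

The two numbers should agree closely, and the gap generally shrinks as the number of simulated experiments grows.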

Q - What does rbinom() do?

Practice

1. Using total_head, find the probability of seeing 15 tails out of 20 coin flips.
2. Compare the above results based on simulation with analytical results using dbinom().
3. Plot a histogram of the simulation. What does this show?

library(tidyverse)

4. What is the probability of seeing at least 15 tails?
5. Verify the results in (4) with analytical results using pbinom().
6. Using total_head, find the probability of seeing your result out of 20 coin flips.
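
For "at least" questions, one point that is easy to get wrong is the boundary: `pbinom(q, ..., lower.tail = FALSE)` returns the probability of strictly more than `q` successes, so "at least k" needs `q = k - 1`. The sketch below illustrates this for heads with a hypothetical threshold `k <- 12`, which is not one of the practice questions.

```r
set.seed(527)
total_head <- rbinom(50000, 20, 0.5)

k <- 12  # hypothetical threshold, for illustration only

# Simulation-based estimate of the probability of at least k heads
sim_tail <- mean(total_head >= k)

# Analytical value: the upper tail starting at k, so pass k - 1 to pbinom()
exact_tail <- pbinom(k - 1, size = 20, prob = 0.5, lower.tail = FALSE)

# Same quantity obtained by summing the individual probabilities
sum_tail <- sum(dbinom(k:20, size = 20, prob = 0.5))

c(simulated = sim_tail, pbinom = exact_tail, dbinom_sum = sum_tail)
```

When a question is phrased in terms of tails, recall that the number of tails is 20 minus the number of heads.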

Submitting Application Exercises

Once you have completed the activity, push your final changes to your GitHub repo.
Make sure you committed at least three times.
Check that your repo is updated on GitHub, and that’s all you need to do to submit application exercises for participation.

¹ Modified from source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1241351/

AE 12: Foundations of Inference

due Tuesday, May 31 at 9:29am

Bora Jin
