Homework #03: Probability and Simulation-based Inference

due Wednesday, June 8 at 11:59pm

Goals

Getting started

Instructions

For each exercise:

Formatting

Packages

We’ll use the tidyverse package for much of the data wrangling and visualization and the tidymodels package for inference. The data live in the openintro package.

library(tidyverse)
theme_set(theme_bw())
library(tidymodels)
library(openintro)

United States Births

Every year, the US releases a large dataset containing information on births recorded in the country. This dataset is useful to researchers studying the relation between habits and practices of expectant mothers and the birth of their children. We will work with a random sample of 1,000 observations from the dataset released in 2014.

The sample is available in the openintro package as births14. Each observation represents a birth in the US. You can find out more about the dataset by running ?births14 in the Console.
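As a quick orientation, the following sketch loads the data and previews its columns. It assumes the tidyverse and openintro packages from the Packages section are installed, and it uses the premie column that appears in the table under Probability.

library(tidyverse)
library(openintro)

# births14 ships with openintro; glimpse() lists each column with its type
glimpse(births14)

# Count the births in each category of the premie variable
births14 %>%
  count(premie)
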

Probability

  1. The table below shows that a birth is classified as preterm if the gestational age is below 37 weeks. What is the probability that a randomly selected baby in the US is premature?

premie      min_week   max_week
full term   37         46
premie      21         36

  2. Let \(A\) be the event that a baby is premature and \(B\) be the event that a baby weighs more than 9.5 pounds. Determine whether the two events are disjoint. Also determine whether they are independent. Explain your reasoning.

  3. What is the probability that a baby is premature given the baby is female? What about the probability that a baby is premature given the baby is male? Calculate the probabilities and also create a horizontal stacked bar plot of sex with relative frequencies of premie. Put the sex of a baby on the y-axis and fill the bars according to whether the baby was premature or not.

  4. Using the results from the exercises above and Bayes’ theorem, compute the probability that a baby is female given the baby is premature. Now let \(A\) be the event that a baby is female and \(B\) be the event that a baby is premature. Is \(A\) independent of \(B\)? Why or why not?
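For reference, the exercises above rely on the following standard definitions, stated here for generic events \(A\) and \(B\) with \(P(A) > 0\) and \(P(B) > 0\). The probabilities themselves must be estimated from the births14 sample; no values are assumed here.

\[
\begin{aligned}
\text{Disjoint:} \quad & A \cap B = \varnothing, \ \text{so} \ P(A \cap B) = 0 \\
\text{Independent:} \quad & P(A \cap B) = P(A)\,P(B), \ \text{equivalently} \ P(A \mid B) = P(A) \\
\text{Conditional probability:} \quad & P(A \mid B) = \frac{P(A \cap B)}{P(B)} \\
\text{Bayes' theorem:} \quad & P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\end{aligned}
\]

Note that disjoint events with positive probabilities cannot be independent, because \(P(A \cap B) = 0\) while \(P(A)\,P(B) > 0\).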

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

Simulation-based Inference

Baby weights

According to this article, the average birth weight of a full-term female baby reported by the World Health Organization (WHO) is 7.125 pounds (lbs).

We want to evaluate whether the average weight of full-term female babies in the US is significantly different from 7.125 lbs.

  5. Conduct a hypothesis test following the steps described below.
null_dist <- births_girl %>%
  specify(response = ____) %>%
  hypothesize(null = ____, __ = ____) %>%
  generate(reps = 500, type = _____) %>%
  calculate(stat = ____)
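
Because the definition of births_girl is not shown here, the sketch below illustrates the general infer workflow for a one-sample test of a mean on simulated toy data rather than on the births data. The tibble toy, its column x, and the null value 10 are all hypothetical and chosen only for illustration. It is one possible way to move from a null distribution to a p-value, not the required solution.

library(tidymodels)

set.seed(1)
# Toy data (simulated, hypothetical): 50 draws, not the births data
toy <- tibble(x = rnorm(50, mean = 10.4, sd = 2))

# Observed sample mean
obs_mean <- toy %>%
  specify(response = x) %>%
  calculate(stat = "mean")

# Null distribution for H0: mu = 10, built from bootstrap resamples
toy_null <- toy %>%
  specify(response = x) %>%
  hypothesize(null = "point", mu = 10) %>%
  generate(reps = 500, type = "bootstrap") %>%
  calculate(stat = "mean")

# Two-sided p-value: share of simulated means at least as extreme as obs_mean
toy_null %>%
  get_p_value(obs_stat = obs_mean, direction = "two-sided")

# Plot the null distribution and shade the region used for the p-value
visualize(toy_null) +
  shade_p_value(obs_stat = obs_mean, direction = "two-sided")
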

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

Baby weight vs. smoking

Consider the possible relationship between a mother’s smoking habit and the weight of her baby. Plotting the data is a useful first step because it helps us quickly visualize trends, identify strong associations, and develop research questions.

  6. Make side-by-side boxplots displaying the relationship between habit and weight. What does the plot highlight about the relationship between these two variables?

  7. Before moving forward, save a version of the dataset omitting observations where there are NAs for habit. You can call this version births_habitgiven.
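
The two steps above could be sketched as follows. This is one possible approach, assuming the habit and weight columns of births14 as documented in openintro; the axis labels are illustrative choices.

library(tidyverse)
library(openintro)

# Side-by-side boxplots of birth weight by the mother's smoking habit
ggplot(births14, aes(x = habit, y = weight)) +
  geom_boxplot() +
  labs(x = "Mother's smoking habit", y = "Birth weight (lbs)")

# Keep only the rows where habit is recorded
births_habitgiven <- births14 %>%
  filter(!is.na(habit))
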

We want to examine if the relationship seen in the side-by-side boxplots is statistically significant. We will conduct a hypothesis test on whether the average weight of babies born to smoking mothers is less than that of babies born to non-smoking mothers.

  8. Let’s conduct the appropriate hypothesis test.
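
A minimal sketch of a permutation test for a difference in means with infer is shown below. It assumes that habit has the two levels "smoker" and "nonsmoker" and recreates births_habitgiven so the block runs on its own. The number of repetitions (500) and the seed are arbitrary choices, and this is one reasonable setup rather than the only valid one.

library(tidyverse)
library(tidymodels)
library(openintro)

set.seed(1)
births_habitgiven <- births14 %>%
  filter(!is.na(habit))

# Observed difference in mean weight: smoker minus nonsmoker
obs_diff <- births_habitgiven %>%
  specify(weight ~ habit) %>%
  calculate(stat = "diff in means", order = c("smoker", "nonsmoker"))

# Null distribution under independence of weight and habit, via permutation
null_diff <- births_habitgiven %>%
  specify(weight ~ habit) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 500, type = "permute") %>%
  calculate(stat = "diff in means", order = c("smoker", "nonsmoker"))

# One-sided p-value for the alternative that the smoker mean is lower
null_diff %>%
  get_p_value(obs_stat = obs_diff, direction = "less")

The order argument fixes the direction of the subtraction, which is what makes direction = "less" correspond to the alternative stated above.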

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

Submission

Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.

Upload only your PDF document to Gradescope. Before you submit, mark the pages where the answer to each exercise appears. If an answer spans multiple pages, mark all of them. Associate the “Workflow & formatting” section with the first page.

Grading (60 pts)


Component               Points
Ex 1                    1
Ex 2                    4
Ex 3                    4
Ex 4                    5
Ex 5                    18
Ex 6                    2
Ex 7                    1.5
Ex 8                    18.5
Workflow & formatting   6

Grading notes: