Clone the repository titled “ae11-GitHubUsername” from the course GitHub organization page into RStudio.
Open the `.Rmd` file and replace “Your Name” with your name.
The global coronavirus pandemic illustrates the need for accurate testing for COVID-19, as its high infectivity poses a significant public health threat. Due to the time-sensitive nature of the situation, the FDA granted emergency authorization in 2020 to a number of serological tests for COVID-19, including Abbott Alinity. Full details of these tests may be found on the FDA website.
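We will define the following events:
Pos: the person tests positive
Neg: the person tests negative
Covid: the person has COVID-19
No Covid: the person does not have COVID-19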
The Abbott Alinity test has an estimated sensitivity of 100%, P(Pos | Covid) = 1, and specificity of 99%, P(Neg | No Covid) = 0.99.
Suppose the prevalence of COVID-19 in the general population is about 2%, P(Covid) = 0.02.
Q - Use the Hypothetical 10,000 to calculate the probability a person has COVID-19 given they get a positive test result, i.e. P(Covid | Pos).
| | Covid | No Covid | Total |
|---|---|---|---|
| Pos | | | |
| Neg | | | |
| Total | | | 10000 |
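One way to fill in the table is to let R do the arithmetic. The sketch below uses only the sensitivity, specificity, and prevalence stated above; the object names (`hyp_table`, `covid_pos`, and so on) are illustrative choices, not part of the exercise materials.

```r
# Hypothetical population of 10,000 people, using the figures stated above
n           <- 10000
prevalence  <- 0.02   # P(Covid)
sensitivity <- 1      # P(Pos | Covid)
specificity <- 0.99   # P(Neg | No Covid)

covid    <- n * prevalence   # people with COVID-19
no_covid <- n - covid        # people without COVID-19

covid_pos    <- covid * sensitivity       # true positives
covid_neg    <- covid - covid_pos         # false negatives
no_covid_neg <- no_covid * specificity    # true negatives
no_covid_pos <- no_covid - no_covid_neg   # false positives

# Arrange the counts in the same layout as the table above
hyp_table <- matrix(
  c(covid_pos, no_covid_pos, covid_pos + no_covid_pos,
    covid_neg, no_covid_neg, covid_neg + no_covid_neg,
    covid,     no_covid,     n),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Pos", "Neg", "Total"), c("Covid", "No Covid", "Total"))
)
hyp_table

# P(Covid | Pos): share of all positive tests that are true positives
p_covid_given_pos <- covid_pos / (covid_pos + no_covid_pos)
p_covid_given_pos
```

Changing `prevalence` at the top and re-running the block updates every cell of the table.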
Q - Use Bayes’ Theorem to calculate P(Covid|Pos).
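As a reference for setting up the calculation, Bayes' Theorem for these events can be written as follows. The only quantity not stated directly above is the false positive rate, which follows from the specificity: P(Pos | No Covid) = 1 - 0.99 = 0.01.

$$
P(\text{Covid} \mid \text{Pos}) =
\frac{P(\text{Pos} \mid \text{Covid})\,P(\text{Covid})}
     {P(\text{Pos} \mid \text{Covid})\,P(\text{Covid}) + P(\text{Pos} \mid \text{No Covid})\,P(\text{No Covid})}
= \frac{(1)(0.02)}{(1)(0.02) + (0.01)(0.98)}
= \frac{0.02}{0.0298} \approx 0.671
$$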
Q - Now suppose the prevalence of COVID-19 in the general population is 10%. Can you reuse code from the previous calculation somehow?
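One possible way to reuse the calculation is to wrap it in a function that takes the prevalence as an argument. The function name `bayes_ppv` and its default arguments are hypothetical and chosen for illustration; the defaults are the sensitivity and specificity given above.

```r
# Hypothetical helper: P(Covid | Pos) via Bayes' Theorem
bayes_ppv <- function(prevalence, sensitivity = 1, specificity = 0.99) {
  true_pos  <- sensitivity * prevalence               # P(Pos | Covid) * P(Covid)
  false_pos <- (1 - specificity) * (1 - prevalence)   # P(Pos | No Covid) * P(No Covid)
  true_pos / (true_pos + false_pos)
}

# The function is vectorized, so both prevalence levels can be evaluated at once
prevalences <- c(0.02, 0.10)
ppv <- bayes_ppv(prevalences)
names(ppv) <- paste0(prevalences * 100, "%")
ppv
```

Because the prevalence is an argument, no code has to be copied and edited to try other values.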
Q - Under which prevalence level would you be more worried after getting a positive result? Does your expectation/intuition match the computed probabilities?
A study conducted in Whickham, England recorded participants’ age and smoking status at baseline between 1972 and 1974, and then recorded their health outcome 20 years later.
Let’s analyze the relationships between these variables, first two at a time, and then controlling for the third.
Today’s data lives in the `mosaicData` package. We start by loading relevant packages:
library(tidyverse)
library(mosaicData)
The dataset we’ll use is called `Whickham`. You can find out more about the dataset by inspecting its documentation, which you can access by running `?Whickham` in the console.
Q - How many variables are in this dataset? What type of variable is each? Display each variable using an appropriate visualization.
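A possible starting point, assuming the `tidyverse` and `mosaicData` packages are installed: `glimpse()` lists each variable with its type, a bar chart suits a categorical variable, and a histogram suits a numerical one. The plot object names and the bin width of 5 years are illustrative choices.

```r
library(tidyverse)
library(mosaicData)

# One row per variable: name, type, and the first few values
glimpse(Whickham)

# Bar chart for a categorical variable (same approach works for outcome)
p_smoker <- ggplot(Whickham, aes(x = smoker)) +
  geom_bar()

# Histogram for the numerical variable; binwidth is an arbitrary choice
p_age <- ggplot(Whickham, aes(x = age)) +
  geom_histogram(binwidth = 5)

p_smoker
p_age
```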
Q - Create a visualization depicting the relationship between smoking status and health outcome. Briefly describe the relationship, and evaluate whether this meets your expectations. Additionally, calculate the relevant conditional probabilities to help your narrative. Here is some code to get you started:
Whickham %>%
  count(smoker, outcome)
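The counts can be turned into conditional probabilities by grouping on the conditioning variable before dividing. The sketch below is one way to do this, with illustrative object names; it computes P(outcome | smoker) and draws a bar chart filled by outcome so that the two groups can be compared on the same scale.

```r
library(tidyverse)
library(mosaicData)

# Within each smoking status, the proportions sum to 1: P(outcome | smoker)
smoker_outcome <- Whickham %>%
  count(smoker, outcome) %>%
  group_by(smoker) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
smoker_outcome

# position = "fill" rescales each bar to proportions instead of counts
p_outcome <- ggplot(Whickham, aes(x = smoker, fill = outcome)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")
p_outcome
```

Grouping by `outcome` instead would give P(smoker | outcome), which answers a different question.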
Q - Create a new variable called `age_cat` using the following scheme:
age <= 44 ~ "18-44"
age > 44 & age <= 64 ~ "45-64"
age > 64 ~ "65+"
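The three conditions above have the form expected by `case_when()` from dplyr, so one way to build the variable is to place them inside `mutate()`. This sketch overwrites `Whickham` in the current session so that later code can refer to `age_cat`.

```r
library(tidyverse)
library(mosaicData)

# Add age_cat; each condition is checked in order and the first match wins
Whickham <- Whickham %>%
  mutate(
    age_cat = case_when(
      age <= 44            ~ "18-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64             ~ "65+"
    )
  )

# Check how many participants fall in each category
Whickham %>%
  count(age_cat)
```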
Q - Re-create the visualization depicting the relationship between smoking status and health outcome, faceted by `age_cat`. What changed? What might explain this change? Extend the contingency table from earlier by breaking it down by age category and use it to help your narrative.
Whickham %>%
  count(smoker, age_cat, outcome)
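A sketch of how the extended table and the faceted plot might be put together. It rebuilds `age_cat` in a separate data frame (`whickham_age`, an illustrative name) so that the block runs on its own, then computes P(outcome | smoker, age_cat) by grouping on both conditioning variables.

```r
library(tidyverse)
library(mosaicData)

# Rebuild age_cat so this block does not depend on earlier code
whickham_age <- Whickham %>%
  mutate(
    age_cat = case_when(
      age <= 44            ~ "18-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64             ~ "65+"
    )
  )

# Proportions sum to 1 within each smoker and age_cat combination
by_age <- whickham_age %>%
  count(smoker, age_cat, outcome) %>%
  group_by(smoker, age_cat) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
by_age

# Same filled bar chart as before, with one panel per age category
p_by_age <- ggplot(whickham_age, aes(x = smoker, fill = outcome)) +
  geom_bar(position = "fill") +
  facet_wrap(~ age_cat) +
  labs(y = "Proportion")
p_by_age
```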