Lab #04: Probability

due Tuesday, May 31 at 11:59pm

Goals

Getting started

Don’t forget to label your R chunk. Your label should be short, informative, shouldn’t include spaces, and shouldn’t repeat a previous label.

Packages

We will use the tidyverse and knitr packages in this lab.

library(tidyverse)
library(knitr)

NC Courage

Today, we will be working with data from the first three full seasons of the NC Courage, a highly successful National Women’s Soccer League (NWSL) team located near Duke in Cary, NC. The Courage moved to the Triangle from Western New York in 2017 and had three very successful seasons, culminating in winning the 2019 championship game, which was held at their stadium in Cary! (Data for this lab were sourced from the nwslR package on GitHub and verified against the NC Courage website by Meredith Brown in a previous semester.)

Use the code below to load the data set.

courage <- read_csv("data/courage.csv")
glimpse(courage)
## Rows: 78
## Columns: 10
## $ game_id     <chr> "washington-spirit-vs-north-carolina-courage-2017-04-15", …
## $ game_date   <chr> "4/15/2017", "4/22/2017", "4/29/2017", "5/7/2017", "5/14/2…
## $ game_number <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ home_team   <chr> "WAS", "NC", "NC", "BOS", "ORL", "NC", "NC", "CHI", "NC", …
## $ away_team   <chr> "NC", "POR", "ORL", "NC", "NC", "CHI", "NJ", "NC", "KC", "…
## $ opponent    <chr> "WAS", "POR", "ORL", "BOS", "ORL", "CHI", "NJ", "CHI", "KC…
## $ home_pts    <dbl> 0, 1, 3, 0, 3, 1, 2, 3, 2, 3, 0, 0, 2, 1, 1, 0, 1, 2, 2, 2…
## $ away_pts    <dbl> 1, 0, 1, 1, 1, 3, 0, 2, 0, 1, 1, 1, 0, 0, 0, 1, 2, 0, 3, 1…
## $ result      <chr> "win", "win", "win", "win", "loss", "loss", "win", "loss",…
## $ season      <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
  1. How many observations are in this dataset? What does each observation represent? (You do not need to create a code chunk here)

  2. Each season the Courage play 26 games. We want to find out whether they win more in the early, middle or late season.

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

  3. By default, R will arrange the categories of a categorical variable in alphabetical order in any output and visualizations, but we want the levels for seasonal_category to be in logical order. To achieve this, we will use the factor() function to make both of these variables factors (categorical variables with ordering) and specify the levels we wish to use.

The code to reorder levels for seasonal_category is below.

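# Uncomment once the seasonal_courage data frame exists.
# Assigning the result back keeps the reordered levels.
# seasonal_courage <- seasonal_courage %>%
#   mutate(seasonal_category =
#            factor(seasonal_category,
#                   levels = c("early", "middle", "late")))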
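
To see why the levels argument matters, here is a minimal sketch on a toy vector. The values in x are made up for illustration and are not the Courage data; the sketch uses only base R.

```r
# Toy values standing in for seasonal_category (hypothetical, not from courage.csv)
x <- c("late", "early", "middle", "early")

# Without levels, factor() sorts the categories alphabetically
default_order <- factor(x)
levels(default_order)

# With levels, the categories follow the order we specify
logical_order <- factor(x, levels = c("early", "middle", "late"))
levels(logical_order)

# Tables (and plots) now list the categories as early, middle, late
table(logical_order)
```

The same idea carries over to a data frame column inside mutate(), as in the commented code above.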
  4. Based on the data,

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

  5. Independence, contingency tables and ties.

Bayes’ theorem tells us that

\[ P(\text{tie} \ | \ \text{home}) = \frac{P(\text{home} \ | \ \text{tie}) P(\text{tie})}{P(\text{home})} \]
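
As a quick sanity check of the formula, the sketch below builds a small made-up set of 20 games and computes P(tie | home) two ways: directly from the home games, and through Bayes’ theorem. The data frame games and its counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the Courage results.

```r
# Hypothetical toy data: 10 home games and 10 away games (not the Courage data)
games <- data.frame(
  location = c(rep("home", 10), rep("away", 10)),
  result   = c(rep("tie", 2), rep("win", 6), rep("loss", 2),   # home games
               rep("tie", 3), rep("win", 4), rep("loss", 3))   # away games
)

# Contingency table of location by result
tab <- table(location = games$location, result = games$result)

# Marginal probabilities
p_tie  <- mean(games$result == "tie")      # 5 / 20
p_home <- mean(games$location == "home")   # 10 / 20

# P(home | tie): share of tied games that were played at home
p_home_given_tie <- mean(games$location[games$result == "tie"] == "home")  # 2 / 5

# P(tie | home) via Bayes' theorem
p_tie_given_home_bayes <- p_home_given_tie * p_tie / p_home

# P(tie | home) directly: row proportions of the contingency table
p_tie_given_home_direct <- prop.table(tab, margin = 1)["home", "tie"]

# The two routes agree (both 0.2 for these toy counts)
all.equal(p_tie_given_home_bayes, unname(p_tie_given_home_direct))
```

In this toy data P(tie | home) is 0.2 while P(tie) is 0.25, so ties and playing at home would not be independent here. Whether that holds for the Courage data is for you to check with the real contingency table.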

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

Submission

Once you are finished with the lab, you will submit the PDF document produced from your final knit, commit, and push to Gradescope.

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes. Remember – you must turn in a .pdf file to the Gradescope page by the submission deadline to be considered “on time”.

To submit your assignment:

  1. Once you are fully satisfied with your lab, Knit to .pdf to create a PDF document.

  2. Follow the instructions in previous labs to submit your PDF to Gradescope.

  3. Be sure to identify which problems are on each page using Gradescope.

Grading (50 pts)


Component              Points
Ex 1                   2
Ex 2                   10
Ex 3                   3
Ex 4                   10
Ex 5                   20
Workflow & formatting  5

Grading notes: