lab04.Rmd
to open the template R Markdown file. Don’t forget to label your R chunks. Each label should be short and informative, shouldn’t include spaces, and shouldn’t repeat a previous label.
We will use the tidyverse and knitr packages in this lab.
library(tidyverse)
library(knitr)
Today, we will be working with data from the first three full seasons of the NC Courage, a highly successful National Women’s Soccer League (NWSL) team located near Duke in Cary, NC. The Courage moved to the Triangle from Western New York in 2017 and had three very successful first seasons, culminating in winning the championship game that was held at their stadium in Cary in 2019! (Data for this lab were sourced from the nwslR package on GitHub, and verified with the NC Courage website by Meredith Brown in a previous semester.)
Use the code below to load the data set.
courage <- read_csv("data/courage.csv")
glimpse(courage)
## Rows: 78
## Columns: 10
## $ game_id <chr> "washington-spirit-vs-north-carolina-courage-2017-04-15", …
## $ game_date <chr> "4/15/2017", "4/22/2017", "4/29/2017", "5/7/2017", "5/14/2…
## $ game_number <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ home_team <chr> "WAS", "NC", "NC", "BOS", "ORL", "NC", "NC", "CHI", "NC", …
## $ away_team <chr> "NC", "POR", "ORL", "NC", "NC", "CHI", "NJ", "NC", "KC", "…
## $ opponent <chr> "WAS", "POR", "ORL", "BOS", "ORL", "CHI", "NJ", "CHI", "KC…
## $ home_pts <dbl> 0, 1, 3, 0, 3, 1, 2, 3, 2, 3, 0, 0, 2, 1, 1, 0, 1, 2, 2, 2…
## $ away_pts <dbl> 1, 0, 1, 1, 1, 3, 0, 2, 0, 1, 1, 1, 0, 0, 0, 1, 2, 0, 3, 1…
## $ result <chr> "win", "win", "win", "win", "loss", "loss", "win", "loss",…
## $ season <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
How many observations are in this dataset? What does each observation represent? (You do not need to create a code chunk here)
Each season the Courage play 26 games. We want to find out whether they win more in the early, middle or late season.
Create a new column called seasonal_category that classifies NC Courage games as early (games 1-9), middle (games 10-17), or late (games 18-26) season.
Create a new column called win that takes the value 0 if the Courage lose and 1 if they win.
Save the resulting data frame as seasonal_courage and print it to the screen.
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
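As a minimal sketch of how a game number can be binned into categories and a result recoded as 0/1, the example below uses a small made-up data frame (toy_games is hypothetical, not the real courage data). It assumes ties are coded 0 together with losses, which the lab text does not specify.

```r
library(tidyverse)

# Hypothetical toy data, not the real courage.csv: one row per game.
toy_games <- tibble(
  game_number = c(1, 9, 10, 17, 18, 26),
  result      = c("win", "loss", "win", "tie", "win", "loss")
)

toy_seasonal <- toy_games %>%
  mutate(
    # case_when() checks the conditions in order and uses the first TRUE one
    seasonal_category = case_when(
      game_number <= 9  ~ "early",
      game_number <= 17 ~ "middle",
      TRUE              ~ "late"
    ),
    # Assumption: ties are coded 0 along with losses; the lab text only
    # gives the values for wins and losses.
    win = if_else(result == "win", 1, 0)
  )

# Print the new data frame to the screen
toy_seasonal
```

Because the conditions are checked in order, the second condition only needs an upper bound: any game that reaches it already has a game number above 9.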
R will arrange the categories of a categorical variable in alphabetical order in any output and visualizations, but we want the levels for seasonal_category to be in logical order. To achieve this, we will use the factor() function to make both of these variables factors (categorical variables with ordering) and specify the levels we wish to use. The code to reorder levels for seasonal_category is below.
# seasonal_courage %>%
# mutate(seasonal_category =
# factor(seasonal_category,
# levels = c("early", "middle", "late")))
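To see what the factor() call above changes, the sketch below compares the default alphabetical levels with explicitly specified ones. The data frame toy_seasonal is a made-up example, and toy_ordered is a hypothetical name for the saved result.

```r
library(tidyverse)

# Hypothetical toy data for illustration only.
toy_seasonal <- tibble(
  seasonal_category = c("late", "early", "middle", "early"),
  win               = c(1, 0, 1, 1)
)

# Without explicit levels, factor() sorts the levels alphabetically.
levels(factor(toy_seasonal$seasonal_category))

# Assign the result; mutate() does not modify the data frame in place.
toy_ordered <- toy_seasonal %>%
  mutate(seasonal_category =
           factor(seasonal_category,
                  levels = c("early", "middle", "late")))

# With explicit levels, the order is early, middle, late.
levels(toy_ordered$seasonal_category)

# Summaries now list the categories in level order.
toy_ordered %>%
  count(seasonal_category)
```

Note that the pipeline has to be assigned to a name (or back to the original data frame) for the new level order to persist in later chunks.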
early, then middle, then late). Hint: what dplyr verb changes the order of output?
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
Create a new column called home_courage that takes the value “home” if the Courage are the home team and “away” if the Courage are the away team, and save this data frame.
Using the data frame above, create a 3 x 2 contingency table with columns denoting whether a game is home or away for the Courage and rows denoting whether the Courage win, lose or tie.
Your tibble output may be 3 x 3, counting the game result (lose, tie, win) as a column. When the same table is viewed as a contingency table, however, we count its dimensions as 3 x 2.
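One possible way to build such a table, sketched on made-up data: flag each game as home or away with if_else(), count the combinations, and spread the home/away counts into columns with pivot_wider(). The names toy_games and toy_table are hypothetical, and the counts are not from the Courage data.

```r
library(tidyverse)

# Hypothetical toy data; team codes mimic the home_team column.
toy_games <- tibble(
  home_team = c("NC", "WAS", "NC", "POR", "NC", "CHI", "NC", "ORL"),
  result    = c("win", "win", "tie", "loss", "loss", "tie", "win", "win")
)

toy_table <- toy_games %>%
  # "home" when the Courage (NC) are the home team, "away" otherwise
  mutate(home_courage = if_else(home_team == "NC", "home", "away")) %>%
  # one row per (result, home_courage) combination, with its count n
  count(result, home_courage) %>%
  pivot_wider(
    names_from  = home_courage,
    values_from = n,
    values_fill = 0   # combinations that never occur get a count of 0
  )

toy_table
```

The printed tibble has three columns (result, away, home), which is why the output can look 3 x 3 even though the contingency table itself is 3 x 2.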
Use the contingency table to find
Bayes’ theorem tells us that
\[ P(\text{tie} \ | \ \text{home}) = \frac{P(\text{home} \ | \ \text{tie}) P(\text{tie})}{P(\text{home})} \]
Using Bayes’ theorem, find and report the conditional probability a game is a tie given a game is home. Check your result using the contingency table.
Finally, is the event that a game is a tie independent of the Courage playing at home or away? Why?
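The arithmetic behind the Bayes' theorem check and the independence comparison can be sketched with base R on a made-up 3 x 2 table. The counts below are hypothetical and chosen only for illustration; they are not the Courage results.

```r
# Hypothetical counts, not taken from the Courage data.
counts <- matrix(
  c(4, 6,    # loss: home, away
    3, 7,    # tie:  home, away
    11, 9),  # win:  home, away
  nrow = 3, byrow = TRUE,
  dimnames = list(result = c("loss", "tie", "win"),
                  location = c("home", "away"))
)
n <- sum(counts)  # total number of games in the toy table

p_tie  <- sum(counts["tie", ]) / n    # P(tie)
p_home <- sum(counts[, "home"]) / n   # P(home)
p_home_given_tie <- counts["tie", "home"] / sum(counts["tie", ])  # P(home | tie)

# Bayes' theorem: P(tie | home) = P(home | tie) P(tie) / P(home)
p_tie_given_home <- p_home_given_tie * p_tie / p_home

# Direct check: ties at home divided by all home games
p_tie_given_home_direct <- counts["tie", "home"] / sum(counts[, "home"])

all.equal(p_tie_given_home, p_tie_given_home_direct)

# Independence would require P(tie | home) to equal P(tie)
c(p_tie_given_home = p_tie_given_home, p_tie = p_tie)
```

In this toy table the two probabilities on the last line differ, so under these made-up counts the events would not be independent; the same comparison applies to the real table.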
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
Once you are fully satisfied with your lab, Knit to .pdf to create a PDF document.
Follow the instructions in previous labs to submit your PDF to Gradescope.
Be sure to identify which problems are on each page using Gradescope.
Once you are finished with the lab, you will submit the PDF document produced from your final knit, commit, and push to Gradescope.
Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes. Remember – you must turn in a .pdf file to the Gradescope page by the submission deadline to be considered “on time”.
To submit your assignment:
Component | Points |
---|---|
Ex 1 | 2 |
Ex 2 | 10 |
Ex 3 | 3 |
Ex 4 | 10 |
Ex 5 | 20 |
Workflow & formatting | 5 |
Grading notes: