Getting Started

  • Complete this survey!

  • Wait until we finish watching a video

Introduction

Today we load knitr to use its kable() function, which displays tables neatly:

library(tidyverse)
library(knitr)

We will also use a new function, pivot_wider(), from the tidyr package (part of the tidyverse).

For this Application Exercise, we will look at our newly collected data. Remove eval = FALSE from the code chunk below to read the data.

sta199 <- read_csv("data/sta199-ae10.csv")

The dataset includes

  • year: Year in school
  • animal: Whether you prefer cats or dogs
  • movie: Favorite movie genre
  • stat_major: Whether you are a statistical science major or not

Events, Sample Space

Part 1

Give two examples of an event from the dataset.

Part 2

Let’s take a look at favorite movie genre. Note that we have categorized genres so that each person can only have one favorite genre.

Q- What is the sample space for favorite movie genre? You can use code to identify the sample space.

# code here
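
As a minimal sketch of one way to list a sample space with code, the example below applies distinct() to a small made-up data frame. The data frame `toy` and all of its values are hypothetical stand-ins for illustration, not the real survey responses.

```r
library(dplyr)

# Hypothetical toy responses, NOT the real sta199 data
toy <- tibble(
  year       = c("Freshman", "Sophomore", "Junior", "Senior", "Freshman", "Junior"),
  animal     = c("Dogs", "Cats", "Dogs", "Dogs", "Cats", "Dogs"),
  movie      = c("Action", "Comedy", "Drama", "Sci-fi", "Comedy", "Action"),
  stat_major = c("Yes", "No", "No", "Yes", "No", "Yes")
)

# Each distinct value of movie is one outcome in the sample space
sample_space <- toy %>%
  distinct(movie)

sample_space
```

Note that distinct() only reports values that were actually observed, so a genre nobody chose would not appear even though it belongs to the sample space.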

Part 3

Q- How large is the sample space of any individual’s response?

# code here

The sample space for the four survey questions contains \(4 \times 2 \times 4 \times 2 = 64\) different outcomes.
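
The product above is the multiplication rule: the number of possible response combinations is the product of the number of options for each question. A rough sketch of the same arithmetic in code, using a hypothetical toy data frame (the values are made up and chosen so that every option is observed at least once):

```r
library(dplyr)

# Hypothetical toy responses, NOT the real sta199 data
toy <- tibble(
  year       = c("Freshman", "Sophomore", "Junior", "Senior", "Freshman", "Junior"),
  animal     = c("Dogs", "Cats", "Dogs", "Dogs", "Cats", "Dogs"),
  movie      = c("Action", "Comedy", "Drama", "Sci-fi", "Comedy", "Action"),
  stat_major = c("Yes", "No", "No", "Yes", "No", "Yes")
)

# Count the distinct observed values in every column
sizes <- toy %>%
  summarise(across(everything(), n_distinct))

# Multiply the per-question counts to get the size of the joint sample space
total <- prod(unlist(sizes))
total
```

This only matches the true sample-space size when every option of every question appears in the data; otherwise it undercounts.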

Probability

Part 4

Let’s make a table that includes year, the number of students in each year, and the associated probabilities.

Q- What is the probability a randomly selected STA199 student is a freshman?

# code here
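
One possible pattern for a table of counts and probabilities is count() followed by mutate(). The sketch below runs on a hypothetical toy data frame, so the numbers it produces say nothing about the real class.

```r
library(dplyr)

# Hypothetical toy responses, NOT the real sta199 data
toy <- tibble(
  year   = c("Freshman", "Sophomore", "Junior", "Senior", "Freshman", "Junior"),
  animal = c("Dogs", "Cats", "Dogs", "Dogs", "Cats", "Dogs")
)

# count() gives one row per year with the number of students in column n;
# dividing by the total turns each count into a probability
year_probs <- toy %>%
  count(year) %>%
  mutate(prob = n / sum(n))

year_probs
```

The same pattern works for any single categorical variable, for example by replacing year with animal.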

Part 5

Q- What is the probability a randomly selected STA199 student favors cats? Answer it with a table that includes animal, the number of students who prefer each, and the associated probabilities.

# code here

Part 6

Q- What is the probability a randomly selected STA199 student is not a senior and prefers dogs?

Let \(A\) be the event that someone is not a senior and prefers dogs.

# code here
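
For an event defined by two conditions at once, one approach is to take the mean of a logical expression, since the mean of TRUE/FALSE values is the proportion of rows where the condition holds. This is a sketch on hypothetical toy data; the labels "Senior" and "Dogs" are assumptions about how the real columns are coded.

```r
library(dplyr)

# Hypothetical toy responses, NOT the real sta199 data
toy <- tibble(
  year   = c("Freshman", "Sophomore", "Junior", "Senior", "Freshman", "Junior"),
  animal = c("Dogs", "Cats", "Dogs", "Dogs", "Cats", "Dogs")
)

# Proportion of rows where both conditions hold: not a senior AND prefers dogs
p_A <- toy %>%
  summarise(prob = mean(year != "Senior" & animal == "Dogs")) %>%
  pull(prob)

p_A
```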

Part 7

Q- What is the probability a randomly selected STA199 student likes either action movies or comedy and is a statsci major?

Let \(B\) be the event that someone is an action or comedy movie lover and a statsci major.

# code here

Part 8

Now we examine the relationship between favorite animal and favorite movie. Let’s make a table of the number of students for every combination of favorite animals and movie genres.

# code here

Using pivot_wider(), we’ll reformat the data into a contingency table, a table frequently used to study the association between two categorical variables. In this contingency table, each row will represent an animal, each column will represent a movie genre, and each cell will contain the number of students who have that particular combination of animal and movie genre.

# code here
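
The reshaping step described above can be sketched as follows. The toy data frame is hypothetical, and using values_fill = 0 to show zero for combinations nobody chose is one option among several, not a requirement of the exercise.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy responses, NOT the real sta199 data
toy <- tibble(
  animal = c("Dogs", "Cats", "Dogs", "Dogs", "Cats", "Dogs"),
  movie  = c("Action", "Comedy", "Drama", "Sci-fi", "Comedy", "Action")
)

# Long format: one row per observed (animal, movie) combination
counts <- toy %>%
  count(animal, movie)

# Wide format: one row per animal, one column per movie genre
contingency <- counts %>%
  pivot_wider(names_from = movie, values_from = n, values_fill = 0)

contingency
```

Passing the result to kable() would display it as a formatted table in the knitted document. Column totals answer questions about a single genre, and individual cells answer questions about a combination.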

Q - How many students in STA199 like sci-fi movies?

Q - How many students in STA199 like dogs and dramas?

Practice

For each of the following exercises:

  • Calculate the probability using a relevant contingency table.
  • Then write code to check your answer using the sta199 data frame and dplyr functions.
  1. Relationship between year and major

Q - What is the probability a randomly selected STA199 student is a junior or not a statsci major?

# code here
# code here
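
For "or" questions like this one, the addition rule \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) gives the answer from a contingency table, and a filter() on the data frame can check it. Below is a sketch on hypothetical toy data; the "Yes"/"No" coding of stat_major is an assumption and may differ in the real data.

```r
library(dplyr)

# Hypothetical toy responses, NOT the real sta199 data
toy <- tibble(
  year       = c("Freshman", "Sophomore", "Junior", "Senior", "Freshman", "Junior"),
  stat_major = c("Yes", "No", "No", "Yes", "No", "Yes")
)

# Addition rule: P(junior) + P(not statsci) - P(junior and not statsci)
p_junior   <- mean(toy$year == "Junior")
p_not_stat <- mean(toy$stat_major == "No")
p_both     <- mean(toy$year == "Junior" & toy$stat_major == "No")
p_rule     <- p_junior + p_not_stat - p_both

# Direct check: count the rows satisfying either condition
n_either <- toy %>%
  filter(year == "Junior" | stat_major == "No") %>%
  nrow()
p_direct <- n_either / nrow(toy)

c(rule = p_rule, direct = p_direct)
```

The two numbers should agree; subtracting p_both avoids double-counting students who satisfy both conditions.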

Q - What is the probability a randomly selected STA199 student is a statsci major?

# code here
  2. Relationship between animal and major

Q - What is the probability a randomly selected STA199 student likes dogs and is a statsci major?

# code here
# code here
  3. Relationship between year and movie

Q - What is the probability a randomly selected STA199 student is a senior or does not pick sci-fi as their favorite movie genre?

# code here
# code here

Submitting Application Exercises

  • Once you have completed the activity, push your final changes to your GitHub repo.
  • Make sure you have committed at least three times.
  • Check that your repo is updated on GitHub. That is all you need to do to submit application exercises for participation.