Introduction to Probability

Bora Jin

1 / 19

Material

🎥 Watch Probability

2 / 19

Today's Goal

  • Understand population, sample, event, sample space, and probability.
  • Compute probabilities of events from data.
  • Create a contingency table using pivot_wider() and kable().
  • Use a contingency table to explore the relationship between two categorical variables.
3 / 19

Quiz

One goal of statistics is to answer a research question, by making inferences about a population based on data in one or more samples.

Q - What is a research question?

What we want to learn and what we are curious about. For example:

  • How likely is it for patients with early Parkinson's disease to experience a serious movement disorder within 5 years after the first diagnosis?
  • Does the average amount of caffeine vary by vendor in 12 oz. cups of coffee at Duke coffee shops?
  • How popular is the president among college students?
4 / 19

Quiz

Q - What is a "population"?

  • Entire group we are interested in studying
  • Parameter: a numerical quantity derived from the population (almost always unknown)
  • If we had data from every unit in the population, we could just calculate population parameters and be done!

Q - What is a "sample"?

  • The group we have collected data from (a subset of our population of interest)
  • Statistic: a numerical quantity derived from a sample
  • Usually we have to settle for a sample and draw conclusions from it, because the population is too large to observe in full!
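
To make the parameter vs. statistic distinction concrete, here is a minimal simulation sketch. The population is hypothetical (10,000 simulated cups with caffeine drawn from a normal distribution with mean 150 mg and standard deviation 20 mg); these numbers are assumptions for illustration, not data from the course.

```python
import random
import statistics

random.seed(199)  # fixed seed so the sketch is reproducible

# Hypothetical population: caffeine content (mg) of 10,000 cups of coffee.
# The mean of 150 and standard deviation of 20 are made-up values.
population = [random.gauss(150, 20) for _ in range(10_000)]

# Parameter: computed from every unit in the population.
# In practice this is almost always unknown.
population_mean = statistics.mean(population)

# Statistic: computed from a random sample of 50 cups.
sample = random.sample(population, k=50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The sample mean will typically land close to, but not exactly on, the population mean; that gap is the uncertainty that inference has to quantify.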
5 / 19

Quiz

Q - What is a good sample and why is it important?

  • A good sample represents the target population well (it has characteristics similar to those of the population)
  • Representativeness is what allows conclusions and inferences from the sample to generalize to the whole population
  • Similar to tasting a spoonful of soup while cooking to make an inference about the entire pot
6 / 19

Quiz

Q - Identify the population and potential samples to answer the following questions.

How likely is it for patients with early Parkinson's disease to experience a serious movement disorder within 5 years after the first diagnosis?

  • population: all patients with early Parkinson's disease
  • sample: patients with early Parkinson's disease at Duke Hospital who agree to be followed up
7 / 19

Quiz

Q - Identify the population and potential samples to answer the following questions.

Does the average amount of caffeine vary by vendor in 12 oz. cups of coffee at Duke coffee shops?

  • population: all 12 oz. cups of coffee at Duke coffee shops
  • sample: 50 randomly selected 12 oz. cups of coffee from each vendor at Duke coffee shops
8 / 19

Quiz

Q - Identify the population and potential samples to answer the following questions.

How popular is the president among college students?

  • population: all college students
  • sample: randomly selected college students who share their opinions about the president
9 / 19

Quiz

In order to draw principled conclusions from data, we rely on a formal probabilistic framework that allows us to quantify uncertainty.

Q - In probability theory, what is a term indicating the result of an observation or experiment?

  • "Event"
  • Usually denoted by capital letters
  • A is the event that a student in STA199 is a sophomore.
  • $A^c$ (A complement) is the event that A does not occur: a student in STA199 is not a sophomore.
10 / 19

Quiz

Q - What is a "sample space"?

  • Set of all possible outcomes
  • {Freshman, Sophomore, Junior, Senior}
  • $\Omega = A \cup A^c$
  • Varies by research questions
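
One way to see the sample space and a complement concretely is with Python sets. This is only a sketch of the class-year example; the variable names are chosen here for illustration.

```python
# Sample space for the class-year example: the set of all possible outcomes.
omega = {"Freshman", "Sophomore", "Junior", "Senior"}

# Event A: the student is a sophomore.
A = {"Sophomore"}

# Complement of A: every outcome in the sample space that is not in A.
A_c = omega - A

# A and its complement together recover the sample space and never overlap.
assert A | A_c == omega
assert A & A_c == set()
```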
11 / 19

Quiz

Q - Describe the following math expressions in words.

A is the event that a student in STA199 is a sophomore. B is the event that a student in STA199 is a senior.

  • Union: $A \cup B$

  • Intersection: $A \cap B$

    • Is it possible? If not, we say A and B are mutually exclusive or disjoint.
  • Complement: $B^c$
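
The three operations map directly onto set operators. The following sketch reuses the class-year sample space and assumes, as in the example, that each student has exactly one class year.

```python
omega = {"Freshman", "Sophomore", "Junior", "Senior"}
A = {"Sophomore"}  # the student is a sophomore
B = {"Senior"}     # the student is a senior

# Union: the student is a sophomore OR a senior.
union = A | B

# Intersection: the student is a sophomore AND a senior.
intersection = A & B

# Complement of B: the student is NOT a senior.
B_c = omega - B

# An empty intersection means the events are mutually exclusive (disjoint).
are_disjoint = A.isdisjoint(B)

print(sorted(union), sorted(intersection), sorted(B_c), are_disjoint)
```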

12 / 19

Quiz

Q - What is the "probability" of an event?

  • How likely an event is to occur
  • Interpretations (1) vs. (2)

    • (1) The proportion of times the event would occur if it could be observed an infinite number of times.
    • (2) The degree of belief that an event will happen.
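
Interpretation (1) can be illustrated with a small simulation. This sketch assumes a fair coin (true probability of heads 0.5), which is a hypothetical example rather than something from the slides; the helper name `relative_frequency` is also made up here.

```python
import random

random.seed(199)  # fixed seed so the sketch is reproducible


def relative_frequency(n_trials: int) -> float:
    """Proportion of heads in n_trials simulated fair-coin flips."""
    heads = sum(random.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials


# The proportion of heads tends to settle near 0.5 as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With only a few flips the proportion can be far from 0.5; with many flips it tends to stay close, which is the sense in which probability is a long-run proportion.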
13 / 19

Quiz

Q - What are the rules of probability?

  • The probability of any event lies in [0, 1]: $0 \le P(A) \le 1$

  • The probability of the entire sample space is 1; $P(\Omega) = 1$

  • Complement rule: $P(A) + P(A^c) = 1$

  • Additive rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

    • Avoiding double-counting
    • If A and B are disjoint, $P(A \cap B) = 0$ and thus $P(A \cup B) = P(A) + P(B)$
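
These rules can be checked numerically on a small example. The sketch below uses a hypothetical fair six-sided die (not part of the slides), where every outcome is equally likely, and a made-up helper `prob` that counts outcomes.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}


def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))


A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

# Probabilities lie in [0, 1], and the whole sample space has probability 1.
assert 0 <= prob(A) <= 1
assert prob(omega) == 1

# Complement rule.
assert prob(A) + prob(omega - A) == 1

# Additive rule: subtracting P(A and B) avoids double-counting {4, 6}.
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Disjoint events: the intersection has probability 0, so probabilities add.
C, D = {1}, {2}
assert prob(C & D) == 0
assert prob(C | D) == prob(C) + prob(D)
```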
14 / 19

Questions?

15 / 19

Let's Practice Together!

Coffee consumption and mortality among 64,821 men enrolled in the European Prospective Investigation into Cancer and Nutrition, after a mean follow-up of 16.4 years:

| Coffee consumption | Did not die | Died |
|---|---|---|
| Does not drink coffee | 5438 | 1039 |
| Drinks coffee occasionally | 25369 | 4440 |
| Drinks coffee regularly | 24934 | 3601 |

Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788283/
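
Probabilities from a contingency table are built from its row, column, and grand totals. The sketch below computes those totals from the counts above using plain Python dictionaries; the variable names are illustrative, and the nested-dictionary layout is just one convenient way to hold the table.

```python
# Counts from the table: rows are coffee consumption, columns are outcome.
counts = {
    "Does not drink coffee": {"Did not die": 5438, "Died": 1039},
    "Drinks coffee occasionally": {"Did not die": 25369, "Died": 4440},
    "Drinks coffee regularly": {"Did not die": 24934, "Died": 3601},
}

# Row totals: number of men in each coffee-consumption group.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

# Column totals: number of men with each outcome.
col_totals = {}
for cells in counts.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n

# Grand total: should match the 64,821 men in the cohort.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```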

16 / 19

Let's Practice Together!

Let A be the event that a man died and B be the event that a man was a non-coffee drinker. Calculate the following probabilities for a randomly selected person in the cohort:

| Coffee consumption | Did not die | Died |
|---|---|---|
| Does not drink coffee | 5438 | 1039 |
| Drinks coffee occasionally | 25369 | 4440 |
| Drinks coffee regularly | 24934 | 3601 |

  • $P(A) = 9080/64821$
  • $P(B) = 6477/64821$
  • $P(A \cap B) = 1039/64821$
  • $P(A \cup B) = (9080 + 6477 - 1039)/64821$
  • $P(A \cup B^c) = (9080 + 58344 - 8041)/64821$
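
As a check on the arithmetic, the same probabilities can be computed from the table counts with exact fractions. This is a sketch; the short keys ("none", "occasional", "regular") and variable names are shorthand introduced here, not names from the course materials.

```python
from fractions import Fraction

# Table counts, split by outcome. Keys are shorthand for the three coffee groups.
died = {"none": 1039, "occasional": 4440, "regular": 3601}
survived = {"none": 5438, "occasional": 25369, "regular": 24934}
n = sum(died.values()) + sum(survived.values())  # everyone in the cohort

p_a = Fraction(sum(died.values()), n)              # A: died
p_b = Fraction(died["none"] + survived["none"], n) # B: non-coffee drinker
p_a_and_b = Fraction(died["none"], n)              # died AND non-coffee drinker

# Additive rule for A or B.
p_a_or_b = p_a + p_b - p_a_and_b

# Complement rule for B, then the additive rule for A or (not B).
p_b_c = 1 - p_b
p_a_and_b_c = p_a - p_a_and_b  # died AND drinks coffee
p_a_or_b_c = p_a + p_b_c - p_a_and_b_c

# Cross-check: "A or not B" excludes only non-drinkers who did not die.
assert p_a_or_b_c == 1 - Fraction(survived["none"], n)

for name, p in [("P(A)", p_a), ("P(B)", p_b), ("P(A and B)", p_a_and_b),
                ("P(A or B)", p_a_or_b), ("P(A or not B)", p_a_or_b_c)]:
    print(f"{name}: {float(p):.4f}")
```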
17 / 19

Let's Practice Together!

Go to AE 10: Introduction to Probability

18 / 19

Bulletin

19 / 19
