class: center, middle, inverse, title-slide

# Introduction to Probability
### Bora Jin

---
layout: true

<div class="my-footer">
<span>
<a href="https://introds.org" target="_blank">introds.org</a>
</span>
</div>

---

## Material

🎥 Watch [Probability](https://warpwire.duke.edu/w/8RUGAA/) - [Slides](https://sta198f2021.github.io/website/slides/week-02/w2-l03-prob.html)

---

## Today's Goal

- Understand population, sample, event, sample space, and probability.
- Compute probabilities of events from data.
- Create a contingency table using `pivot_wider()` and `kable()`.
- Use a contingency table to explore the relationship between two categorical variables.

---

## Quiz

One goal of statistics is to answer a research question by making inferences about a population based on data from one or more samples.

--

**Q - What is a research question?**

--

What we want to learn and what we are curious about. For example:

- How likely is it for patients with early Parkinson's disease to experience a serious movement disorder within 5 years after the first diagnosis?
- Does the average amount of caffeine vary by vendor in 12 oz. cups of coffee at Duke coffee shops?
- How popular is the president among college students?

---

## Quiz

**Q - What is a "population"?**

--

- The entire group we are interested in studying
- **Parameter**: a numerical quantity derived from the population (almost always unknown)
- If we had data from every unit in the population, we could just calculate population parameters and be done!

--

**Q - What is a "sample"?**

--

- The group we have collected data from (a subset of our population of interest)
- **Statistic**: a numerical quantity derived from a sample
- Usually we have to settle for a sample and draw conclusions from it because the population is just too big!

---

## Quiz

**Q - What is a good sample and why is it important?**

--

- A good sample represents the target population well (its characteristics are similar to those of the population)
- Representativeness is what allows conclusions and inferences drawn from the sample to generalize validly to the whole population
- Similar to tasting a spoonful of soup while cooking to make an inference about the entire pot

---

## Quiz

**Q - Identify the population and a potential sample to answer the following question.**

How likely is it for patients with early Parkinson's disease to experience a serious movement disorder within 5 years after the first diagnosis?

--

- population: all patients with early Parkinson's disease
- sample: patients with early Parkinson's disease at Duke Hospital who agree to be followed up

---

## Quiz

**Q - Identify the population and a potential sample to answer the following question.**

Does the average amount of caffeine vary by vendor in 12 oz. cups of coffee at Duke coffee shops?

--

- population: all 12 oz. cups of coffee at Duke coffee shops
- sample: 50 randomly selected 12 oz. cups of coffee from each vendor at Duke coffee shops

---

## Quiz

**Q - Identify the population and a potential sample to answer the following question.**

How popular is the president among college students?

--

- population: all college students
- sample: randomly selected college students who share their opinions about the president

---

## Quiz

In order to draw principled conclusions from data, we rely on a formal **probabilistic framework** that allows us to quantify uncertainty.

**Q - In probability theory, what is the term for the result of an observation or experiment?**

--

- "Event"
- Usually denoted by capital letters
  - `\(A\)` is the event that a student in STA199 is a sophomore.
  - `\(A^c\)` ( `\(A\)` complement ) is the event that `\(A\)` is not true; a student in STA199 is not a sophomore.

---

## Quiz

**Q - What is a "sample space"?**

--

- Set of all possible outcomes
  - {Freshman, Sophomore, Junior, Senior}
  - `\(\Omega = A \cup A^c\)`
- Varies by research question

---

## Quiz

**Q - State the following math expressions in words.**

`\(A\)` is the event that a student in STA199 is a sophomore.
`\(B\)` is the event that a student in STA199 is a senior.

- Union: `\(A \cup B\)`

--

- Intersection: `\(A \cap B\)`

--

  - Is it possible? If not, we say `\(A\)` and `\(B\)` are mutually exclusive or disjoint.

--

- Complement: `\(B^c\)`

---

## Quiz

**Q - What is the "probability" of an event?**

--

- How likely an event is to occur
- Interpretations (1) vs. (2)
  - (1) The proportion of times the event would occur if it could be observed an infinite number of times
  - (2) The degree of belief that an event will happen

---

## Quiz

**Q - Probability Rules?**

--

- Every probability lies in [0, 1]; `\(0 \le P(A) \le 1\)`

--

- The probability of the entire sample space is 1; `\(P(\Omega) = 1\)`

--

- Complement rule: `\(P(A) + P(A^c) = 1\)`

--

- Additive rule: `\(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)`
  - Subtracting `\(P(A \cap B)\)` avoids double-counting outcomes that belong to both events
  - If `\(A\)` and `\(B\)` are disjoint, `\(P(A \cap B)=0\)` and thus `\(P(A \cup B) = P(A) + P(B)\)`

---
class: middle, center

# Questions?

---

## Let's Practice Together!

<img src="img/10/coffee.png" width="80%" style="display: block; margin: auto;" />

For 64,821 men enrolled in the European Prospective Investigation into Cancer and Nutrition after a mean follow-up of 16.4 years:

|                           | Did not die| Died|
|:--------------------------|-----------:|----:|
|Does not drink coffee      |        5438| 1039|
|Drinks coffee occasionally |       25369| 4440|
|Drinks coffee regularly    |       24934| 3601|

.small[*Source:* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788283/]

---

## Let's Practice Together!

Let `\(A\)` be the event that a man died and `\(B\)` be the event that a man was a non-coffee drinker.
Calculate the following probabilities for a randomly selected man in the cohort:

.midi[

|                           | Did not die| Died|
|:--------------------------|-----------:|----:|
|Does not drink coffee      |        5438| 1039|
|Drinks coffee occasionally |       25369| 4440|
|Drinks coffee regularly    |       24934| 3601|

]

- `\(P(A)\)`
--
 = 9080/64821

--

- `\(P(B)\)`
--
 = 6477/64821

--

- `\(P(A \cap B)\)`
--
 = 1039/64821

--

- `\(P(A \cup B)\)`
--
 = (9080 + 6477 - 1039)/64821

--

- `\(P(A \cup B^c)\)`
--
 = (9080 + 58344 - 8041)/64821

---

## Let's Practice Together!

Go to [AE 10: Introduction to Probability](https://sta199-summer22.netlify.app/appex/ae10_BJ.html)

---

## Bulletin

- Watch videos for [Prepare: May 26](https://sta199-summer22.netlify.app/prepare/week03_may26_BJ.html)
- Submit your `ae10`
- HW02 released
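
---

## Checking the Coffee Probabilities in R

The hand calculations from the coffee practice slides can be double-checked with a few lines of base R. This is only a minimal sketch, not the code used in AE 10: the counts are typed in from the contingency table, and the object names (`counts`, `p_A`, `p_B`, and so on) and the row and column labels are made up for illustration.

```r
# Counts copied from the coffee contingency table (rows: coffee habit,
# columns: outcome). The labels are illustrative, not from a real dataset.
counts <- matrix(
  c( 5438, 1039,
    25369, 4440,
    24934, 3601),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    coffee  = c("none", "occasional", "regular"),
    outcome = c("did_not_die", "died")
  )
)

n <- sum(counts)                          # total number of men in the cohort

p_A       <- sum(counts[, "died"]) / n    # P(A): the man died
p_B       <- sum(counts["none", ]) / n    # P(B): the man was a non-coffee drinker
p_A_and_B <- counts["none", "died"] / n   # P(A and B)

# Additive rule: subtract the overlap so it is not counted twice.
p_A_or_B <- p_A + p_B - p_A_and_B

# Complement rule gives P(B^c); then apply the additive rule to A and B^c.
p_Bc       <- 1 - p_B
p_A_and_Bc <- sum(counts[c("occasional", "regular"), "died"]) / n
p_A_or_Bc  <- p_A + p_Bc - p_A_and_Bc

# Print all five probabilities, rounded for readability.
round(c(A = p_A, B = p_B, A_and_B = p_A_and_B,
        A_or_B = p_A_or_B, A_or_Bc = p_A_or_Bc), 3)
```

Storing the table as a named matrix keeps each probability readable as "sum of some cells divided by the grand total", which mirrors how the fractions were built by hand.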