class: center, middle, inverse, title-slide # Foundations of Inference ### Bora Jin --- layout: true <div class="my-footer"> <span> <a href="https://introds.org" target="_blank">introds.org</a> </span> </div> --- ## Material 🎥 Watch [Scientific Studies and Confounding](https://youtu.be/WnMzTBrZDcc) - [Slides](https://rstudio-education.github.io/datascience-box/course-materials/slides/u2-d15-studies-confounding/u2-d15-studies-confounding.html#1) --- ## Today's Goal - Understand statistical process terminology - Understand the types of conclusions that can be made given the underlying statistical process - Learn to simulate from a binomial distribution --- ## Quiz **Q - What is an observational study? Any examples?** -- - Collect data in a way that does not interfere with how the data arise - Establish **associations** (!= causation) - Data often cheaper and easier to collect - e.g., Smoking behavior and health outcomes --- ## Quiz **Q - What is an experimental study? Any examples?** -- - Randomly assign subjects to treatments vs. control - Establish **causal connections** - Often more expensive - Sometimes impossible or unethical to design an experiment - e.g., Covid-19 vaccine tests in developmental phase --- ## Quiz In the prep video, the following terms are introduced: <br> response variables and explanatory variables **Q - What is a response variable?** -- - "Dependent variable" - Denoted by `\(y\)` - The focus of a research question -- **Q - What is an explanatory variable?** -- - "Independent variable" - Denoted by `\(x\)` - Variables that might affect or explain changes in the response variable --- ## Quiz **Q - Identify a response variable and an explanatory variable regarding the following examples.** Do larger homes in good locations lead to higher home selling prices? -- - `\(y\)`: home selling price - `\(x\)`: size of a house, neighborhood -- Which procedure prolongs life more for breast cancer patients, chemo or anti-estrogen treatment? -- - `\(y\)`: survival time - `\(x\)`: type of treatment - In real life, you would include several more explanatory variables such as age, general health conditions, weight, etc. --- ## Quiz **Q - What is a confounding variable?** -- - Extraneous variable that affects both the explanatory and the response variable, and that makes it seem like there is a relationship between them. - If you fail to account for confounding variables, you may not correctly observe the actual relationship between explanatory variables and a response variable. --- ## Quiz **Q - Identify the confounding variable in each of the following statements.** As the amount of ice cream sales increases, the number of shark attacks also increases. -- - Confounding variable: summer - Both increase in the summer, when people go to the beach. -- The higher the number of firefighters at a fire is, the greater the amount of damage caused by that fire. -- - Confounding variable: size of a fire - More firefighters are sent to larger and severer fires that cause greater damage. --- ## Quiz In the prep video, the following terms are introduced: <br> random sampling and non-random sampling **Q - What is random sampling?** -- - A sampling method that allows for randomization of sample selection. - Conclusions from a random sample are **generalizable**. - Simple random sampling: every element in a whole population has an equal chance of being chosen for the sample <!-- - Stratified random sampling: it divides the population into subgroups or strata such that each subgroup is homogeneous. Then simple random sampling is applied within each stratum. --> <!-- - Useful when you need to make sure you get a sufficient sample from each demographic group and then you weight to the approximate percentage in the population. --> --- ## Quiz **Q - What is non-random sampling?** -- - In a non-random sample, individuals are selected based on non-random criteria, and not every individual has a chance of being included. - Conclusions from a non-random sample are **not generalizable**. - e.g., I make my close friends complete a survey and analyze the survey data for a final project. - e.g.. An online survey available to anyone who wishes to provide responses --- class: middle, center # Questions? --- ## Let's Practice Together! Go to [AE 12: Foundations of Inference](https://sta199-summer22.netlify.app/appex/ae12_BJ.html) --- ## Bulletin - Exam 01 - to be released today at 1pm - due Monday, May 30 at 12:59pm - May 30 - Memorial day holiday. No class. - Have fun after submitting Exam 01! - Nothing to prepare for May 31! - Lab 04 due Tuesday, May 31 at 11:59pm. - Submit your `ae12` before class on Tuesday, May 31.