class: center, middle, inverse, title-slide

# Models with Multiple Predictors 2 + Model Diagnostics
### Bora Jin

---
layout: true

<div class="my-footer">
<span>
<a href="https://introds.org" target="_blank">introds.org</a>
</span>
</div>

---

## Material

🎥 Watch [Models with Multiple Predictors 2](https://www.youtube.com/watch?v=nJAYRnLPb10) - [Slides](https://rstudio-education.github.io/datascience-box/course-materials/slides/u4-d05-more-model-multiple-predictors/u4-d05-more-model-multiple-predictors.html#1)

---

## Today's Goal

- Use functions in `R` to fit a linear model with multiple predictors
- Model interactions between variables
- Understand what's linear in linear regression
- Understand and implement confidence intervals (CI) and hypothesis tests (HT) for regression parameters
- Understand model diagnostics and how to handle common model violations

---

## Quiz

Suppose a dataset called `mydata` has variables `y`, `x1`, and `x2`. The variable `x1` is numeric and `x2` is categorical. For each of the following questions, write out a regression model and the code to fit it.

**Q - Same slope and same intercept between `x1` and `y` for different levels of `x2`.**

--

`$$y = \beta_0 + \beta_1~x_1 + \epsilon$$`

--

```r
lm(y ~ x1, data = mydata)
```

---

## Quiz

**Q - Same slope and different intercept (parallel lines) for different levels of `x2`.**

--

`$$y = \beta_0 + \beta_1~x_1 + \beta_2~x_2 + \epsilon$$`

--

```r
library(tidymodels)  # provides linear_reg(), fit(), and %>%
linear_reg(engine = "lm") %>%
  fit(y ~ x1 + x2, data = mydata)
```

---

## Quiz

**Q - Different slope and different intercept (non-parallel lines) for different levels of `x2`.**

--

`$$y = \beta_0 + \beta_1~x_1 + \beta_2~x_2 + \beta_3~(x_1*x_2)+ \epsilon$$`

--

```r
# Two equivalent ways to specify both main effects plus their interaction
lm(y ~ x1*x2, data = mydata)
lm(y ~ x1 + x2 + x1:x2, data = mydata)
```

---

## Quiz

**Q - Write separate fitted models for non-living artists (`artistliving` = 0) and for living artists (`artistliving` = 1) using the following result.
Your fitted models should include `log_price` and `surface` only.**

`$$\widehat{log\_price} = 4.91 + 0.00021~surface - 0.126~artistliving$$`
`$$+ ~ 0.00048 ~surface * artistliving$$`

--

- Non-living artists: `\(\widehat{log\_price} = 4.91 + 0.00021~surface\)`
- Living artists: `\(\widehat{log\_price} = 4.784 + 0.00069~surface\)`, since `\(4.91 - 0.126 = 4.784\)` and `\(0.00021 + 0.00048 = 0.00069\)`
- Non-parallel lines due to the interaction effect!

---
class: middle, center

# Model Diagnostics

.footnote[Source: Duke STA210 by Prof. Mine Çetinkaya-Rundel https://sta210-s22.github.io/website/slides/lec-7.html#]

---

## Model Conditions

- **L**inearity: There is a linear relationship between the response and predictor variables.
- **I**ndependence: The errors are independent of each other.
- **N**ormality (optional): The errors follow a normal distribution.
- **E**qual variance: The variability of the errors is equal for all values of the predictor variables.
- For multiple regression, the predictors should not be too correlated with each other.

---

## Linearity and Equal Variance

- **Linearity:** The residuals vs. fitted values plot should show a random scatter of residuals around 0.
    - No distinguishable pattern or structure along the x or y axes.
    - Why do we want a completely random scatter?
        - It means the model has captured the (linear) relationship in the data, leaving nothing systematic in the residuals.
        - Remaining patterns in residuals vs. fitted values suggest that a linear model is not the best choice for the data.
- **Equal variance:** The vertical spread of the residuals should be relatively constant across the plot.
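---

## Linearity and Equal Variance

A minimal sketch of how a residuals vs. fitted values plot could be drawn in base `R`. The data are simulated to satisfy the model conditions, and the names `sim_data`, `fit_ok`, and `diag_ok` are hypothetical, not taken from the course materials.

```r
set.seed(199)  # arbitrary seed so the simulated data are reproducible

# Simulated data that satisfy the model conditions (an assumption for illustration)
n <- 200
sim_data <- data.frame(x1 = runif(n, 0, 10))
sim_data$y <- 2 + 0.5 * sim_data$x1 + rnorm(n, sd = 1)

fit_ok <- lm(y ~ x1, data = sim_data)

# Collect fitted values and residuals for the diagnostic plot
diag_ok <- data.frame(fitted = fitted(fit_ok), resid = resid(fit_ok))

# Residuals vs. fitted values: look for random scatter around 0
# with roughly constant vertical spread
plot(diag_ok$fitted, diag_ok$resid,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # dashed reference line at 0
```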
---

## Linearity and Equal Variance

**This is what we look for**

<img src="22-multiplemodel2_BJ_files/figure-html/unnamed-chunk-5-1.png" width="70%" style="display: block; margin: auto;" />

---

## Linearity and Equal Variance

We **don't want** <br> increasing / decreasing variability in residuals as predicted value increases

<img src="22-multiplemodel2_BJ_files/figure-html/unnamed-chunk-6-1.png" width="65%" style="display: block; margin: auto;" />

---

## Linearity and Equal Variance

We **don't want** <br> any groups of residuals

<img src="22-multiplemodel2_BJ_files/figure-html/unnamed-chunk-7-1.png" width="65%" style="display: block; margin: auto;" />

---

## Linearity and Equal Variance

We **don't want** <br> residuals correlated with predicted values

<img src="22-multiplemodel2_BJ_files/figure-html/unnamed-chunk-8-1.png" width="65%" style="display: block; margin: auto;" />

---

## Linearity and Equal Variance

We **don't want** <br> any patterns 1

<img src="22-multiplemodel2_BJ_files/figure-html/unnamed-chunk-9-1.png" width="65%" style="display: block; margin: auto;" />

---

## Linearity and Equal Variance

We **don't want** <br> any patterns 2

<img src="22-multiplemodel2_BJ_files/figure-html/unnamed-chunk-10-1.png" width="65%" style="display: block; margin: auto;" />

---

## Normality

<img src="22-multiplemodel2_BJ_files/figure-html/unnamed-chunk-11-1.png" width="70%" style="display: block; margin: auto;" />

---

## Independence

- We can often check the independence assumption based on the context of the data and how the observations were collected.
- If the data were collected in a particular order, examine a scatterplot of the residuals versus the order in which the data were collected.

---

## When Model Conditions Are Violated

<img src="22-multiplemodel2_BJ_files/figure-html/unnamed-chunk-12-1.png" width="70%" style="display: block; margin: auto;" />

Both linearity and equal variance appear to be violated.
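---

## When Model Conditions Are Violated

One way such violations can arise, sketched with simulated data (not the data behind the plot above): the response grows exponentially in `x` with multiplicative noise, so a straight-line fit leaves a curved pattern and residuals that fan out. The seed, coefficients, and the names `lm1`, `sd_low`, and `sd_high` are assumptions for illustration.

```r
set.seed(22)  # arbitrary seed for reproducibility

n <- 200
x <- runif(n, 0, 10)
# Curved trend whose spread grows with x (assumed data-generating process)
y <- exp(0.5 + 0.3 * x + rnorm(n, sd = 0.4))

lm1 <- lm(y ~ x)  # straight-line fit on the original scale
res <- resid(lm1)
fit_vals <- fitted(lm1)

# Residuals vs. fitted values for the misspecified model
plot(fit_vals, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Rough numeric check of equal variance: residual spread
# below vs. above the median fitted value
sd_low  <- sd(res[fit_vals <= median(fit_vals)])
sd_high <- sd(res[fit_vals >  median(fit_vals)])
c(sd_low = sd_low, sd_high = sd_high)
```

A much larger `sd_high` than `sd_low` is a numeric hint of non-constant variance. The plot remains the primary check.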
---

## When Model Conditions Are Violated

.pull-left[
<img src="22-multiplemodel2_BJ_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
Transform the response variable. This may help!

- Natural log transformation of the `\(y\)` variable: In `R`, `log(y)`
- Helpful for an extremely right-skewed distribution and/or non-constant variance in the residuals
]

---

## Log Transformation

This is still a linear model with `\(\log(y)\)` as the response:

`$$\log(y) = \beta_0 + \beta_1~x + \epsilon ~\Rightarrow~ \widehat{\log(y)} = \hat{\beta}_0 + \hat{\beta}_1~x$$`

.pull-left[
```r
logy <- log(y)       # natural log of the response
lm2 <- lm(logy ~ x)  # still linear in the coefficients
```
]

.pull-right[
<img src="22-multiplemodel2_BJ_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" />
]

---
class: middle, center

# Questions?

---

## Let's Practice Together!

Go to [AE 22: Models with Multiple Predictors 2 + Model Diagnostics](https://sta199-summer22.netlify.app/appex/ae22_BJ.html)

---

## Bulletin

- Watch videos for [Prepare: June 14](https://sta199-summer22.netlify.app/prepare/week06_jun14_BJ.html)
- Project draft due tonight at 11:59pm
- HW02, HW04 due Thursday, June 16 at 11:59pm
- Submit Part 1 and Part 2 of `ae22`