Models with Multiple Predictors 2 + Model Diagnostics

Bora Jin

Today's Goal

  • Use functions in R to fit a linear model with multiple predictors
  • Model interactions between variables
  • Understand what's linear in linear regressions
  • Understand and implement confidence intervals (CI) and hypothesis tests (HT) for regression parameters
  • Understand model diagnostics and how to handle common model violations

Quiz

Suppose a dataset called mydata has variables y, x1, and x2. The variable x1 is numeric and x2 is categorical. For the following questions, write out a regression model and the code to fit the model.

Q - Same slope and same intercept between x1 and y for different levels of x2.

$y = \beta_0 + \beta_1 x_1 + \epsilon$

lm(y ~ x1, data = mydata)

Quiz

Q - Same slope and different intercept (parallel lines) for different levels of x2.

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

library(tidymodels)  # provides linear_reg(), fit(), and %>%
linear_reg(engine = "lm") %>%
  fit(y ~ x1 + x2, data = mydata)
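
To see why this model gives parallel lines, assume for illustration that x2 has only two levels and is coded as a 0/1 indicator (the quiz does not say how many levels x2 has). Substituting each level into the model gives two lines that share the slope β1 and differ only in their intercepts:

```latex
\begin{aligned}
x_2 = 0: &\quad y = \beta_0 + \beta_1 x_1 + \epsilon \\
x_2 = 1: &\quad y = (\beta_0 + \beta_2) + \beta_1 x_1 + \epsilon
\end{aligned}
```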

Quiz

Q - Different slope and different intercept (non-parallel lines) for different levels of x2.

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 \times x_2) + \epsilon$

lm(y ~ x1 * x2, data = mydata)           # x1 * x2 expands to x1 + x2 + x1:x2
lm(y ~ x1 + x2 + x1:x2, data = mydata)   # equivalent, written out explicitly
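
The three quiz models can be compared side by side on made-up data. The sketch below is only an illustration: the simulated `mydata`, the level names "a" and "b", and the model object names are assumptions, and base `lm()` is used throughout to keep the example self-contained.

```r
# Simulated stand-in for mydata (hypothetical values, for illustration only)
set.seed(1)
n <- 100
mydata <- data.frame(
  x1 = runif(n, 0, 10),
  x2 = factor(sample(c("a", "b"), n, replace = TRUE))
)
# Level "b" shifts the intercept by 2 and the slope by 0.5
mydata$y <- 1 + 0.5 * mydata$x1 +
  ifelse(mydata$x2 == "b", 2 + 0.5 * mydata$x1, 0) +
  rnorm(n, sd = 0.5)

m_same     <- lm(y ~ x1, data = mydata)              # same slope, same intercept
m_parallel <- lm(y ~ x1 + x2, data = mydata)         # same slope, different intercepts
m_interact <- lm(y ~ x1 * x2, data = mydata)         # different slopes and intercepts
m_explicit <- lm(y ~ x1 + x2 + x1:x2, data = mydata) # same model as m_interact

# The interaction model has one extra coefficient for the slope difference
names(coef(m_interact))

# Both interaction formulas give identical estimates
all.equal(coef(m_interact), coef(m_explicit))
```

Each model adds one coefficient to the previous one: the main effect of x2 moves the intercept, and the interaction term moves the slope.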

Quiz

Q - Write separate fitted models for non-living artists (artistliving = 0) and for living artists (artistliving = 1) using the following result. Your fitted models should include log_price and surface only.

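$\widehat{\text{log\_price}} = 4.91 + 0.00021 \times \text{surface} - 0.126 \times \text{artistliving} + 0.00048 \times \text{surface} \times \text{artistliving}$

  • Non-living artists: $\widehat{\text{log\_price}} = 4.91 + 0.00021 \times \text{surface}$
  • Living artists: $\widehat{\text{log\_price}} = 4.784 + 0.00069 \times \text{surface}$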
  • Non-parallel lines due to the interaction effect!
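
The arithmetic behind the two fitted lines can be checked directly. This is a minimal sketch: the variable names are made up for illustration, and the coefficient on artistliving is taken to be negative because that is what the living-artist intercept of 4.784 implies.

```r
# Coefficients copied from the fitted model in the question
b0 <- 4.91      # intercept
b1 <- 0.00021   # surface
b2 <- -0.126    # artistliving (negative: 4.91 - 0.126 = 4.784)
b3 <- 0.00048   # surface:artistliving interaction

# artistliving = 0: both artistliving terms drop out
nonliving <- c(intercept = b0, slope = b1)

# artistliving = 1: b2 shifts the intercept, b3 shifts the slope
living <- c(intercept = b0 + b2, slope = b1 + b3)

nonliving
living
```

The slopes differ by exactly the interaction coefficient, which is why the two lines are not parallel.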

Model Diagnostics

Source: Duke STA210 by Prof. Mine Çetinkaya-Rundel https://sta210-s22.github.io/website/slides/lec-7.html#


Model Conditions

  • Linearity: There is a linear relationship between the response and predictor variables.
  • Independence: The errors are independent of each other.
  • Normality (optional): The errors follow a normal distribution.
  • Equal variance: The variability of the errors is equal for all values of the predictor variable.
  • For multiple regression, the predictors should not be too correlated with each other.
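
One simple way to screen for the last condition is to look at pairwise correlations between numeric predictors. The sketch below uses simulated predictors (x1, x2, x3 are hypothetical), and a correlation matrix is only one possible check, not necessarily the one used in the course.

```r
set.seed(2)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)                  # generated independently of x1
x3 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1

# Pairwise correlations: values near 1 or -1 flag predictors that carry
# almost the same information
round(cor(cbind(x1, x2, x3)), 2)
```

Here x1 and x3 are almost perfectly correlated, so a model containing both would have trouble separating their effects.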

Linearity and Equal Variance

  • Linearity: The residuals vs. fitted values plot should show a random scatter of residuals around 0.

    • No distinguishable pattern or structure along the x or y axes.
    • Why do we want a completely random scatter?

      • It suggests that the model has captured the systematic (linear) relationship in the data, so only noise is left in the residuals.
      • Any remaining pattern in the residuals vs. fitted values plot suggests that a linear model may not be appropriate for the data.
  • Equal variance: The vertical spread of the residuals should be relatively constant across the plot.
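
A residuals vs. fitted values plot can be drawn with a few lines of base R. This is a sketch on simulated data (the data-generating values are assumptions chosen so that both conditions hold), not the plotting code used for the slides.

```r
set.seed(3)
n <- 150
x <- runif(n, 0, 10)
y <- 2 + 1.5 * x + rnorm(n, sd = 1)   # linear signal, constant-variance noise

m <- lm(y ~ x)

# Residuals on the vertical axis, fitted values on the horizontal axis
plot(fitted(m), resid(m),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted values")
abline(h = 0, lty = 2)   # reference line at 0
```

Because the simulated data satisfy both conditions, the points should form a structureless band of roughly constant height around the dashed line.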


This is what we look for: residuals scattered randomly around 0 with roughly constant vertical spread.

We don't want:

  • increasing / decreasing variability in residuals as predicted value increases
  • any groups of residuals
  • residuals correlated with predicted values
  • any other patterns

Normality
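
Two common visual checks for this condition are a histogram of the residuals and a normal quantile-quantile plot. The sketch below uses simulated data and base R graphics; it is one possible approach under those assumptions rather than the course's prescribed method.

```r
set.seed(4)
x <- runif(150, 0, 10)
y <- 2 + 1.5 * x + rnorm(150)   # errors drawn from a normal distribution
m <- lm(y ~ x)
res <- resid(m)

# Histogram: look for a roughly symmetric, bell-shaped distribution
hist(res, breaks = 20, main = "Histogram of residuals", xlab = "Residuals")

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(res)
qqline(res)
```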


Independence

  • We can often check the independence assumption based on the context of the data and how the observations were collected.

  • If the data were collected in a particular order, examine a scatterplot of the residuals versus the order in which the data were collected.
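
A residuals vs. order plot might look like the sketch below. The data are simulated so that the errors drift with collection order (an assumption made to show what a violation looks like), and the row index stands in for the order of collection.

```r
set.seed(5)
n <- 100
x <- runif(n)

# Hypothetical errors where each value depends on the previous one,
# which violates independence
e <- as.numeric(stats::filter(rnorm(n, sd = 0.5), 0.8, method = "recursive"))
y <- 1 + 2 * x + e

m <- lm(y ~ x)

# Residuals against collection order: long runs above or below 0 are a warning sign
plot(seq_len(n), resid(m), type = "b",
     xlab = "Collection order", ylab = "Residuals")
abline(h = 0, lty = 2)

# Correlation between each residual and the one before it
lag1_cor <- cor(resid(m)[-1], resid(m)[-n])
lag1_cor
```

With independent errors the plot would show no runs or trends and the lag-1 correlation would be close to 0.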


When Model Conditions Are Violated

Linearity and equal variance seem violated.

Transforming the response variable may help.

  • Natural log transformation of the response y: in R, log(y)
  • Helpful for an extremely right-skewed response distribution and/or non-constant variance in the residuals

Log Transformation

This is still a linear model, with log(y) as the response:

$\log(y) = \beta_0 + \beta_1 x + \epsilon$

$\widehat{\log(y)} = \hat{\beta}_0 + \hat{\beta}_1 x$

logy <- log(y)
lm2 <- lm(logy ~ x)
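
The effect of the transformation can be seen by fitting both versions to data where log(y) really is linear in x. This is a sketch on simulated data: the true intercept 1, slope 0.5, and noise level are assumptions, and `lm1` is a hypothetical name for the untransformed fit.

```r
set.seed(6)
n <- 200
x <- runif(n, 0, 10)
# Hypothetical data: log(y) is linear in x, so y itself is right-skewed
y <- exp(1 + 0.5 * x + rnorm(n, sd = 0.3))

lm1 <- lm(y ~ x)        # untransformed response
lm2 <- lm(log(y) ~ x)   # same fit as creating logy <- log(y) first

# Side-by-side residuals vs. fitted values plots
op <- par(mfrow = c(1, 2))
plot(fitted(lm1), resid(lm1), main = "y ~ x",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(fitted(lm2), resid(lm2), main = "log(y) ~ x",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)

# Estimates on the log scale should land near the simulated values 1 and 0.5
coef(lm2)
```

In the left panel the residuals curve and fan out as the fitted values grow; in the right panel they form a level band, which is the pattern the linearity and equal variance conditions call for.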


Questions?


Bulletin

  • Watch videos for Prepare: June 14

  • Project draft due tonight at 11:59pm

  • HW02, HW04 due Thursday, June 16 at 11:59pm

  • Submit Part 1 and Part 2 of ae22
