tidymodels and stat package to make inference under a linear regression model
Q - What do we need models for?
Q - What is a predicted value?
Q - What is a residual?
Q - What are some upsides and downsides of models?
Upsides:
Downsides:
Q - Models always entail uncertainty. Which part of the following visualization and table is relevant to uncertainty?
## # A tibble: 2 × 3
##   term        estimate std.error
##   <chr>          <dbl>     <dbl>
## 1 (Intercept)    3.62    0.254
## 2 Width_in       0.781   0.00950
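In the table, the std.error column is the part that quantifies uncertainty. As a rough sketch of how it can be used, the snippet below turns each estimate into an approximate 95% interval. The normal multiplier `qnorm(0.975)` is an assumption made here for illustration; it is not taken from the slides, and an exact interval would use a t quantile with the model's residual degrees of freedom.

```r
# Estimates and standard errors copied from the tibble above
estimate  <- c("(Intercept)" = 3.62, Width_in = 0.781)
std_error <- c("(Intercept)" = 0.254, Width_in = 0.00950)

# Assumption: normal approximation, so the multiplier is about 1.96
z <- qnorm(0.975)

# Approximate interval: estimate plus or minus z standard errors
approx_ci <- cbind(
  lower = estimate - z * std_error,
  upper = estimate + z * std_error
)
round(approx_ci, 3)
```

A small standard error relative to the estimate, as for Width_in, gives a narrow interval.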
We are interested in $\beta_0$ and $\beta_1$ in the following model:
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
As usual, we have to estimate the true parameters with sample statistics:
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
In least squares regression, the estimates are calculated so as to minimize the sum of squared residuals. In other words, if we have $n$ observations and the $i$th residual is $e_i = y_i - \hat{y}_i$, then the fitted regression line minimizes $\sum_{i=1}^{n} e_i^2$.
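The least squares criterion can be checked numerically. The sketch below uses simulated data (hypothetical, not the paintings data) and base R's `lm()`, which is the function behind `set_engine("lm")`. It compares the `lm()` estimates with a generic numerical search over the same sum of squared residuals.

```r
set.seed(1)
# Simulated data (hypothetical; not the paintings data)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)

# Sum of squared residuals for a candidate intercept b[1] and slope b[2]
ssr <- function(b) sum((y - (b[1] + b[2] * x))^2)

# Least squares estimates from lm()
fit <- lm(y ~ x)
b_hat <- unname(coef(fit))

# Numerical minimization of the same criterion, started from (0, 0)
opt <- optim(c(0, 0), ssr, method = "BFGS")

ssr(b_hat)              # SSR at the lm() estimates
opt$value               # SSR found by the numerical search
ssr(b_hat + c(0.1, 0))  # SSR after nudging the intercept away from its estimate
```

The two approaches should agree closely, and moving either coefficient away from its estimate should increase the sum of squared residuals.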
Q - Why do we minimize the "squares" of the residuals?
Q - What are some properties of the least squares regression?
$$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$$
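The property above says that the fitted line passes through the point of means. A minimal check on simulated data (hypothetical values) follows. The second check, that the residuals sum to zero, is a standard property of least squares with an intercept; it is added here and is not listed in the slide text.

```r
set.seed(2)
# Simulated data (hypothetical; not the paintings data)
x <- rnorm(40, mean = 20, sd = 5)
y <- 3 + 0.8 * x + rnorm(40)

fit <- lm(y ~ x)
b <- unname(coef(fit))

# The fitted line passes through (mean of x, mean of y)
all.equal(mean(y), b[1] + b[2] * mean(x))

# The residuals sum to zero, up to floating-point error
sum(resid(fit))
```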
Q - Based on the code and output below, write a model formula with parameter estimates.
library(tidymodels)

linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ Width_in, data = pp) %>%
  tidy()
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    3.62    0.254        14.3 8.82e-45
## 2 Width_in       0.781   0.00950      82.1 0
$$\widehat{height}_i = 3.62 + 0.781 \times width_i$$
Q - Interpret slope and intercept estimates in the context of data.
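One way to approach the interpretation is to plug values into the fitted equation. In the sketch below, `predict_height()` is a hypothetical helper written for illustration, using the rounded estimates 3.62 and 0.781. The slope is the expected change in height, on average, for each additional inch of width. The intercept is the predicted height at a width of 0 inches, which is an extrapolation and not a meaningful painting.

```r
# Coefficients rounded as in the tidy() output
b0 <- 3.62
b1 <- 0.781

# Hypothetical helper: predicted height (inches) for a given width (inches)
predict_height <- function(width_in) b0 + b1 * width_in

predict_height(0)                        # intercept: predicted height at width 0
predict_height(21) - predict_height(20)  # slope: change per extra inch of width
predict_height(20)                       # prediction for a 20-inch-wide painting
```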
Q - Explain the code chunk below. Based on its output, write a model formula with parameter estimates.
landsALL is a categorical variable with the following two levels:
0: no landscape features
1: some landscape features
linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ factor(landsALL), data = pp) %>%
  tidy()
## # A tibble: 2 × 5
##   term              estimate std.error statistic  p.value
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)          22.7      0.328      69.1 0
## 2 factor(landsALL)1    -5.65     0.532     -10.6 7.97e-26
$$\widehat{height}_i = 22.7 - 5.65 \times landsALL_i$$
Q - Interpret slope and intercept estimates in the context of data.
Slope: Paintings with landscape features are expected, on average, to be 5.65 inches shorter than paintings without landscape features.
The slope compares the baseline level (landsALL = 0) to the other level (landsALL = 1).
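With a two-level predictor, the intercept and slope correspond to group means. The sketch below uses simulated data as a stand-in for the paintings data (the values are hypothetical). It checks that the intercept equals the mean of the baseline group and that the slope equals the difference between the two group means. Read this way, the 22.7 in the output would be the expected height, in inches, of paintings without landscape features.

```r
set.seed(3)
# Simulated stand-in for the paintings data (hypothetical values)
lands  <- rep(c(0, 1), times = c(60, 40))
height <- 22 - 5 * lands + rnorm(100, sd = 4)

fit <- lm(height ~ factor(lands))
b <- unname(coef(fit))

# Mean height within each level of the predictor
group_means <- tapply(height, lands, mean)

# Intercept equals the mean of the baseline group (lands = 0)
all.equal(b[1], unname(group_means["0"]))

# Slope equals the difference in group means (level 1 minus level 0)
all.equal(b[2], unname(group_means["1"] - group_means["0"]))
```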
Q - What happens in model fitting if a categorical variable has more than two levels?
Q - Explain the code chunk below.
school_pntg is a categorical variable about school of paintings with 7 levels:
A: Austrian, D/FL: Dutch/Flemish, F: French, G: German, I: Italian, S: Spanish, X: Unknown
linear_reg() %>%
  set_engine("lm") %>%
  fit(Height_in ~ school_pntg, data = pp) %>%
  tidy()
## # A tibble: 7 × 5
##   term            estimate std.error statistic p.value
##   <chr>              <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)        14.0       10.0     1.40  0.162
## 2 school_pntgD/FL     2.33      10.0     0.232 0.816
## 3 school_pntgF       10.2       10.0     1.02  0.309
## 4 school_pntgG        1.65      11.9     0.139 0.889
## 5 school_pntgI       10.3       10.0     1.02  0.306
## 6 school_pntgS       30.4       11.4     2.68  0.00744
## # … with 1 more row
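The output has one row per non-baseline level because R, by default, encodes an unordered factor with k levels as k - 1 indicator (dummy) variables, with the first level as the baseline. The sketch below illustrates this with base R's `model.matrix()` on a toy factor that holds only the seven school codes. The variable name `school` is made up for the example, so the column names differ from the `school_pntg` prefix in the output. Under this coding, the intercept is the expected height for the baseline school (A, Austrian), and each other coefficient is the expected difference from that baseline.

```r
# The seven school codes listed above; "A" sorts first, so it is the baseline
school <- factor(c("A", "D/FL", "F", "G", "I", "S", "X"))

# Design matrix R builds for a one-factor formula
X <- model.matrix(~ school)
colnames(X)          # intercept plus one indicator per non-baseline level
X[school == "F", ]   # row for a French painting

# Rounded estimates from the output (the seventh row is truncated there)
b0  <- 14.0
b_F <- 10.2
b0 + b_F             # predicted height for a French painting
```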
Q - Interpret slope and intercept estimates in the context of data.
Watch videos for Prepare: June 10
Lab07 due Friday, June 10 at 11:59pm
HW04 released
Don't forget HW02! It's due Thursday, June 16 at 11:59pm.
Project draft due Monday, June 13 at 11:59pm
Submit ae20 (~ Part 2 Question 2)