`hw04.Rmd` to open the template R Markdown file.
For each exercise:
Make sure we see all relevant code and output in the knitted PDF. If you use inline code, make sure we can still see the code used to derive that answer.
Write all narrative in complete sentences with decimal numbers rounded to three decimal places.
Write your code and narrative in a reproducible way, so you can easily change the number of reps. For example, consider ways you can write your narrative using inline code, so the relevant values update when you change the number of reps.
All plots should follow best visualization practices; plots should include:
`viridis`, `scico`, and many others.
Place all plots in the center and properly adjust their size so that they are placed nicely in a written report.
Don’t forget to label your R chunk as well. Your label should be short, informative, shouldn’t include spaces, and shouldn’t repeat a previous label.
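As one way to keep the narrative reproducible, the number of reps can live in a single object that both the code and the inline text refer to. The sketch below is a minimal illustration on simulated data; `n_reps`, `boot_means`, and `boot_se` are hypothetical names, and the bootstrap of a mean is only a stand-in for whatever resampling an exercise asks for.

```r
# Hypothetical chunk contents; all object names here are illustrative.
set.seed(1234)
n_reps <- 1000                      # change this one value to rerun everything
x <- rnorm(50, mean = 4, sd = 0.5)  # simulated stand-in data, not the homework data

# Bootstrap the sample mean n_reps times
boot_means <- replicate(n_reps, mean(sample(x, replace = TRUE)))
boot_se <- round(sd(boot_means), 3)

# In the narrative, refer to the objects instead of typing numbers, e.g.
#   Based on `r n_reps` bootstrap samples, the standard error is `r boot_se`.
```

Because the sentence pulls `n_reps` and `boot_se` from the chunk, changing the number of reps updates both the computation and the text when the document is knit again.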
The following resource provides code needed to make useful symbols. You may use the code to typeset the characters of interest in the narrative of your document:
$\hat{y}$
$\widehat{long-y}$
$\beta_{name}$
$text\_with\_underbar$
We’ll use the `tidyverse` package for much of the data wrangling and visualization, the `tidymodels` package for inference, and the data live in the `openintro` package. We use the `knitr` package for nice-looking tables.
```r
library(tidyverse)
theme_set(theme_bw())
library(tidymodels)
library(openintro)
library(knitr)
```
Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. The article titled, “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity” (Hamermesh and Parker, 2005) found that instructors who are viewed to be better looking receive higher instructional ratings. (Daniel S. Hamermesh, Amy Parker, Beauty in the classroom: instructors pulchritude and putative pedagogical productivity, Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369-376, ISSN 0272-7757, 10.1016/j.econedurev.2004.07.013. http://www.sciencedirect.com/science/article/pii/S0272775704001165.)
In this homework you will analyze the data from this study in order to learn what goes into a positive professor evaluation.
The data were gathered from end of semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors’ physical appearance. (This is a slightly modified version of the original dataset that was released as part of the replication data for Data Analysis Using Regression and Multilevel / Hierarchical Models (Gelman and Hill, 2007).) The result is a data frame where each row contains a different course and columns represent variables about the courses and professors.
The data can be found in the `openintro` package, and it’s called `evals`. Since the dataset is distributed with the package, we don’t need to load it separately; it becomes available to us when we load the package. You can find out more about the dataset by inspecting its documentation, which you can access by running `?evals` in the Console.
Visualize the distribution of `score`. Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not? Include any summary statistics and visualizations you use in your response.
Visualize and describe the relationship between `score` and `bty_avg` in a scatterplot.
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
Let’s see if the trend in the plot in Ex 2 is something more than natural variation. Fit a linear model called `score_bty_fit` to predict average professor evaluation `score` by average beauty rating (`bty_avg`). Print the regression output as a nice `kable` table with all numbers rounded to three decimal places. Based on the regression output, write the linear model.
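For writing the fitted model in the narrative, one possible way to typeset it is sketched below. The symbols `b_0` and `b_1` are placeholders for the intercept and slope estimates in your own regression output, not values from this assignment.

```latex
$$\widehat{score} = b_0 + b_1 \times bty\_avg$$
```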
Recreate the scatterplot from Ex 2, and add the regression line to this plot in orange, with the shading for the uncertainty of the line turned off. Hint: use the `geom_smooth()` function.
Interpret the slope and the intercept of the linear model in context of the data.
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
Fit a new linear model called `score_gender_fit` to predict average professor evaluation `score` based on `gender` of the professor. Based on the regression output, write the linear model using an indicator variable and interpret the slope and intercept in context of the data.
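A model with a single two-level categorical predictor can be written with an indicator variable that equals 1 for one level and 0 for the other. The sketch below assumes that R treats `female` as the reference level (its default alphabetical ordering when the levels are `female` and `male`); check the term name in your own output to confirm which level the indicator represents. `b_0` and `b_1` are placeholders for your estimates.

```latex
$$\widehat{score} = b_0 + b_1 \times gender_{male},
\qquad
gender_{male} =
\begin{cases}
1 & \text{if the professor is male} \\
0 & \text{otherwise}
\end{cases}$$
```

Under this coding, the intercept is the predicted score for the reference level, and the slope is the predicted difference between the two levels.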
Determine the \(R^2\) of both models and interpret the \(R^2\) statistics in context of the data. Which model do you prefer and why?
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
Fit a linear model: `score_bty_gen_fit`, predicting average professor evaluation `score` based on average beauty rating (`bty_avg`) and `gender`. Write the linear model with an indicator variable for `gender` and interpret the slopes and intercept in context of the data.
What is the equation of the line between `score` and `bty_avg` corresponding to male professors? What is it for female professors?
For two professors who received the same beauty rating, which gender tends to have a higher course evaluation score? Why?
How do the \(R^2\) and the adjusted \(R^2\) values of `score_bty_gen_fit` and `score_bty_fit` compare? Which metric do you trust more in this case? Why?
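When comparing nested models, it helps to recall that \(R^2\) cannot decrease when a predictor is added, while adjusted \(R^2\) includes a penalty for the extra term. The sketch below illustrates this with base R on simulated data; the variables `x1`, `x2`, and `y` are made up, and `x2` is deliberately unrelated to the response.

```r
# Simulated stand-in data; not the evals data.
set.seed(7)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)                     # pure noise, unrelated to y
y <- 1 + 2 * x1 + rnorm(n)

fit_small <- lm(y ~ x1)            # one predictor
fit_big <- lm(y ~ x1 + x2)         # adds the noise predictor

# R-squared never decreases when a predictor is added
r2 <- c(small = summary(fit_small)$r.squared,
        big = summary(fit_big)$r.squared)

# Adjusted R-squared penalizes the additional term
adj_r2 <- c(small = summary(fit_small)$adj.r.squared,
            big = summary(fit_big)$adj.r.squared)

round(rbind(r2, adj_r2), 3)
```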
What does this tell us about how useful `gender` is in explaining the variability in evaluation scores when we already have information on the beauty score of the professor?
Fit a linear model: `score_bty_gen_int_fit`, predicting average professor evaluation `score` based on average beauty rating (`bty_avg`), `gender`, and their interaction. Print the regression output as a nice `kable` table with all numbers rounded to three decimal places. Based on the regression output, write the linear model with an indicator variable for `gender`.
What is the equation of the line between `score` and `bty_avg` corresponding to male professors from `score_bty_gen_int_fit`? What is it for female professors? Professors of which gender benefit more on the course evaluation from a one point increase in the average beauty rating? Why?
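In general, a model with an interaction between a numeric predictor and an indicator gives each group its own intercept and its own slope. The derivation below is a symbolic sketch, assuming an indicator `male` that is 1 for male professors and 0 for female professors; `b_0` through `b_3` are placeholders for the estimates in your output.

```latex
$$
\begin{aligned}
\widehat{score} &= b_0 + b_1 \, bty\_avg + b_2 \, male + b_3 \, (bty\_avg \times male) \\
\text{female } (male = 0):\quad \widehat{score} &= b_0 + b_1 \, bty\_avg \\
\text{male } (male = 1):\quad \widehat{score} &= (b_0 + b_2) + (b_1 + b_3) \, bty\_avg
\end{aligned}
$$
```

The sign of `b_3` determines which group has the steeper slope.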
How do the adjusted \(R^2\) values of `score_bty_gen_fit` and `score_bty_gen_int_fit` compare? Which model do you prefer and why?
What does this tell us about how useful the interaction term between `bty_avg` and `gender` is in explaining the variability in evaluation scores when we already have information on main effects of the beauty score of a professor and gender?
We will test whether the coefficient for the interaction term (\(\beta_3\)) is significantly different from zero using a hypothesis test based on the Central Limit Theorem.
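The mechanics of such a test can be sketched as follows: divide the estimate by its standard error to get a test statistic, then compare it to a t distribution. The example below uses base R on simulated data, so the variables `x`, `g`, and `y` and all numeric values are made up. It assumes the degrees of freedom are the sample size minus the number of estimated coefficients, which is the usual convention for linear regression.

```r
# Simulated stand-in data; names and values are illustrative only.
set.seed(42)
n <- 200
x <- runif(n, 1, 10)                                  # numeric predictor
g <- factor(sample(c("female", "male"), n, replace = TRUE))
y <- 4 + 0.05 * x + 0.1 * (g == "male") + rnorm(n, sd = 0.5)

fit <- lm(y ~ x * g)                                  # main effects plus interaction
coefs <- summary(fit)$coefficients                    # estimates, SEs, t values, p-values

b3 <- coefs["x:gmale", "Estimate"]
se_b3 <- coefs["x:gmale", "Std. Error"]

t_stat <- (b3 - 0) / se_b3                            # test statistic under H0: beta_3 = 0
df <- n - 4                                           # n minus four estimated coefficients
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)  # two-sided p-value

round(c(t = t_stat, p = p_value), 3)
```

Computing the statistic by hand this way makes it straightforward to check the result against the values reported in the model summary.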
`R` produces in the regression output.
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards. Go to your homework repo on GitHub and see if your commit is well reflected.
Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Only upload your PDF document to Gradescope. Before you submit the uploaded document, mark where the answer to each exercise is located. If any answer spans multiple pages, then mark all pages. Associate the “Workflow & formatting” section with the first page.
Component | Points |
---|---|
Ex 1 | 4 |
Ex 2 | 2 |
Ex 3 | 3 |
Ex 4 | 2 |
Ex 5 | 3 |
Ex 6 | 6 |
Ex 7 | 5 |
Ex 8 | 7 |
Ex 9 | 2 |
Ex 10 | 2 |
Ex 11 | 4 |
Ex 12 | 1 |
Ex 13 | 3 |
Ex 14 | 4 |
Ex 15 | 2 |
Ex 16 | 1 |
Ex 17 | 7 |
Workflow & formatting | 2 |
Grading notes: