Open `hw01.Rmd`, the template R Markdown file.
All plots should follow best visualization practices. Color palettes are available from packages such as viridis, scico, and many others.
Place all plots in the center and adjust their size so that they sit well in a written report.
Don’t forget to label your R chunk as well. Your label should be short, informative, shouldn’t include spaces, and shouldn’t repeat a previous label.
library(tidyverse)
theme_set(theme_bw())
library(sf)
The following data were originally distributed by Inside Airbnb (date compiled: 11 Dec, 2021) and have been modified for the purpose of this homework.
We will work with two datasets, `hawaii` and `hawaii_nb`. The dataset `hawaii` contains summary information and metrics for listings in Hawaii, with the following variables:
- `id`: ID number of the listing
- `host_id`: ID number of the host
- `neighbourhood`: Neighbourhood the listing is located in
- `neighbourhood_group`: Neighbourhood group (major islands) the listing is located in
- `room_type`: Room type of the listing
- `price`: Daily price in US dollars
- `minimum_nights`: Minimum number of nights' stay for the listing
- `number_of_reviews`: Number of reviews the listing received in the last 12 months
- `bedrooms`: Number of bedrooms
- `review_scores_rating`: Average rating of the listing
An `sf` object, `hawaii_nb`, contains geometry information on the neighborhoods of Hawaii.
Note: You do not have to worry about the last line in the code chunk below; it simply replaces two column names with clearer ones for later use.
hawaii <- read_csv("data/hawaii_airbnb.csv")
hawaii_nb <- st_read("data/hawaii_airbnb_neighborhood.shp", quiet = TRUE)
names(hawaii_nb)[1:2] <- c("neighbourhood", "neighbourhood_group")
How many rows does `hawaii` have? What does each row in the dataset represent? How many variables (columns) does `hawaii` have? Classify each of `review_scores_rating` and `room_type` as numeric continuous, numeric discrete, categorical ordinal, or categorical nominal.
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
Reproduce the following tables. The column “count” represents the number of listings, and “percentage” is the associated percentage. Hints:
- Use `knitr::kable(., digits = 3)` at the end of your pipeline to neatly display tables in your final document with numbers rounded to three decimal places.
- Use `ifelse()` for the column “minimum stay < 30?”.
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
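The two hints above can be combined in one pipeline. The following is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not the homework data), and it assumes that “percentage” means the share of listings on a 0 to 100 scale; the tables to reproduce may use a different scale or different labels.

```r
library(dplyr)

# Toy data: made-up values for illustration only.
toy <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Entire home/apt", "Hotel room"),
  minimum_nights = c(2, 30, 45, 1)
)

# Count listings per category, then convert counts to percentages.
room_table <- toy %>%
  count(room_type, name = "count") %>%
  mutate(percentage = count / sum(count) * 100)

# ifelse() turns a logical condition into a two-level label before counting.
stay_table <- toy %>%
  mutate(short_stay = ifelse(minimum_nights < 30, "Yes", "No")) %>%
  count(short_stay, name = "count") %>%
  mutate(percentage = count / sum(count) * 100)

# kable() formats a table for the knitted document.
knitr::kable(room_table, digits = 3)
```

Placing `knitr::kable()` last keeps the numeric columns unformatted until display, so earlier steps still work with full precision.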
Create a faceted histogram where each facet represents a neighborhood and displays the distribution of Airbnb prices in that neighborhood. To make the neighborhoods easier to compare, fill the histograms of neighborhoods in the same group (`neighbourhood_group`) with the same color. How would you describe the distribution of price in general? How do neighborhoods compare to each other in terms of price?
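The general shape of such a plot can be sketched as follows. This is only an illustration on a made-up data frame (`toy`, its values, and the axis labels are hypothetical); the number of bins, the palette, and the labels are choices you would still need to make for the real data.

```r
library(ggplot2)

# Toy data: three neighbourhoods in two groups, made-up prices.
toy <- data.frame(
  neighbourhood = rep(c("A", "B", "C"), each = 4),
  neighbourhood_group = rep(c("Island 1", "Island 1", "Island 2"), each = 4),
  price = c(90, 120, 150, 300, 80, 95, 110, 500, 200, 220, 260, 900)
)

# One panel per neighbourhood; the fill colour is mapped to the group,
# so panels from the same group share a colour.
p <- ggplot(toy, aes(x = price, fill = neighbourhood_group)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ neighbourhood) +
  labs(x = "Daily price (USD)", y = "Number of listings", fill = "Group")
```

Mapping `fill` to the group rather than to the neighbourhood is what gives facets in the same group a common colour.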
We will examine median listing prices of neighborhoods.
- Create `hawaii_summary` with the minimum, mean, median, standard deviation, IQR, and maximum listing price in each neighborhood, and identify the neighborhoods with the top five median listing prices.
- Create an `sf` object `hawaii_sf` that includes all rows and columns from `hawaii_summary` and the associated geometry information from `hawaii_nb`.
- … `scale_fill_gradient`. Describe what you observe.
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
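The per-neighborhood summary described above follows a common group-then-summarise pattern. Here is a minimal sketch on made-up data (`toy`, the output column names, and `n = 2` are hypothetical choices for illustration); the geometry join and the map are not covered.

```r
library(dplyr)

# Toy prices: made-up values, not taken from the homework data.
toy <- data.frame(
  neighbourhood = c("A", "A", "A", "B", "B", "C"),
  price = c(100, 200, 300, 50, 150, 400)
)

# One row per neighbourhood with six summary statistics.
price_summary <- toy %>%
  group_by(neighbourhood) %>%
  summarise(
    price_min = min(price),
    price_mean = mean(price),
    price_median = median(price),
    price_sd = sd(price),      # NA when a group has a single listing
    price_iqr = IQR(price),
    price_max = max(price)
  )

# slice_max() keeps the rows with the largest values of a column.
top_two <- price_summary %>%
  slice_max(price_median, n = 2)
```

With the toy values, neighbourhood "C" has a single listing, so its standard deviation is `NA`; the real data may or may not contain such cases.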
- Create a variable `annual_revenue` that estimates the annual revenue for the listing in the last 12 months. You may assume that visitors always left a review and stayed in the listing for the minimum number of nights only. Hint: use `minimum_nights`, `price`, and `number_of_reviews`.
- … `host_id`.
- … `geom_smooth()` for a fitted line.
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
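One way to read the revenue hint, under the two stated assumptions (every stay produced a review, and every stay lasted exactly the minimum number of nights), is that each review stands for one stay of `minimum_nights` nights at `price` per night. The assignment does not spell out a formula, so the product below is an assumption, and `toy` with its values is hypothetical.

```r
library(dplyr)

# Toy listings: made-up values for illustration only.
toy <- data.frame(
  host_id = c(1, 2),
  price = c(100, 250),
  minimum_nights = c(2, 3),
  number_of_reviews = c(10, 4)
)

# Assumed estimate: stays per year x nights per stay x price per night.
toy <- toy %>%
  mutate(annual_revenue = number_of_reviews * minimum_nights * price)
```

Under this reading the first toy listing earns 10 stays of 2 nights at 100 dollars, and the second earns 4 stays of 3 nights at 250 dollars.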
🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Upload only your PDF document to Gradescope. Before you submit, mark where each exercise's answer appears in the uploaded document. If an answer spans multiple pages, mark all of them. Associate the “Workflow & formatting” section with the first page.
Component | Points |
---|---|
Ex 1 | 5 |
Ex 2 | 8 |
Ex 3 | 8 |
Ex 4 | 15 |
Ex 5 | 8 |
Ex 6 | 6 |
Workflow & formatting | 10 |
Grading notes: