Lab #03: Spatial Data Visualization + Wrangling

due Monday, May 23 at 11:59pm

Goals

Getting started

All plots should follow best visualization practices; plots should include:

Don’t forget to label your R chunk as well. Your label should be short, informative, shouldn’t include spaces, and shouldn’t repeat a previous label.

Packages

We need the tidyverse, dsbox, and sf packages for this lab. If you don’t have dsbox installed yet, run the commented lines below once.

library(tidyverse)
theme_set(theme_bw())
# install.packages("devtools")
# devtools::install_github("rstudio-education/dsbox")
library(dsbox) # for the ncbikecrash data
library(sf)    # for reading and plotting spatial data

North Carolina Bicycle Crash Data

The data set ncbikecrash is available in the dsbox package and contains all NC bike crash data from 2007 - 2014. Check the documentation of the data by typing ?ncbikecrash in your console.

  1. Create a line plot showing how the number of bike accidents in which the driver’s age is between 0 and 19, inclusive, changes over time, with a point at each year. Give each year a tick and a label on the x-axis. Hint: you may use geom_line(), geom_point(), and scale_x_continuous(breaks = 2007:2014). Describe what you observe.
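
The hinted functions can be combined roughly as in the sketch below. It uses a small made-up data frame, and the names toy_crashes, year, and age_band are hypothetical; check ?ncbikecrash for the actual variable names and levels before adapting it.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per crash (not the real ncbikecrash data)
toy_crashes <- data.frame(
  year     = c(2007, 2007, 2008, 2008, 2008, 2009),
  age_band = c("0-19", "20-29", "0-19", "0-19", "30-39", "0-19")
)

# Keep one age band, then count the remaining rows per year
yearly <- toy_crashes %>%
  filter(age_band == "0-19") %>%
  count(year)

# Line plus points, with one tick and label per year on the x-axis
p_line <- ggplot(yearly, aes(x = year, y = n)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 2007:2009) +
  labs(x = "Year", y = "Number of crashes",
       title = "Toy example: crashes per year")
```

Printing p_line draws the plot. Setting breaks explicitly is what forces a tick at every year rather than letting ggplot2 choose its own spacing.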

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

  2. Examine whether there is a relationship between crash_hour and hit_run.
    • Create bar plots of crash_hour at each level of hit_run using faceting. What do you learn about the relationship from this plot?
    • Create a segmented bar plot displaying proportions of hit and run at different hours. What do you learn about the relationship from this plot?
    • Which plot do you think is more effective? Why?
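
As a rough illustration of the two plot types compared above, the sketch below builds both from a made-up data frame. The names toy, hour, and fled are hypothetical stand-ins, not the lab variables.

```r
library(ggplot2)

# Hypothetical toy data: crash hour and whether the driver fled
toy <- data.frame(
  hour = c(8, 8, 8, 17, 17, 22, 22, 22, 22),
  fled = c("No", "No", "Yes", "No", "Yes", "Yes", "Yes", "No", "Yes")
)

# Faceted bar plot: raw counts, one panel per level of fled
p_facet <- ggplot(toy, aes(x = factor(hour))) +
  geom_bar() +
  facet_wrap(~ fled, ncol = 1) +
  labs(x = "Hour", y = "Count")

# Segmented bar plot: position = "fill" rescales every bar to height 1,
# so the segments show proportions within each hour
p_fill <- ggplot(toy, aes(x = factor(hour), fill = fled)) +
  geom_bar(position = "fill") +
  labs(x = "Hour", y = "Proportion", fill = "Fled")

# The same within-hour proportions as a table (rows sum to 1)
prop <- prop.table(table(toy$hour, toy$fled), margin = 1)
```

The faceted version keeps the counts visible, while the filled version removes them and shows only the within-hour split, which is worth keeping in mind when comparing the two.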

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

  3. Recreate the following plot and describe it in the context of the data. The alpha aesthetic is set to 0.85. The colors used are #A04543, #FFFFCC, and #EBA553. For this exercise, begin with the code below. The ifelse([condition], [value1], [value2]) function assigns [value1] if [condition] holds and [value2] otherwise.

ncbikecrash <- ncbikecrash %>% 
  mutate(week = ifelse(crash_day %in% c("Saturday", "Sunday"), 
                       "Weekend", "Weekday"))

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.
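
The ifelse() call used in the starter code above works element by element on a vector, which is why it can be used inside mutate(). A minimal sketch on a made-up vector (days is a hypothetical name):

```r
# Hypothetical vector of day names
days <- c("Monday", "Saturday", "Sunday", "Thursday")

# The condition is evaluated for each element; the result has the same length
week <- ifelse(days %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
week
# → "Weekday" "Weekend" "Weekend" "Weekday"
```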

Now we will play with spatial data. Use the code below to load the NC shapefile downloaded from NC OneMap.

nc_counties <- st_read("data/nc_counties.shp", quiet = TRUE) 

  4. Join ncbikecrash and nc_counties.
    • Which variable can serve as an identifier for joining the two data sets? Keep only your chosen identifier and geometry from nc_counties.
    • Create a data set called ncbikecrash_sf that contains all accidents in ncbikecrash and their associated geometry features. The resulting data set should also be a simple features (sf) object.
  5. Create an NC map with counties colored according to the total number of accidents between 2007 and 2014. You may use scale_fill_gradient() to fill counties with a color gradient. Use informative colors for low and high numbers of accidents.
    • Which counties have a high number (> 750) of accidents? Answer using ncbikecrash.
    • Can you conclude from the above map that bikers should avoid those counties? Explain your answer. If not, which map would you consider instead to examine which counties are dangerous for bikers?
  6. Identify the county that is not drawn on the map above using an appropriate join function for nc_counties and ncbikecrash.
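
The general pattern behind the spatial exercises above (join attributes to geometries, fill a map by a count, find unmatched rows) can be sketched on hand-built shapes. Everything here is made up for illustration: square(), toy_counties, toy_crashes, and the column names are hypothetical, and the colors are arbitrary. This is not the lab solution; in particular it joins per-county counts rather than individual crashes.

```r
library(dplyr)
library(ggplot2)
library(sf)

# Helper (hypothetical): a unit square whose lower-left corner is (x0, 0)
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Three toy "counties" stored as a simple features object
toy_counties <- st_sf(
  county   = c("A", "B", "C"),
  geometry = st_sfc(square(0), square(1), square(2))
)

# Toy crash records; county "C" has none
toy_crashes <- data.frame(county = c("A", "A", "B"))

# Count crashes per county, then attach the counts to the geometries.
# Because the sf object is the first argument, the result is still an sf object.
crash_counts <- count(toy_crashes, county, name = "n_crashes")
toy_map_data <- left_join(toy_counties, crash_counts, by = "county")

# Choropleth: fill each polygon according to its count
p_map <- ggplot(toy_map_data) +
  geom_sf(aes(fill = n_crashes)) +
  scale_fill_gradient(low = "lightyellow", high = "darkred") +
  labs(fill = "Crashes")

# anti_join() keeps rows of the first table with no match in the second
unmatched <- anti_join(st_drop_geometry(toy_counties), toy_crashes,
                       by = "county")
```

If the plain data frame is the first argument of the join instead, the result is typically an ordinary data frame, and st_as_sf() can convert it back to a simple features object.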

🧶 ✅ ⬆️ Knit, commit, and push your final changes to GitHub with a meaningful commit message.

Submission

Once you are fully satisfied with your lab, Knit to .pdf to create a PDF document.

Follow the instructions in previous labs to submit your PDF to Gradescope.

Be sure to identify which problems are on each page using Gradescope.

Once you are finished with the lab, submit to Gradescope the PDF document produced after your final knit, commit, and push.

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes. Remember – you must turn in a .pdf file to the Gradescope page by the submission deadline to be considered “on time”.

To submit your assignment:

Grading (50 pts)


Component               Points
Ex 1                    7
Ex 2                    9
Ex 3                    10
Ex 4                    6
Ex 5                    7
Ex 6                    4
Workflow & formatting   7

Grading notes: