class: center, middle, inverse, title-slide # Data Visualization 1 ### Bora Jin --- layout: true <div class="my-footer"> <span> <a href="https://introds.org" target="_blank">introds.org</a> </span> </div> --- ## Material 🎥 Watch [Data and Visualization](https://www.youtube.com/watch?v=FddF4b_GuTI) - [Slides](https://rstudio-education.github.io/datascience-box/course-materials/slides/u2-d01-data-viz/u2-d01-data-viz.html#1) 🎥 Watch [Visualizing Data with ggplot2](https://www.youtube.com/watch?v=s2NF2J36ljE) - [Slides](https://rstudio-education.github.io/datascience-box/course-materials/slides/u2-d02-ggplot2/u2-d02-ggplot2.html#1) --- ## Today's Goal - Build up an effective visualization systematically layer by layer with `ggplot2` "The simple graph has brought more information to the data analyst's mind than any other device" - John Tukey --- ## Quiz **Q - What does each row of a dataset represent?** -- Observation -- **Q - What does each column of a dataset represent?** -- Variable -- **Q - All variables in a dataset must be in the same type (character, integer, etc.). (T/F)** -- F -- **Q - In this class, we mainly focus on variables having a single entry per observation. (T/F)** -- T --- ## Quiz **Q - Which function spits out the dimension of a dataset?** -- `dim()` (row, column) -- **Q - How to check the number of rows of a dataset?** -- `nrow()` -- **Q - How to check the number of columns of a dataset?** -- `ncol()` --- ## Quiz **Q - What is the information that you cannot get using `glimpse()`?** a. number of rows b. number of cols c. list of variables d. variable types e. summaries of variables --- ## Quiz **Q - What is the information that you cannot get using `glimpse()`?** a. number of rows b. number of cols c. list of variables d. variable types **e. summaries of variables** --- ## Quiz **Q - What is EDA?** -- - Approach to analysing datasets to summarize and describe its main characteristics - Often visual -- **Q - Why is visualization important?** -- It can reveal a new story that summary statistics could miss! -- **Q - Before plotting, what package do we load?** -- `library(tidyverse)` (`ggplot2` `\(\in\)` `tidyverse`) --- ## Quiz **Q - When plotting, what are the three fundamental things to define and the relevant code?** -- - Data in `ggplot(data = )` - Aesthetics in `ggplot(mapping = aes())` - Geometries in `geom_xxxx()` layers added by `+` -- ```r ggplot(data = [dataset], mapping = aes(x = [x-variable], y = [y-variable])) + geom_xxx() + other options ``` --- ## Quiz **Q - What are *other options* include?** -- - Labels and legends in `labs()` - More aesthetics options (color, shape, size, alpha) - Faceting - Many more! -- **Q - For aesthetics options, can you explain mapping vs. setting?** -- - Mapping: map plotting characters to a specific variable in data through `aes()` - Setting: determine them without regard to data values through `geom_xxxx()` --- ## Quiz **Q - Why faceting?** -- - To explore conditional relationships - Multiple smaller plots in one plot -- **Q - Two functions for faceting?** -- - `facet_grid([row-variable] ~ [column-variable])` - `facet_wrap(~ [variable])` --- class: middle, center # Questions? --- ## Let's Practice Together! - [AE 02: Part 1, Tour of RStudio + GitHub](https://sta199-summer22.netlify.app/appex/ae02_BJ.html) - [AE 03: Data Visualization 1](https://sta199-summer22.netlify.app/appex/ae03_BJ.html) --- ## Bulletin - Watch videos for [Prepare: May 16](https://sta199-summer22.netlify.app/prepare/week02_may16_BJ.html) - Submit your `ae02` - Complete Part 1-3 in `ae03`